WorldWideScience

Sample records for protein structural features

  1. Understanding Protein-Protein Interactions Using Local Structural Features

    DEFF Research Database (Denmark)

    Planas-Iglesias, Joan; Bonet, Jaume; García-García, Javier

    2013-01-01

    Protein-protein interactions (PPIs) play a relevant role among the different functions of a cell. Identifying the PPI network of a given organism (interactome) is useful to shed light on the key molecular mechanisms within a biological system. In this work, we show the role of structural features...... interacting and non-interacting protein pairs to classify the structural features that sustain the binding (or non-binding) behavior. Our study indicates that not only the interacting region but also the rest of the protein surface are important for the interaction fate. The interpretation...... to score the likelihood of the interaction between two proteins and to develop a method for the prediction of PPIs. We have tested our method on several sets with unbalanced ratios of interactions and non-interactions to simulate real conditions, obtaining accuracies higher than 25% in the most unfavorable...

  2. STRUCTURAL FEATURES OF PLANT CHITINASES AND CHITIN-BINDING PROTEINS

    NARCIS (Netherlands)

    BEINTEMA, JJ

    1994-01-01

    Structural features of plant chitinases and chitin-binding proteins are discussed. Many of these proteins consist of multiple domains, of which the chitin-binding hevein domain is a predominant one. X-ray and NMR structures of representatives of the major classes of these proteins are available now,

  3. Linking structural features of protein complexes and biological function.

    Science.gov (United States)

    Sowmya, Gopichandran; Breen, Edmond J; Ranganathan, Shoba

    2015-09-01

    Protein-protein interaction (PPI) establishes the central basis for complex cellular networks in a biological cell. Association of proteins with other proteins occurs at varying affinities, yet with a high degree of specificity. PPIs lead to diverse functionality such as catalysis, regulation, signaling, immunity, and inhibition, playing a crucial role in functional genomics. The molecular principle of such interactions is often elusive in nature. Therefore, a comprehensive analysis of known protein complexes from the Protein Data Bank (PDB) is essential for the characterization of structural interface features to determine structure-function relationship. Thus, we analyzed a nonredundant dataset of 278 heterodimer protein complexes, categorized into major functional classes, for distinguishing features. Interestingly, our analysis has identified five key features (interface area, interface polar residue abundance, hydrogen bonds, solvation free energy gain from interface formation, and binding energy) that are discriminatory among the functional classes using Kruskal-Wallis rank sum test. Significant correlations between these PPI interface features amongst functional categories are also documented. Salt bridges correlate with interface area in regulator-inhibitors (r = 0.75). These representative features have implications for the prediction of potential function of novel protein complexes. The results provide molecular insights for better understanding of PPIs and their relation to biological functions. © 2015 The Protein Society.
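
    The abstract above names the Kruskal-Wallis rank sum test as the way interface features were compared across functional classes. As a rough illustration only, the following sketch computes the H statistic for several groups of values. The function name is hypothetical, no tie correction factor is applied, and nothing here reproduces the authors' actual pipeline or data.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic for a list of samples.

    Tied values receive the average of the ranks they span; the usual
    tie correction factor is NOT applied (a simplifying assumption).
    """
    # Pool every observation together with the index of its group.
    pooled = sorted(
        (value, g) for g, values in enumerate(groups) for value in values
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)

    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Ranks are 1-based: positions i..j-1 hold ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j

    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
```

    For three fully separated toy groups such as [1, 2, 3], [4, 5, 6] and [7, 8, 9], the statistic is about 7.2; two identical groups give 0. A p-value would then be read from a chi-squared distribution with (number of groups - 1) degrees of freedom.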

  4. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  5. FeatureMap3D - a tool to map protein features and sequence conservation onto homologous structures in the PDB

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Rapacki, Krzysztof; Stærfeldt, Hans Henrik

    2006-01-01

    FeatureMap3D is a web-based tool that maps protein features onto 3D structures. The user provides sequences annotated with any feature of interest, such as post-translational modifications, protease cleavage sites or exonic structure and FeatureMap3D will then search the Protein Data Bank (PDB) f...

  6. An automated approach to network features of protein structure ensembles

    Science.gov (United States)

    Bhattacharyya, Moitrayee; Bhat, Chanda R; Vishveshwara, Saraswathi

    2013-01-01

    Network theory applied to protein structures provides insights into numerous problems of biological relevance. The explosion in structural data available from PDB and simulations establishes a need to introduce a standalone-efficient program that assembles network concepts/parameters under one hood in an automated manner. Herein, we discuss the development/application of an exhaustive, user-friendly, standalone program package named PSN-Ensemble, which can handle structural ensembles generated through molecular dynamics (MD) simulation/NMR studies or from multiple X-ray structures. The novelty in network construction lies in the explicit consideration of side-chain interactions among amino acids. The program evaluates network parameters dealing with topological organization and long-range allosteric communication. The introduction of a flexible weighing scheme in terms of residue pairwise cross-correlation/interaction energy in PSN-Ensemble brings in dynamical/chemical knowledge into the network representation. Also, the results are mapped on a graphical display of the structure, allowing an easy access of network analysis to a general biological community. The potential of PSN-Ensemble toward examining structural ensemble is exemplified using MD trajectories of an ubiquitin-conjugating enzyme (UbcH5b). Furthermore, insights derived from network parameters evaluated using PSN-Ensemble for single-static structures of active/inactive states of β2-adrenergic receptor and the ternary tRNA complexes of tyrosyl tRNA synthetases (from organisms across kingdoms) are discussed. PSN-Ensemble is freely available from http://vishgraph.mbu.iisc.ernet.in/PSN-Ensemble/psn_index.html. PMID:23934896

  7. Structural features that predict real-value fluctuations of globular proteins.

    Science.gov (United States)

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
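
    The two figures of merit quoted in this abstract, Pearson's correlation coefficient and root mean square error, can be computed as in the minimal sketch below. The function names are illustrative and the inputs are assumed to be equal-length lists of predicted and reference per-residue fluctuations; this is not the authors' evaluation code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))
```

    The correlation is insensitive to scale and offset, whereas RMSE is not, which is presumably why the abstract reports both when claiming that the method predicts the real value of the fluctuation.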

  8. Common structural features of cholesterol binding sites in crystallized soluble proteins.

    Science.gov (United States)

    Bukiya, Anna N; Dopico, Alejandro M

    2017-06-01

    Cholesterol-protein interactions are essential for the architectural organization of cell membranes and for lipid metabolism. While cholesterol-sensing motifs in transmembrane proteins have been identified, little is known about cholesterol recognition by soluble proteins. We reviewed the structural characteristics of binding sites for cholesterol and cholesterol sulfate from crystallographic structures available in the Protein Data Bank. This analysis unveiled key features of cholesterol-binding sites that are present in either all or the majority of sites: i) the cholesterol molecule is generally positioned between protein domains that have an organized secondary structure; ii) the cholesterol hydroxyl/sulfo group is often partnered by Asn, Gln, and/or Tyr, while the hydrophobic part of cholesterol interacts with Leu, Ile, Val, and/or Phe; iii) cholesterol hydrogen-bonding partners are often found on α-helices, while amino acids that interact with cholesterol's hydrophobic core have a slight preference for β-strands and secondary structure-lacking protein areas; iv) the steroid's C21 and C26 constitute the "hot spots" most often seen for steroid-protein hydrophobic interactions; v) common "cold spots" are C8-C10, C13, and C17, at which contacts with the proteins were not detected. Several common features we identified for soluble protein-steroid interaction appear evolutionarily conserved. Copyright © 2017 by the American Society for Biochemistry and Molecular Biology, Inc.

  9. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach

    Directory of Open Access Journals (Sweden)

    Taigang Liu

    2015-12-01

    Prior knowledge of a protein's structural class may offer useful clues for understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM) profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique which is based on gapped-dipeptide composition computed directly from the PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE). These optimal features are selected to construct a final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely based on PSSM and could serve as a very promising tool to predict protein structural class.
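
    One way to picture a gapped-dipeptide composition computed from a PSSM is sketched below. The definition used here (averaging the product of the score for residue type i at position k and the score for residue type j at position k + gap + 1) is one common formulation and is an assumption, since the abstract does not spell out the exact formula or any PSSM normalisation; the function name is hypothetical.

```python
def gapped_dipeptide_features(pssm, gap):
    """Flattened gapped-dipeptide composition of a PSSM.

    pssm : list of rows, one per sequence position, each row holding one
           score per residue type (20 columns for a real PSSM).
    gap  : number of positions skipped between the two paired residues.

    Assumed definition: feature (i, j) is the mean over k of
    pssm[k][i] * pssm[k + gap + 1][j].
    """
    length = len(pssm)
    width = len(pssm[0])
    step = gap + 1
    pairs = length - step
    if pairs <= 0:
        raise ValueError("sequence too short for this gap")

    feats = [[0.0] * width for _ in range(width)]
    for k in range(pairs):
        first, second = pssm[k], pssm[k + step]
        for i in range(width):
            for j in range(width):
                feats[i][j] += first[i] * second[j]

    # Flatten row by row into a width * width feature vector.
    return [value / pairs for row in feats for value in row]
```

    With a real 20-column PSSM each gap value yields 400 features, so combining several gaps quickly produces a large vector; that growth is the usual motivation for a selection step such as the SVM-RFE mentioned in the abstract.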

  10. Guiding exploration in conformational feature space with Lipschitz underestimation for ab-initio protein structure prediction.

    Science.gov (United States)

    Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen

    2018-04-01

    Computing conformations, which is essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploring algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab-initio protein structure prediction is proposed. The conformational space is first converted into ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of the use of the underestimation model, the tight lower bound estimate information can be used for exploration guidance, the invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique to solve the exploring problem of protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm available in Rosetta; when LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.
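
    The conversion into USR feature space mentioned above can be illustrated with a small sketch of an ultrafast-shape-recognition style descriptor: distances from every atom to four reference points, each distance set summarised by three moments, giving 12 numbers. The choice of moments here (mean, variance, third central moment) and the function names are assumptions; published USR variants differ in normalisation, and this is not the authors' implementation. It requires Python 3.8 or later for math.dist.

```python
import math


def _moments(values):
    """Mean, variance and third central moment of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, var, third]


def usr_descriptor(coords):
    """12-number shape descriptor for a list of (x, y, z) coordinates."""
    n = len(coords)
    centroid = tuple(sum(c[d] for c in coords) / n for d in range(3))

    # Reference points: centroid, atom closest to it, atom farthest
    # from it, and the atom farthest from that farthest atom.
    d_centroid = [math.dist(c, centroid) for c in coords]
    closest = coords[d_centroid.index(min(d_centroid))]
    farthest = coords[d_centroid.index(max(d_centroid))]
    d_farthest = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_farthest.index(max(d_farthest))]

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor
```

    Because the descriptor is built only from interatomic distances, it does not change when the conformation is translated or rotated, so two conformations can be compared in this 12-dimensional space without structural superposition.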

  11. GFP-like proteins as ubiquitous metazoan superfamily: evolution of functional features and structural complexity.

    Science.gov (United States)

    Shagin, Dmitry A; Barsova, Ekaterina V; Yanushevich, Yurii G; Fradkov, Arkady F; Lukyanov, Konstantin A; Labas, Yulii A; Semenova, Tatiana N; Ugalde, Juan A; Meyers, Ann; Nunez, Jose M; Widder, Edith A; Lukyanov, Sergey A; Matz, Mikhail V

    2004-05-01

    Homologs of the green fluorescent protein (GFP), including the recently described GFP-like domains of certain extracellular matrix proteins in Bilaterian organisms, are remarkably similar at the protein structure level, yet they often perform totally unrelated functions, thereby warranting recognition as a superfamily. Here we describe diverse GFP-like proteins from previously undersampled and completely new sources, including hydromedusae and planktonic Copepoda. In hydromedusae, yellow and nonfluorescent purple proteins were found in addition to greens. Notably, the new yellow protein seems to follow exactly the same structural solution to achieving the yellow color of fluorescence as YFP, an engineered yellow-emitting mutant variant of GFP. The addition of these new sequences made it possible to resolve deep-level phylogenetic relationships within the superfamily. Fluorescence (most likely green) must have already existed in the common ancestor of Cnidaria and Bilateria, and therefore GFP-like proteins may be responsible for fluorescence and/or coloration in virtually any animal. At least 15 color diversification events can be inferred following the maximum parsimony principle in Cnidaria. Origination of red fluorescence and nonfluorescent purple-blue colors on several independent occasions provides a remarkable example of convergent evolution of complex features at the molecular level.

  12. Insights into structural features determining odorant affinities to honey bee odorant binding protein 14.

    Science.gov (United States)

    Schwaighofer, Andreas; Pechlaner, Maria; Oostenbrink, Chris; Kotlowski, Caroline; Araman, Can; Mastrogiacomo, Rosa; Pelosi, Paolo; Knoll, Wolfgang; Nowak, Christoph; Larisika, Melanie

    2014-04-18

    Molecular interactions between odorants and odorant binding proteins (OBPs) are of major importance for understanding the principles of selectivity of OBPs towards the wide range of semiochemicals. It is largely unknown on a structural basis, how an OBP binds and discriminates between odorant molecules. Here we examine this aspect in greater detail by comparing the C-minus OBP14 of the honey bee (Apis mellifera L.) to a mutant form of the protein that comprises the third disulfide bond lacking in C-minus OBPs. Affinities of structurally analogous odorants featuring an aromatic phenol group with different side chains were assessed based on changes of the thermal stability of the protein upon odorant binding monitored by circular dichroism spectroscopy. Our results indicate a tendency that odorants show higher affinity to the wild-type OBP suggesting that the introduced rigidity in the mutant protein has a negative effect on odorant binding. Furthermore, we show that OBP14 stability is very sensitive to the position and type of functional groups in the odorant. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins

    Directory of Open Access Journals (Sweden)

    Walsh Ian

    2006-09-01

    Background: We describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Angstrom; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cα traces for short proteins (up to 200 amino acids). Results: The servers are state-of-the-art, with secondary structure predicted correctly for nearly 80% of residues (currently the top performance on EVA), 2-class solvent accessibility nearly 80% correct, and contact maps exceeding 50% precision on the top non-diagonal contacts. A preliminary implementation of the predictor of protein Cα traces featured among the top 20 Novel Fold predictors at the last CASP6 experiment as group Distill (ID 0348). The majority of the servers, including the Cα trace predictor, now take into account homology information from the PDB, when available, resulting in greatly improved reliability. Conclusion: All predictions are freely available through a simple joint web interface and the results are returned by email. In a single submission the user can send protein sequences for a total of up to 32k residues to all or a selection of the servers. Distill is accessible at the address: http://distill.ucd.ie/distill/.
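
    For readers unfamiliar with residue contact maps, the sketch below shows how such a map is usually defined from a known structure: two residues are in contact when their Cα atoms lie within a distance cutoff. This illustrates the prediction target only, not how Distill predicts it from sequence; the function name, the `min_separation` parameter and the use of Cα atoms are assumptions. It requires Python 3.8 or later for math.dist.

```python
import math


def contact_map(ca_coords, cutoff, min_separation=1):
    """Symmetric boolean contact map from C-alpha coordinates.

    ca_coords      : list of (x, y, z) tuples, one per residue, in Angstrom.
    cutoff         : distance threshold in Angstrom (e.g. 6, 8 or 12).
    min_separation : minimum sequence separation for a pair to be scored,
                     so that trivial near-diagonal pairs can be skipped.
    """
    n = len(ca_coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap
```

    Calling the function with cutoffs of 6, 8 and 12 on the same coordinates gives three nested maps, since every contact at a tighter cutoff is also a contact at a looser one.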

  14. Prediction of protein structural features by use of artificial neural networks

    DEFF Research Database (Denmark)

    Petersen, Bent

    . There is a huge over-representation of DNA sequences when comparing the amount of experimentally verified proteins with the amount of DNA sequences. The academic and industrial research community therefore has to rely on structure predictions instead of waiting for the time consuming experimentally determined...

  15. Platyhelminth Venom Allergen-Like (VAL) proteins: revealing structural diversity, class-specific features and biological associations across the phylum

    Science.gov (United States)

    CHALMERS, IAIN W.; HOFFMANN, KARL F.

    2012-01-01

    SUMMARY During platyhelminth infection, a cocktail of proteins is released by the parasite to aid invasion, initiate feeding, facilitate adaptation and mediate modulation of the host immune response. Included amongst these proteins is the Venom Allergen-Like (VAL) family, part of the larger sperm coating protein/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS) superfamily. To explore the significance of this protein family during Platyhelminthes development and host interactions, we systematically summarize all published proteomic, genomic and immunological investigations of the VAL protein family to date. By conducting new genomic and transcriptomic interrogations to identify over 200 VAL proteins (228) from species in all 4 traditional taxonomic classes (Trematoda, Cestoda, Monogenea and Turbellaria), we further expand our knowledge related to platyhelminth VAL diversity across the phylum. Subsequent phylogenetic and tertiary structural analyses reveal several class-specific VAL features, which likely indicate a range of roles mediated by this protein family. Our comprehensive analysis of platyhelminth VALs represents a unifying synopsis for understanding diversity within this protein family and a firm context in which to initiate future functional characterization of these enigmatic members. PMID:22717097

  16. Fascin- and α-Actinin-Bundled Networks Contain Intrinsic Structural Features that Drive Protein Sorting.

    Science.gov (United States)

    Winkelman, Jonathan D; Suarez, Cristian; Hocky, Glen M; Harker, Alyssa J; Morganthaler, Alisha N; Christensen, Jenna R; Voth, Gregory A; Bartles, James R; Kovar, David R

    2016-10-24

    Cells assemble and maintain functionally distinct actin cytoskeleton networks with various actin filament organizations and dynamics through the coordinated action of different sets of actin-binding proteins. The biochemical and functional properties of diverse actin-binding proteins, both alone and in combination, have been increasingly well studied. Conversely, how different sets of actin-binding proteins properly sort to distinct actin filament networks in the first place is not nearly as well understood. Actin-binding protein sorting is critical for the self-organization of diverse dynamic actin cytoskeleton networks within a common cytoplasm. Using in vitro reconstitution techniques including biomimetic assays and single-molecule multi-color total internal reflection fluorescence microscopy, we discovered that sorting of the prominent actin-bundling proteins fascin and α-actinin to distinct networks is an intrinsic behavior, free of complicated cellular signaling cascades. When mixed, fascin and α-actinin mutually exclude each other by promoting their own recruitment and inhibiting recruitment of the other, resulting in the formation of distinct fascin- or α-actinin-bundled domains. Subdiffraction-resolution light microscopy and negative-staining electron microscopy revealed that fascin domains are densely packed, whereas α-actinin domains consist of widely spaced parallel actin filaments. Importantly, other actin-binding proteins such as fimbrin and espin show high specificity between these two bundle types within the same reaction. Here we directly observe that fascin and α-actinin intrinsically segregate to discrete bundled domains that are specifically recognized by other actin-binding proteins. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Structural-Functional Features of the Thyrotropin Receptor: A Class A G-Protein-Coupled Receptor at Work.

    Science.gov (United States)

    Kleinau, Gunnar; Worth, Catherine L; Kreuchwig, Annika; Biebermann, Heike; Marcinkowski, Patrick; Scheerer, Patrick; Krause, Gerd

    2017-01-01

    The thyroid-stimulating hormone receptor (TSHR) is a member of the glycoprotein hormone receptors, a sub-group of class A G-protein-coupled receptors (GPCRs). TSHR and its endogenous ligand thyrotropin (TSH) are of essential importance for growth and function of the thyroid gland and proper function of the TSH/TSHR system is pivotal for production and release of thyroid hormones. This receptor is also important with respect to pathophysiology, such as autoimmune (including ophthalmopathy) or non-autoimmune thyroid dysfunctions and cancer development. Pharmacological interventions directly targeting the TSHR should provide benefits to disease treatment compared to currently available therapies of dysfunctions associated with the TSHR or the thyroid gland. Upon TSHR activation, the molecular events conveying conformational changes from the extra- to the intracellular side of the cell across the membrane comprise reception, conversion, and amplification of the signal. These steps are highly dependent on structural features of this receptor and its intermolecular interaction partners, e.g., TSH, antibodies, small molecules, G-proteins, or arrestin. For better understanding of signal transduction, pathogenic mechanisms such as autoantibody action and mutational modifications or for developing new pharmacological strategies, it is essential to combine available structural data with functional information to generate homology models of the entire receptor. Although so far these insights are fragmental, in the past few decades essential contributions have been made to investigate in-depth the involved determinants, such as by structure determination via X-ray crystallography. This review summarizes available knowledge (as of December 2016) concerning the TSHR protein structure, associated functional aspects, and based on these insights we suggest several receptor complex models. Moreover, distinct TSHR properties will be highlighted in comparison to other class A GPCRs to

  18. Unique Features of Halophilic Proteins.

    Science.gov (United States)

    Arakawa, Tsutomu; Yamaguchi, Rui; Tokunaga, Hiroko; Tokunaga, Masao

    2017-01-01

    Proteins from moderate and extreme halophiles have unique characteristics. They are highly acidic and hydrophilic, similar to intrinsically disordered proteins. These characteristics make the halophilic proteins soluble in water and fold reversibly. In addition to reversible folding, the rate of refolding of halophilic proteins from denatured structure is generally slow, often taking several days, for example, for extremely halophilic proteins. This slow folding rate makes the halophilic proteins a novel model system for folding mechanism analysis. High solubility and reversible folding also make the halophilic proteins excellent fusion partners for soluble expression of recombinant proteins.

  19. Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach

    Directory of Open Access Journals (Sweden)

    Khader Shameer

    2010-06-01

    3-dimensional domain swapping is a mechanism where two or more protein molecules form higher order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neuro-degenerative diseases, due to its role in functional regulation, formation of higher oligomers, protein misfolding, aggregation, etc. While the 3-dimensional domain swap mechanism can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed an SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins not known to be involved in swapped conformation or related to proteins involved in swapping phenomenon. The testing was performed using 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy from training and 73.81% accuracy from testing. Due to high diversity in the sequence, structure and functions of proteins involved in domain swapping, availability of such an algorithm to predict swapping events from sequence and structure-derived features will be an initial step towards identification of more putative proteins that may be involved in swapping or proteins involved in deposition disease. Further, the top features emerging in our feature selection method may be analysed further to understand their roles in the mechanism of domain swapping.

  20. A DNA Structural Alphabet Distinguishes Structural Features of DNA Bound to Regulatory Proteins and in the Nucleosome Core Particle

    Czech Academy of Sciences Publication Activity Database

    Schneider, Bohdan; Bozikova, Paulina; Čech, P.; Svozil, D.; Černý, Jiří

    2017-01-01

    Vol. 8, No. 10 (2017), article No. 278. ISSN 2073-4425 R&D Projects: GA MŠk(CZ) ED1.1.00/02.0109 Grant - others: GA MŠk(CZ) EF16_013/0001777 Institutional support: RVO:86652036 Keywords: DNA * DNA-protein recognition * transcription factors Subject RIV: EB - Genetics; Molecular Biology OECD field: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8) Impact factor: 3.600, year: 2016

  1. Structures composing protein domains.

    Science.gov (United States)

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel; Hudeček, Jiří

    2013-08-01

    This review summarizes available data concerning intradomain structures (IS) such as functionally important amino acid residues, short linear motifs, conserved or disordered regions, peptide repeats, broadly occurring secondary structures or folds, etc. IS form structural features (units or elements) necessary for interactions with proteins or non-peptidic ligands, enzyme reactions and some structural properties of proteins. These features have often been related to a single structural level (e.g. primary structure) mostly requiring certain structural context of other levels (e.g. secondary structures or supersecondary folds) as follows also from some examples reported or demonstrated here. In addition, we deal with some functionally important dynamic properties of IS (e.g. flexibility and different forms of accessibility), and more special dynamic changes of IS during enzyme reactions and allosteric regulation. Selected notes concern also some experimental methods, still more necessary tools of bioinformatic processing and clinically interesting relationships. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  2. Proteínas quinases: características estruturais e inibidores químicos / Kinase protein: structural features and chemical inhibitors

    Directory of Open Access Journals (Sweden)

    Bárbara V. Silva

    2009-01-01

    Protein kinases are one of the largest protein families and they are responsible for regulation of a great number of signal transduction pathways in cells, through the phosphorylation of serine, threonine, or tyrosine residues. Deregulation of these enzymes is associated with several diseases including cancer, diabetes and inflammation. For this reason, specific inhibition of tyrosine or serine/threonine kinases may represent an interesting therapeutic approach. The most important types of protein kinases, their structural features and chemical inhibitors are discussed in this paper. Emphasis is given to the small-molecule drugs that target the ATP-binding sites of these enzymes.

  3. A Novel Method Using Abstract Convex Underestimation in Ab-Initio Protein Structure Prediction for Guiding Search in Conformational Feature Space.

    Science.gov (United States)

    Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng

    2016-01-01

    To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. Test results show that the ACUE method can more rapidly and more

  4. Thermodynamic database for proteins: features and applications.

    Science.gov (United States)

    Gromiha, M Michael; Sarai, Akinori

    2010-01-01

    We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it is accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html. ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. The list of online resources for predicting protein stability has also been provided.

  5. Features and Recursive Structure

    Directory of Open Access Journals (Sweden)

    Kuniya Nasukawa

    2015-01-01

    Based on the cross-linguistic tendency that weak vowels are realized with a central quality such as ə, ɨ, or ɯ, this paper attempts to account for this choice by proposing that the nucleus itself is one of the three monovalent vowel elements |A|, |I| and |U| which function as the building blocks of melodic structure. I claim that individual languages make a parametric choice to determine which of the three elements functions as the head of a nuclear expression. In addition, I show that elements can be freely concatenated to create melodic compounds. The resulting phonetic value of an element compound is determined by the specific elements it contains and by the head-dependency relations between those elements. This concatenation-based recursive mechanism of melodic structure can also be extended to levels above the segment, thus ultimately eliminating the need for syllabic constituents. This approach reinterprets the notion of minimalism in phonology by opposing the string-based flat structure.

  6. Structural features of PhoX, one of the phosphate-binding proteins from Pho regulon of Xanthomonas citri

    Science.gov (United States)

    Pegos, Vanessa R.; Santos, Rodrigo M. L.; Medrano, Francisco J.

    2017-01-01

    In Escherichia coli, the ATP-Binding Cassette transporter for phosphate is encoded by the pstSCAB operon. PstS is the periplasmic component responsible for affinity and specificity of the system and has also been related to a regulatory role and chemotaxis during depletion of phosphate. Xanthomonas citri has two phosphate-binding proteins: PstS and PhoX, which are differentially expressed under phosphate limitation. In this work, we focused on PhoX characterization and comparison with PstS. The PhoX three-dimensional structure was solved in a closed conformation with a phosphate engulfed in the binding site pocket between two domains. Comparison between PhoX and PstS revealed that they originated from gene duplication, but despite their similarities they show significant differences in the region that interacts with the permeases. PMID:28542513

  7. Structural Features Reminiscent of ATP-Driven Protein Translocases Are Essential for the Function of a Type III Secretion-Associated ATPase.

    Science.gov (United States)

    Kato, Junya; Lefebre, Matthew; Galán, Jorge E

    2015-09-01

    Many bacterial pathogens and symbionts utilize type III secretion systems to interact with their hosts. These machines have evolved to deliver bacterial effector proteins into eukaryotic target cells to modulate a variety of cellular functions. One of the most conserved components of these systems is an ATPase, which plays an essential role in the recognition and unfolding of proteins destined for secretion by the type III pathway. Here we show that structural features reminiscent of other ATP-driven protein translocases are essential for the function of InvC, the ATPase associated with a Salmonella enterica serovar Typhimurium type III secretion system. Mutational and functional analyses showed that a two-helix-finger motif and a conserved loop located at the entrance of and within the predicted pore formed by the hexameric ATPase are essential for InvC function. These findings provide mechanistic insight into the function of this highly conserved component of type III secretion machines. Type III secretion machines are essential for the virulence or symbiotic relationships of many bacteria. These machines have evolved to deliver bacterial effector proteins into host cells to modulate cellular functions, thus facilitating bacterial colonization and replication. An essential component of these machines is a highly conserved ATPase, which is necessary for the recognition and secretion of proteins destined to be delivered by the type III secretion pathway. Using modeling and structure and function analyses, we have identified structural features of one of these ATPases from Salmonella enterica serovar Typhimurium that help to explain important aspects of its function. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  8. Novel structural features drive DNA binding properties of Cmr, a CRP family protein in TB complex mycobacteria.

    Science.gov (United States)

    Ranganathan, Sridevi; Cheung, Jonah; Cassidy, Michael; Ginter, Christopher; Pata, Janice D; McDonough, Kathleen A

    2018-01-09

    Mycobacterium tuberculosis (Mtb) encodes two CRP/FNR family transcription factors (TF) that contribute to virulence, Cmr (Rv1675c) and CRPMt (Rv3676). Prior studies identified distinct chromosomal binding profiles for each TF despite their recognizing overlapping DNA motifs. The present study shows that Cmr binding specificity is determined by discriminator nucleotides at motif positions 4 and 13. X-ray crystallography and targeted mutational analyses identified an arginine-rich loop that expands Cmr's DNA interactions beyond the classical helix-turn-helix contacts common to all CRP/FNR family members and facilitates binding to imperfect DNA sequences. Cmr binding to DNA results in a pronounced asymmetric bending of the DNA and its high level of cooperativity is consistent with DNA-facilitated dimerization. A unique N-terminal extension inserts between the DNA binding and dimerization domains, partially occluding the site where the canonical cAMP binding pocket is found. However, an unstructured region of this N-terminus may help modulate Cmr activity in response to cellular signals. Cmr's multiple levels of DNA interaction likely enhance its ability to integrate diverse gene regulatory signals, while its novel structural features establish Cmr as an atypical CRP/FNR family member. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Structural and functional features of self-assembling protein nanoparticles produced in endotoxin-free Escherichia coli.

    Science.gov (United States)

    Rueda, Fabián; Céspedes, María Virtudes; Sánchez-Chardi, Alejandro; Seras-Franzoso, Joaquin; Pesarrodona, Mireia; Ferrer-Miralles, Neus; Vázquez, Esther; Rinas, Ursula; Unzueta, Ugutz; Mamat, Uwe; Mangues, Ramón; García-Fruitós, Elena; Villaverde, Antonio

    2016-04-08

    Production of recombinant drugs in process-friendly endotoxin-free bacterial factories targets to a lessened complexity of the purification process combined with minimized biological hazards during product application. The development of nanostructured recombinant materials in innovative nanomedical activities expands such a need beyond plain functional polypeptides to complex protein assemblies. While Escherichia coli has been recently modified for the production of endotoxin-free proteins, no data has been so far recorded regarding how the system performs in the fabrication of smart nanostructured materials. We have here explored the nanoarchitecture and in vitro and in vivo functionalities of CXCR4-targeted, self-assembling protein nanoparticles intended for intracellular delivery of drugs and imaging agents in colorectal cancer. Interestingly, endotoxin-free materials exhibit a distinguishable architecture and altered size and target cell penetrability than counterparts produced in conventional E. coli strains. These variant nanoparticles show an eventual proper biodistribution and highly specific and exclusive accumulation in tumor upon administration in colorectal cancer mice models, indicating a convenient display and function of the tumor homing peptides and high particle stability under physiological conditions. The observations made here support the emerging endotoxin-free E. coli system as a robust protein material producer but are also indicative of a particular conformational status and organization of either building blocks or oligomers. This appears to be promoted by multifactorial stress-inducing conditions upon engineering of the E. coli cell envelope, which impacts on the protein quality control of the cell factory.

  10. Feature generation and representations for protein-protein interaction classification.

    Science.gov (United States)

    Lan, Man; Tan, Chew Lim; Su, Jian

    2009-10-01

    Automatically detecting articles relevant to protein-protein interaction (PPI) is a crucial step for large-scale biological database curation. Previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but these achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve the performance of the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and the term weighting methods, we also explored other domain-dependent features, i.e. protein-protein interaction trigger keywords, protein named entities and the advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features has been evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved the performance of text classification. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved classification performance by 2.71% and 3.95%, respectively.
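    The bag-of-words, term-weighting and trigger-keyword features mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain tf-idf as the term weighting and a made-up trigger list; the abstract does not say which weighting scheme or keywords the authors used, so `TRIGGERS`, `tfidf_vectors` and `trigger_count` are hypothetical names chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical trigger keywords; the abstract does not list the real ones.
TRIGGERS = {"interacts", "binds", "binding", "complex"}


def tokenize(text):
    """Lowercase the text and split it on any non-letter character."""
    tokens, word = [], []
    for ch in text.lower():
        if ch.isalpha():
            word.append(ch)
        elif word:
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def tfidf_vectors(docs):
    """Return one {term: tf * log(N / df)} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def trigger_count(text):
    """Domain-dependent feature: number of trigger keywords in the text."""
    return sum(1 for t in tokenize(text) if t in TRIGGERS)


docs = ["Protein A binds protein B", "Protein C is expressed in liver"]
vectors = tfidf_vectors(docs)
extra_features = [trigger_count(d) for d in docs]
```

    In a feature-level integration, the trigger count would simply be appended to each document's term vector before training a classifier; a classifier-level integration would instead train separate classifiers and combine their outputs.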

  11. Functionality of system components: Conservation of protein function in protein feature space

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Ussery, David; Brunak, Søren

    2003-01-01

    Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea...

  12. Bioinorganic Chemistry of Parkinson's Disease: Affinity and Structural Features of Cu(I) Binding to the Full-Length β-Synuclein Protein.

    Science.gov (United States)

    Miotto, Marco C; Pavese, Mayra D; Quintanar, Liliana; Zweckstetter, Markus; Griesinger, Christian; Fernández, Claudio O

    2017-09-05

    Alterations in the levels of copper in brain tissue and formation of α-synuclein (αS)-copper complexes might play a key role in the amyloid aggregation of αS and the onset of Parkinson's disease (PD). Recently, we demonstrated that formation of the high-affinity Cu(I) complex with the N-terminally acetylated form of the protein αS substantially increases and stabilizes local conformations with α-helical secondary structure and restricted motility. In this work, we performed a detailed NMR-based structural characterization of the Cu(I) complexes with the full-length acetylated form of its homologue β-synuclein (βS), which is colocalized with αS in vivo and can bind copper ions. Our results show that, similarly to αS, the N-terminal region of βS constitutes the preferential binding interface for Cu(I) ions, encompassing two independent and noninteractive Cu(I) binding sites. According to these results, βS binds the metal ion with higher affinity than αS, in a coordination environment that involves the participation of Met-1, Met-5, and Met-10 residues (site 1). Compared to αS, the shift of His from position 50 to 65 in the N-terminal region of βS does not change the Cu(I) affinity features at that site (site 2). Interestingly, the formation of the high-affinity βS-Cu(I) complex at site 1 in the N-terminus promotes a short α-helix conformation that is restricted to the 1-5 segment of the AcβS sequence, which differs with the substantial increase in α-helix conformations seen for N-terminally acetylated αS upon Cu(I) complexation. Our NMR data demonstrate conclusively that the differences observed in the conformational transitions triggered by Cu(I) binding to AcαS and AcβS find a correlation at the level of their backbone dynamic properties; added to the potential biological implications of these findings, this fact opens new avenues of investigations into the bioinorganic chemistry of PD.

  13. Structural mapping of the coiled-coil domain of a bacterial condensin and comparative analyses across all domains of life suggest conserved features of SMC proteins.

    Science.gov (United States)

    Waldman, Vincent M; Stanage, Tyler H; Mims, Alexandra; Norden, Ian S; Oakley, Martha G

    2015-06-01

    The structural maintenance of chromosomes (SMC) proteins form the cores of multisubunit complexes that are required for the segregation and global organization of chromosomes in all domains of life. These proteins share a common domain structure in which N- and C- terminal regions pack against one another to form a globular ATPase domain. This "head" domain is connected to a central, globular, "hinge" or dimerization domain by a long, antiparallel coiled coil. To date, most efforts for structural characterization of SMC proteins have focused on the globular domains. Recently, however, we developed a method to map interstrand interactions in the 50-nm coiled-coil domain of MukB, the divergent SMC protein found in γ-proteobacteria. Here, we apply that technique to map the structure of the Bacillus subtilis SMC (BsSMC) coiled-coil domain. We find that, in contrast to the relatively complicated coiled-coil domain of MukB, the BsSMC domain is nearly continuous, with only two detectable coiled-coil interruptions. Near the middle of the domain is a break in coiled-coil structure in which there are three more residues on the C-terminal strand than on the N-terminal strand. Close to the head domain, there is a second break with a significantly longer insertion on the same strand. These results provide an experience base that allows an informed interpretation of the output of coiled-coil prediction algorithms for this family of proteins. A comparison of such predictions suggests that these coiled-coil deviations are highly conserved across SMC types in a wide variety of organisms, including humans. © 2015 Wiley Periodicals, Inc.

  14. Archaeal MCM Proteins as an Analog for the Eukaryotic Mcm2–7 Helicase to Reveal Essential Features of Structure and Function

    Science.gov (United States)

    Miller, Justin M.; Enemark, Eric J.

    2015-01-01

    In eukaryotes, the replicative helicase is the large multisubunit CMG complex consisting of the Mcm2–7 hexameric ring, Cdc45, and the tetrameric GINS complex. The Mcm2–7 ring assembles from six different, related proteins and forms the core of this complex. In archaea, a homologous MCM hexameric ring functions as the replicative helicase at the replication fork. Archaeal MCM proteins form thermostable homohexamers, facilitating their use as models of the eukaryotic Mcm2–7 helicase. Here we review archaeal MCM helicase structure and function and how the archaeal findings relate to the eukaryotic Mcm2–7 ring. PMID:26539061

  15. Structural features of lignohumic acids

    Czech Academy of Sciences Publication Activity Database

    Novák, František; Šestauberová, Martina; Hrabal, R.

    2015-01-01

    Vol. 1093, August (2015), pp. 179-185 ISSN 0022-2860 Institutional support: RVO:60077344 Keywords: C-13 NMR * FTIR * humic acids * lignohumate * lignosulfonate * structure Subject RIV: DF - Soil Science Impact factor: 1.780, year: 2015

  16. Extracting knowledge from protein structure geometry

    DEFF Research Database (Denmark)

    Røgen, Peter; Koehl, Patrice

    2013-01-01

    potential from geometric knowledge extracted from native and misfolded conformers of protein structures. This new potential, Metric Protein Potential (MPP), has two main features that are key to its success. Firstly, it is composite in that it includes local and nonlocal geometric information on proteins...

  17. A feature-based approach to modeling protein-protein interaction hot spots.

    Science.gov (United States)

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-05-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.

  18. Modularity in protein structures: study on all-alpha proteins.

    Science.gov (United States)

    Khan, Taushif; Ghosh, Indira

    2015-01-01

    Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.

  19. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  20. Structural insight and flexible features of NS5 proteins from all four serotypes of Dengue virus in solution

    Energy Technology Data Exchange (ETDEWEB)

    Saw, Wuan Geok; Tria, Giancarlo; Grüber, Ardina; Subramanian Manimekalai, Malathy Sony; Zhao, Yongqian; Chandramohan, Arun; Srinivasan Anand, Ganesh; Matsui, Tsutomu; Weiss, Thomas M.; Vasudevan, Subhash G.; Grüber, Gerhard

    2015-10-31

    Infection by the four serotypes of Dengue virus (DENV-1 to DENV-4) causes an important arthropod-borne viral disease in humans. The multifunctional DENV nonstructural protein 5 (NS5) is essential for capping and replication of the viral RNA and harbours a methyltransferase (MTase) domain and an RNA-dependent RNA polymerase (RdRp) domain. In this study, insights into the overall structure and flexibility of the entire NS5 of all four Dengue virus serotypes in solution are presented for the first time. The solution models derived revealed an arrangement of the full-length NS5 (NS5FL) proteins with the MTase domain positioned at the top of the RdRp domain. The DENV-1 to DENV-4 NS5 forms are elongated and flexible in solution, with DENV-4 NS5 being more compact relative to NS5 from DENV-1, DENV-2 and DENV-3. Solution studies of the individual MTase and RdRp domains show the compactness of the RdRp domain as well as the contribution of the MTase domain and the ten-residue linker region to the flexibility of the entire NS5. Swapping the ten-residue linker between DENV-4 NS5FL and DENV-3 NS5FL demonstrated its importance in MTase–RdRp communication and in concerted interaction with viral and host proteins, as probed by amide hydrogen/deuterium mass spectrometry. Conformational alterations owing to RNA binding are presented.

  1. Intumescent features of nucleic acids and proteins

    International Nuclear Information System (INIS)

    Alongi, Jenny; Cuttica, Fabio; Blasio, Alessandro Di; Carosio, Federico; Malucelli, Giulio

    2014-01-01

    Highlights: • The combustion resistance of DNA and caseins to different heat fluxes was studied. • Upon heating, DNA and caseins exhibited an intumescent behaviour. • The char derived from DNA was more stable and coherent than that from caseins. - Abstract: Are nucleic acids and proteins intumescent molecules? In order to get an answer, in the present manuscript, powders of deoxyribose nucleic acids (DNA) and caseins have been exposed to different heat fluxes under a cone calorimeter source and to the direct application of a propane flame. Under these conditions, DNA and caseins exhibited a typical intumescent behaviour, generating a coherent expanded cellular carbonaceous residue (char), extremely resistant to heat exposure. The resulting volumetric expansion as well as the resistance of the formed char turned out to be dependent on (i) the chemical structure of the chosen biomacromolecule, (ii) the evolution of ammonia and (iii) the adopted heat flux in cone calorimetry tests (namely, 25, 35, 50 and 75 kW/m²). The presence of ribose units within the DNA backbone determined the formation of highly expanded and coherent residues as compared to those obtained from caseins. Indeed, under a heat flux of 35 kW/m², when a carbon source (i.e. common cane sugar) was added to caseins, the resulting char was similar to that formed by DNA. Furthermore, the char expansion was ascribed to the evolution of ammonia released by these biomacromolecules upon heating, as detected by thermogravimetry coupled to infrared spectroscopy, and confirmed by scanning electron microscopy experiments performed on the bubbles present in the residues of flammability tests

  2. Intumescent features of nucleic acids and proteins

    Energy Technology Data Exchange (ETDEWEB)

    Alongi, Jenny, E-mail: jenny.alongi@polito.it; Cuttica, Fabio; Blasio, Alessandro Di; Carosio, Federico; Malucelli, Giulio

    2014-09-10

    Highlights: • The combustion resistance of DNA and caseins to different heat fluxes was studied. • Upon heating, DNA and caseins exhibited an intumescent behaviour. • The char derived from DNA was more stable and coherent than that from caseins. - Abstract: Are nucleic acids and proteins intumescent molecules? In order to get an answer, in the present manuscript, powders of deoxyribose nucleic acids (DNA) and caseins have been exposed to different heat fluxes under a cone calorimeter source and to the direct application of a propane flame. Under these conditions, DNA and caseins exhibited a typical intumescent behaviour, generating a coherent expanded cellular carbonaceous residue (char), extremely resistant to heat exposure. The resulting volumetric expansion as well as the resistance of the formed char turned out to be dependent on (i) the chemical structure of the chosen biomacromolecule, (ii) the evolution of ammonia and (iii) the adopted heat flux in cone calorimetry tests (namely, 25, 35, 50 and 75 kW/m²). The presence of ribose units within the DNA backbone determined the formation of highly expanded and coherent residues as compared to those obtained from caseins. Indeed, under a heat flux of 35 kW/m², when a carbon source (i.e. common cane sugar) was added to caseins, the resulting char was similar to that formed by DNA. Furthermore, the char expansion was ascribed to the evolution of ammonia released by these biomacromolecules upon heating, as detected by thermogravimetry coupled to infrared spectroscopy, and confirmed by scanning electron microscopy experiments performed on the bubbles present in the residues of flammability tests.

  3. Filtering high-throughput protein-protein interaction data using a combination of genomic features

    Directory of Open Access Journals (Sweden)

    Patil Ashwini

    2005-04-01

    Background: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. Results: In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/. Conclusion: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.
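    The likelihood-ratio scoring described in this abstract can be sketched as follows. This is a minimal illustration that assumes a naive Bayes model (the simplest Bayesian network, with features treated as conditionally independent); the abstract does not give the network structure, and every probability in `FEATURE_PROBS` is a hypothetical placeholder, not a value from the paper.

```python
from math import prod

# Hypothetical values: (P(feature present | true interaction),
#                       P(feature present | false interaction)).
FEATURE_PROBS = {
    "pfam_domain_pair": (0.30, 0.02),
    "shared_go_term": (0.60, 0.20),
    "homologous_interaction": (0.25, 0.05),
}


def likelihood_ratio(observed):
    """Naive Bayes likelihood ratio for a dict {feature name: bool}.

    Values above 1 favour a true interaction, values below 1 a spurious one.
    """
    ratios = []
    for name, present in observed.items():
        p_true, p_false = FEATURE_PROBS[name]
        if present:
            ratios.append(p_true / p_false)
        else:
            ratios.append((1 - p_true) / (1 - p_false))
    return prod(ratios)


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


supported = likelihood_ratio(
    {"pfam_domain_pair": True, "shared_go_term": True, "homologous_interaction": False}
)
unsupported = likelihood_ratio(
    {"pfam_domain_pair": False, "shared_go_term": False, "homologous_interaction": False}
)
```

    With these placeholder probabilities, an interaction supported by two features scores well above 1 and an unsupported one falls below 1, which mirrors the qualitative claim in the abstract; thresholding the ratio yields the predictions from which sensitivity and specificity are computed.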

  4. NAPS: Network Analysis of Protein Structures

    Science.gov (United States)

    Chakrabarty, Broto; Parekh, Nita

    2016-01-01

    Traditionally, protein structures have been analysed by the secondary structure architecture and fold arrangement. An alternative approach that has shown promise is modelling proteins as a network of non-covalent interactions between amino acid residues. The network representation of proteins provides a systems approach to topological analysis of complex three-dimensional structures irrespective of secondary structure and fold type, and provides insights into structure-function relationships. We have developed a web server for network-based analysis of protein structures, NAPS, that facilitates quantitative and qualitative (visual) analysis of residue–residue interactions in single chains, protein complexes, modelled protein structures and trajectories (e.g. from molecular dynamics simulations). The user can specify the atom type for network construction, the distance range (in Å) and the minimal amino acid separation along the sequence. NAPS lets users select node(s) and their neighbourhood based on centrality measures, physicochemical properties of amino acids or clusters of well-connected residues (k-cliques) for further analysis. Visual analysis of interacting domains and protein chains, and shortest path lengths between pairs of residues, are additional features that aid in functional analysis. NAPS supports various analyses and visualization views for identifying functional residues, and provides insight into mechanisms of protein folding and into domain-domain and protein–protein interactions for understanding communication within and between proteins. URL: http://bioinf.iiit.ac.in/NAPS/. PMID:27151201
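
The network construction described above can be sketched in a few lines. This is a generic illustration, not NAPS code: it assumes one representative atom per residue (for example C-alpha), an unweighted graph, and hypothetical function names and default values (`cutoff=7.0`, `min_separation=1`).

```python
import math
from collections import deque


def build_contact_network(coords, cutoff=7.0, min_separation=1):
    """Link residues whose representative atoms lie within `cutoff` angstroms.

    coords: list of (x, y, z) tuples, one per residue, in sequence order.
    min_separation: minimal distance along the sequence for an edge (>= 1).
    """
    n = len(coords)
    step = max(1, min_separation)  # never link a residue to itself
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + step, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other residues each residue is in contact with."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def shortest_path_length(adjacency, source, target):
    """Breadth-first search; returns None if the residues are not connected."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


if __name__ == "__main__":
    # Toy chain: five residues spaced 3.8 angstroms apart on a straight line.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    net = build_contact_network(chain)
    print(degree_centrality(net))
    print(shortest_path_length(net, 0, 4))
```

On the toy chain only sequence neighbours fall within the cutoff, so the network is a simple path; real structures produce long-range contacts that shorten path lengths and raise the centrality of buried residues.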

  5. Understanding Structural Features of Microbial Lipases – An Overview

    Directory of Open Access Journals (Sweden)

    John Geraldine Sandana Mala

    2008-01-01

    The structural elucidation of microbial lipases has been of prime interest since the 1980s. Knowledge of structural features plays an important role in designing and engineering lipases for specific purposes. Significant structural data have been presented for only a few microbial lipases, and there is still a structure deficit; that is, most lipase structures are yet to be resolved. A search for ‘lipase structure’ in the RCSB Protein Data Bank (http://www.rcsb.org/pdb/) returns only 93 hits (as of September 2007), and the NCBI database (http://www.ncbi.nlm.nih.gov) reports 89 lipase structures as compared to 14719 core nucleotide records. It is therefore worthwhile to consider investigations on the structural analysis of microbial lipases. This review is intended to provide a collection of resources on the instrumental, chemical and bioinformatics approaches for structure analyses. X-ray crystallography is a versatile tool for structural biochemists and has been exploited to this day. The chemical methods of recent interest include molecular modeling and combinatorial designs. Bioinformatics has sparked striking interest in protein structural analysis with the advent of innumerable tools. Furthermore, a literature platform of the structural elucidations so far investigated has been presented with detailed descriptions as applicable to microbial lipases. A case study of Candida rugosa lipase (CRL) has also been discussed, which highlights important structural features also common to most lipases. A general profile of lipase has been vividly described with an overview of lipase research reviewed in the past.

  6. Heterochiral Knottin Protein: Folding and Solution Structure.

    Science.gov (United States)

    Mong, Surin K; Cochran, Frank V; Yu, Hongtao; Graziano, Zachary; Lin, Yu-Shan; Cochran, Jennifer R; Pentelute, Bradley L

    2017-10-31

    Homochirality is a general feature of biological macromolecules, and Nature includes few examples of heterochiral proteins. Herein, we report on the design, chemical synthesis, and structural characterization of heterochiral proteins possessing loops of amino acids of chirality opposite to that of the rest of a protein scaffold. Using the protein Ecballium elaterium trypsin inhibitor II, we discover that selective β-alanine substitution favors the efficient folding of our heterochiral constructs. Solution nuclear magnetic resonance spectroscopy of one such heterochiral protein reveals a homogeneous global fold. Additionally, steered molecular dynamics simulations indicate that β-alanine reduces the free energy required to fold the protein. We also find these heterochiral proteins to be more resistant to proteolysis than homochiral l-proteins. This work informs the design of heterochiral protein architectures containing stretches of both d- and l-amino acids.

  7. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    Science.gov (United States)

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
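
The abstract does not define its K-Skip-N-Grams features, so the following is only a sketch of one common formulation for N = 2: count every ordered residue pair separated by at most k skipped positions and normalise the counts into a 400-dimensional vector. The function name, the default `k=1` and the handling of non-standard residues are assumptions, not details taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def k_skip_bigram_features(sequence, k=1):
    """Normalised frequencies of residue pairs with 0..k residues skipped.

    Pairs containing a non-standard residue are ignored (an assumption).
    Returns a list of 400 floats ordered as AA, AC, ..., YY.
    """
    counts = Counter()
    total = 0
    for i in range(len(sequence)):
        for skip in range(k + 1):
            j = i + skip + 1
            if j >= len(sequence):
                break
            a, b = sequence[i], sequence[j]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
                total += 1
    if total == 0:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    return [counts[a + b] / total for a in AMINO_ACIDS for b in AMINO_ACIDS]


if __name__ == "__main__":
    vector = k_skip_bigram_features("ACA", k=1)
    # Non-zero entries correspond to the pairs AC, AA (one skip) and CA.
    print([(i, v) for i, v in enumerate(vector) if v > 0])
```

Vectors of this kind could be concatenated with other feature sets and passed to a support vector machine, which is the general shape of the mixed-feature approach the abstract describes.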

  8. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods

    Directory of Open Access Journals (Sweden)

    Kaiyang Qu

    2017-09-01

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.

  9. Feature-Based and String-Based Models for Predicting RNA-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Donald Adjeroh

    2018-03-01

    In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences) and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.

  10. Feature Extraction for Structural Dynamics Model Validation

    Energy Technology Data Exchange (ETDEWEB)

    Farrar, Charles [Los Alamos National Laboratory; Nishio, Mayuko [Yokohama University; Hemez, Francois [Los Alamos National Laboratory; Stull, Chris [Los Alamos National Laboratory; Park, Gyuhae [Chonnam University; Cornwell, Phil [Rose-Hulman Institute of Technology; Figueiredo, Eloi [Universidade Lusófona; Luscher, D. J. [Los Alamos National Laboratory; Worden, Keith [University of Sheffield

    2016-01-13

    As structural dynamics becomes increasingly non-modal, stochastic and nonlinear, finite element model-updating technology must adopt the broader notions of model validation and uncertainty quantification. For example, particular re-sampling procedures must be implemented to propagate uncertainty through a forward calculation, and non-modal features must be defined to analyze nonlinear data sets. The latter topic is the focus of this report, but first, some more general comments regarding the concept of model validation will be discussed.

  11. Simultaneous determination of protein structure and dynamics

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Best, Robert B.; DePristo, M. A.

    2005-01-01

    We present a protocol for the experimental determination of ensembles of protein conformations that represent simultaneously the native structure and its associated dynamics. The procedure combines the strengths of nuclear magnetic resonance spectroscopy (for obtaining experimental information at the atomic level about the structural and dynamical features of proteins) with the ability of molecular dynamics simulations to explore a wide range of protein conformations. We illustrate the method for human ubiquitin in solution and find that there is considerable conformational heterogeneity throughout the protein structure. The interior atoms of the protein are tightly packed in each individual conformation that contributes to the ensemble but their overall behaviour can be described as having a significant degree of liquid-like character. The protocol is completely general and should lead to significant...

  12. Protein functional features are reflected in the patterns of mRNA translation speed.

    Science.gov (United States)

    López, Daniel; Pazos, Florencio

    2015-07-09

    The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide polymorphisms (SNPs). In this work we perform the first large-scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.

  13. Protein structure based prediction of catalytic residues.

    Science.gov (United States)

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
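
The reported sensitivity, precision and enrichment follow directly from the four counts in the abstract. The short check below reproduces them; the function name is hypothetical, and enrichment is assumed here to mean precision divided by the background fraction of catalytic residues.

```python
def enrichment_summary(total_residues, annotated_catalytic, predicted, true_positives):
    """Derive sensitivity, precision and fold enrichment from raw counts."""
    sensitivity = true_positives / annotated_catalytic   # recovered / all catalytic
    precision = true_positives / predicted               # recovered / all predicted
    background = annotated_catalytic / total_residues    # catalytic fraction overall
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "enrichment": precision / background,
    }


if __name__ == "__main__":
    # Counts quoted in the abstract: 9262 residues, 111 catalytic,
    # 411 predicted, 70 of the predictions correct.
    summary = enrichment_summary(9262, 111, 411, 70)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

With these counts the sketch gives a sensitivity of about 0.63, a precision of about 0.17 and an enrichment of about 14.2, in line with the figures stated in the abstract.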

  14. Proteolysis of bovine beta-lactoglobulin during thermal treatment in subdenaturing conditions highlights some structural features of the temperature-modified protein and yields fragments with low immunoreactivity

    DEFF Research Database (Denmark)

    Iametti, S.; Rasmussen, P.; Frøkiær, Hanne

    2002-01-01

    Bovine beta-lactoglobulin was hydrolyzed with trypsin or chymotrypsin in the course of heat treatment at 55, 60 and 65 °C at neutral pH. At these temperatures beta-lactoglobulin undergoes significant but reversible structural changes. In the conditions used in the present study, beta-lactoglobulin was virtually insensitive to proteolysis by either enzyme at room temperature, but underwent extensive proteolysis when either protease was present during the heat treatment. High-temperature proteolysis occurs in a progressive manner. Mass spectrometry analysis of some large-sized breakdown intermediates...

  15. Solution Structure of 4′-Phosphopantetheine - GmACP3 from Geobacter metallireducens: A Specialized Acyl Carrier Protein with Atypical Structural Features and a Putative Role in Lipopolysaccharide Biosynthesis

    Science.gov (United States)

    Ramelot, Theresa A.; Smola, Matthew J.; Lee, Hsiau-Wei; Ciccosanti, Colleen; Hamilton, Keith; Acton, Thomas B.; Xiao, Rong; Everett, John K.; Prestegard, James H.; Montelione, Gaetano T.; Kennedy, Michael A.

    2011-01-01

    GmACP3 from Geobacter metallireducens is a specialized acyl carrier protein (ACP) whose gene, gmet_2339, is located near genes encoding many proteins involved in lipopolysaccharide (LPS) biosynthesis, indicating a likely function for GmACP3 in LPS production. By overexpression in Escherichia coli, about 50% holo-GmACP3 and 50% apo-GmACP3 were obtained. Apo-GmACP3 exhibited slow precipitation and non-monomeric behavior by 15N NMR relaxation measurements. Addition of 4′-phosphopantetheine (4′-PP) via enzymatic conversion by E. coli holo-ACP synthase resulted in stable >95% holo-GmACP3 that was characterized as monomeric by 15N relaxation measurements and had no indication of conformational exchange. We have determined a high-resolution solution structure of holo-GmACP3 by standard NMR methods, including refinement with two sets of NH residual dipolar couplings, allowing for a detailed structural analysis of the interactions between 4′-PP and GmACP3. Whereas the overall four-helix bundle topology is similar to previously solved ACP structures, this structure has unique characteristics, including an ordered 4′-PP conformation that places the thiol at the entrance to a central hydrophobic cavity near a conserved hydrogen-bonded Trp-His pair. These residues are part of a conserved WDSLxH/N motif found in GmACP3 and its orthologs. The helix locations and the large hydrophobic cavity are more similar to medium- and long-chain acyl-ACPs than to other apo- and holo-ACP structures. Taken together, structural characterization along with bioinformatic analysis of nearby genes suggests that GmACP3 is involved in lipid A acylation, possibly by atypical long-chain hydroxy fatty acids, and potentially involved in synthesis of secondary metabolites. PMID:21235239

  16. Prediction of human protein function from post-translational modifications and localization features

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Blom, Nikolaj

    2002-01-01

    We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign proteins of unknown function to functional classes, and enzyme categories for enzymes. We show that strategies for the elucidation of protein function may benefit from a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects...

  17. Oligomeric protein structure networks: insights into protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Brinda KV

    2005-12-01

    Background: Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes, with edges based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues), with special emphasis on protein interfaces. Results: A variety of interactions such as hydrogen bonds, salt bridges, and aromatic and hydrophobic interactions, which occur at the interfaces, are identified in a consolidated manner as amino acid clusters at the interface. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot

  18. Structural features that optimize high temperature superconductivity

    International Nuclear Information System (INIS)

    Jorgensen, J.D.; Argonne Nat. Lab., IL; Hinks, D.G.; Argonne Nat. Lab., IL; Chmaissem, O.; Argonne Nat. Lab., IL; Argyriou, D.N.; Argonne Nat. Lab., IL; Mitchell, J.F.; Argonne Nat. Lab., IL; Dabrowski, B.

    1996-01-01

    Studies of a large number of compounds have provided a consistent picture of what structural features give rise to the highest $T_c$'s in copper-oxide superconductors. For example, various defects can be introduced into the blocking layer to provide the optimum carrier concentration, but defects that form in or adjacent to the CuO$_2$ layers will lower $T_c$ and eventually destroy superconductivity. After these requirements are satisfied, the highest $T_c$'s are observed for compounds (such as the HgBa$_2$Ca$_{n-1}$Cu$_n$O$_{2n+2+x}$ family) that have flat and square CuO$_2$ planes and long apical Cu-O bonds. This conclusion is confirmed by the study of materials in which the flatness of the CuO$_2$ plane can be varied in a systematic way. In more recent work, attention has focused on how the structure can be modified, for example, by chemical substitution, to improve flux pinning properties. Two strategies are being investigated: (1) increasing the coupling of pancake vortices to form vortex lines by shortening or "metallizing" the blocking layer; and (2) the formation of defects that pin flux. (orig.)

  19. Structure and non-structure of centrosomal proteins.

    Science.gov (United States)

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  20. Sequence-based feature prediction and annotation of proteins

    DEFF Research Database (Denmark)

    Juncker, Agnieszka; Jensen, Lars J.; Pierleoni, Andrea

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome....

  1. Selecting protein families for environmental features based on manifold regularization.

    Science.gov (United States)

    Jiang, Xingpeng; Xu, Weiwei; Park, E K; Li, Guangrong

    2014-06-01

    Recently, statistics and machine learning have been developed to identify functional or taxonomic features of environmental features or physiological status. Proteins (or other functional and taxonomic entities) that are important to environmental features can potentially be used as biosensors. A major challenge is how the distribution of protein and gene functions embodies the adaptation of microbial communities across environments and host habitats. In this paper, we propose a novel regularization method for linear regression to address this challenge. The approach is inspired by local linear embedding (LLE) and we call it a manifold-constrained regularization for linear regression (McRe). The novel regularization procedure also has potential to be used in solving other linear systems. We demonstrate the efficiency and the performance of the approach on both simulated and real data.

  2. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    Science.gov (United States)

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features, which can be utilized by deep neural networks and other classifiers to enhance learning, out of the Fisher score features. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.

  3. Protein interfacial structure and nanotoxicology

    International Nuclear Information System (INIS)

    White, John W.; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M.

    2009-01-01

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that on which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between β-casein and κ-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures when a β-casein monolayer is first attacked by a κ-casein solution, compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a β-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  4. Protein interfacial structure and nanotoxicology

    Energy Technology Data Exchange (ETDEWEB)

    White, John W. [Research School of Chemistry, Australian National University, Canberra (Australia)], E-mail: jww@rsc.anu.edu.au; Perriman, Adam W.; McGillivray, Duncan J.; Lin, J.-M. [Research School of Chemistry, Australian National University, Canberra (Australia)

    2009-02-21

    Here we briefly recapitulate the use of X-ray and neutron reflectometry at the air-water interface to find protein structures and thermodynamics at interfaces and test a possibility for understanding those interactions between nanoparticles and proteins which lead to nanoparticle toxicology through entry into living cells. Stable monomolecular protein films have been made at the air-water interface and, with a specially designed vessel, the substrate changed from that on which the air-water interfacial film was deposited. This procedure allows interactions, both chemical and physical, between introduced species and the monomolecular film to be studied by reflectometry. The method is briefly illustrated here with some new results on protein-protein interaction between β-casein and κ-casein at the air-water interface using X-rays. These two proteins are an essential component of the structure of milk. In the experiments reported, specific and directional interactions appear to cause different interfacial structures when a β-casein monolayer is first attacked by a κ-casein solution, compared to the reverse. The additional contrast associated with neutrons will be an advantage here. We then show the first results of experiments on the interaction of a β-casein monolayer with a nanoparticle titanium oxide sol, foreshadowing the study of the nanoparticle 'corona' thought to be important for nanoparticle-cell wall penetration.

  5. Structural features of subtype-selective EP receptor modulators.

    Science.gov (United States)

    Markovič, Tijana; Jakopin, Žiga; Dolenc, Marija Sollner; Mlinarič-Raščan, Irena

    2017-01-01

    Prostaglandin E2 is a potent endogenous molecule that binds to four different G-protein-coupled receptors: EP1-4. Each of these receptors is a valuable drug target, with distinct tissue localisation and signalling pathways. We review the structural features of EP modulators required for subtype-selective activity, as well as the structural requirements for improved pharmacokinetic parameters. Novel EP receptor subtype selective agonists and antagonists appear to be valuable drug candidates in the therapy of many pathophysiological states, including ulcerative colitis, glaucoma, bone healing, B cell lymphoma, neurological diseases, among others, which have been studied in vitro, in vivo and in early phase clinical trials. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  6. Structural entanglements in protein complexes

    Science.gov (United States)

    Zhao, Yani; Chwastyk, Mateusz; Cieplak, Marek

    2017-06-01

    We consider multi-chain protein native structures and propose a criterion that determines whether two chains in the system are entangled or not. The criterion is based on the behavior observed by pulling at both termini of each chain simultaneously in the two chains. We have identified about 900 entangled systems in the Protein Data Bank and provided a more detailed analysis for several of them. We argue that entanglement enhances the thermodynamic stability of the system but it may have other functions: burying the hydrophobic residues at the interface and increasing the DNA or RNA binding area. We also study the folding and stretching properties of the knotted dimeric proteins MJ0366, YibK, and bacteriophytochrome. These proteins have been studied theoretically in their monomeric versions so far. The dimers are seen to separate on stretching through the tensile mechanism and the characteristic unraveling force depends on the pulling direction.

  7. Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting

    OpenAIRE

    Wang, Liwen; Chance, Mark R.

    2011-01-01

    Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. In this feature article the overall technology is described and several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand dependent conformational changes in proteins are described.

  8. Soliton concepts and protein structure

    Science.gov (United States)

    Krokhotin, Andrei; Niemi, Antti J.; Peng, Xubiao

    2012-03-01

    Structural classification shows that the number of different protein folds is surprisingly small. It also appears that proteins are built in a modular fashion from a relatively small number of components. Here we propose that the modular building blocks are made of the dark soliton solution of a generalized discrete nonlinear Schrödinger equation. We find that practically all protein loops can be obtained simply by scaling the size and by joining together a number of copies of the soliton, one after another. The soliton has only two loop-specific parameters, and we compute their statistical distribution in the Protein Data Bank (PDB). We explicitly construct a collection of 200 sets of parameters, each determining a soliton profile that describes a different short loop. The ensuing profiles cover practically all those proteins in PDB that have a resolution which is better than 2.0 Å, with a precision such that the average root-mean-square distance between the loop and its soliton is less than the experimental B-factor fluctuation distance. We also present two examples that describe how the loop library can be employed both to model and to analyze folded proteins.

  9. Protein single-model quality assessment by feature-based probability density functions.

    Science.gov (United States)

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  10. Structural features in Ni-Al alloys

    International Nuclear Information System (INIS)

    Abylkalykova, R.B.; Kveglis, L.I.; Rakhimova, U.A.; Nasokhova, Sh.B.; Tazhibaeva, G.B.

    2007-01-01

    The purpose of this work is to study structural transformations under the diverse memory effect in Ni-Al alloys. Samples of the following composition were examined: Ni 75 at.% and Al 25 at.%. The work aims to clarify the reasons for the formation of atom-ordered structures at inter-grain boundaries of bulk samples under the action of temperature and static load. The inter-grain boundary layers revealed in the Ni-Al alloy, in both the bulk and the surface state, have a complicated structure.

  11. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    Science.gov (United States)

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2 gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
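
    As an illustration of the first two CPSR feature sets named above (amino acid composition and sequence length), the following is a minimal Python sketch. The function names and the choice to append the raw length as a final vector element are assumptions made for this example; the abstract does not specify how the authors encode or scale these features.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    # Non-standard symbols (e.g. X, B, Z) are ignored in this sketch.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(sequence: str) -> List[float]:
    """Hypothetical 21-element vector: composition (20) plus length (1)."""
    return amino_acid_composition(sequence) + [float(len(sequence))]
```

    A vector like this would normally be scaled before dimensionality reduction, since the length term is on a very different scale from the composition fractions.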

  12. Prediction of heterodimeric protein complexes from weighted protein-protein interaction networks using novel features and kernel functions.

    Directory of Open Access Journals (Sweden)

    Peiying Ruan

    Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.

  13. Annotating the protein-RNA interaction sites in proteins using evolutionary information and protein backbone structure.

    Science.gov (United States)

    Li, Tao; Li, Qian-Zhong

    2012-11-07

    RNA-protein interactions play important roles in various biological processes. The precise detection of RNA-protein interaction sites is very important for understanding essential biological processes and annotating the function of the proteins. In this study, based on various features from amino acid sequence and structure, including evolutionary information, solvent accessible surface area and torsion angles (φ, ψ) in the backbone structure of the polypeptide chain, a computational method for predicting RNA-binding sites in proteins is proposed. When the method is applied to predict RNA-binding sites in three datasets (RBP86 containing 86 protein chains, RBP107 containing 107 protein chains and RBP109 containing 109 protein chains), better sensitivities and specificities are obtained compared to previously published methods in five-fold cross-validation tests. To further examine the efficiency of our method, the RBP107 dataset is used as the training set, and the RBP86 and RBP109 datasets are used as independent test sets. In addition, as examples of our prediction, RNA-binding sites in a few proteins are presented. The annotated results are consistent with the PDB annotation. These results show that our method is useful for annotating RNA binding sites of novel proteins.
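
    The backbone torsion angles mentioned above are dihedral angles over four consecutive backbone atoms: φ over C(i-1), N(i), CA(i), C(i), and ψ over N(i), CA(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four 3D points in plain Python. It is a generic geometric illustration, not the authors' feature-extraction code, and the helper names are chosen for this example only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, range (-180, 180]."""
    b0 = _sub(p0, p1)          # vector from the central bond back to p0
    b1 = _sub(p2, p1)          # central bond p1 -> p2
    b2 = _sub(p3, p2)          # vector from the central bond on to p3
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 coincide; the central bond is undefined")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

    For φ of residue i, the four points would be the coordinates of C(i-1), N(i), CA(i) and C(i) read from a structure file; parsing those coordinates is outside the scope of this sketch.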

  14. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    Science.gov (United States)

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Characterization of protein and carbohydrate mid-IR spectral features in crop residues

    Science.gov (United States)

    Xin, Hangshu; Zhang, Yonggen; Wang, Mingjun; Li, Zhongyu; Wang, Zhibo; Yu, Peiqiang

    2014-08-01

    To the best of our knowledge, few studies have been conducted on inherent structure spectral traits related to biopolymers of crop residues. The objective of this study was to characterize protein and carbohydrate structure spectral features of three field crop residues (rice straw, wheat straw and millet straw) in comparison with two crop vines (peanut vine and pea vine) by using Fourier transform infrared spectroscopy (FTIR) with attenuated total reflectance (ATR). Also, multivariate analyses were performed on spectral data sets within the regions mainly related to protein and carbohydrate. The results showed that spectral differences existed among these crop residue samples in mid-IR peak intensities that are mainly related to protein and carbohydrate. With regard to protein spectral profile, peanut vine showed the greatest mid-IR band intensities related to protein amide and protein secondary structures, followed by pea vine and the remaining three field crop straws. The crop vines had 48-134% higher spectral band intensity than the grain straws in spectral features associated with protein. Similar trends were also found in the bands that are mainly related to structural carbohydrates (such as cellulosic compounds). However, the field crop residues had higher peak intensity in the total carbohydrates region than the crop vines. Furthermore, spectral ratios varied among the residue samples, indicating that these five crop residues had different internal structural conformation. However, multivariate spectral analyses showed that structural similarities were still exhibited among crop residues in the regions associated with protein biopolymers and carbohydrate. Further study is needed to find out whether there is any relationship between spectroscopic information and nutrition supply in various kinds of crop residue when fed to animals.

  16. Recognition of functional sites in protein structures.

    Science.gov (United States)

    Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J

    2004-06-04

    Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Bank and make functional predictions.
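
    The idea of hashing triangles of physico-chemical properties can be illustrated with a small, generic geometric-hashing sketch in Python. Everything here is an assumption made for illustration: the property labels, the 1.0 Å distance bin, the order-independent key and the function names are not taken from SiteEngine, whose actual representation and scoring are considerably more elaborate.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangle_key(points, bin_width=1.0):
    """Order-independent key for three (label, (x, y, z)) surface points.

    The key combines the sorted property labels with the sorted, binned
    side lengths, so it is invariant to rotation and translation.
    """
    labels = tuple(sorted(label for label, _ in points))
    sides = tuple(sorted(
        int(math.dist(a[1], b[1]) // bin_width)
        for a, b in combinations(points, 2)
    ))
    return labels, sides


def build_table(site, bin_width=1.0):
    """Hash every triangle of a site under its key."""
    table = defaultdict(list)
    for tri in combinations(site, 3):
        table[triangle_key(tri, bin_width)].append(tri)
    return table


def count_shared_triangles(site_a, site_b, bin_width=1.0):
    """Count triangles of site_b whose key also occurs in site_a."""
    table = build_table(site_a, bin_width)
    return sum(
        1 for tri in combinations(site_b, 3)
        if triangle_key(tri, bin_width) in table
    )
```

    Hard binning makes the key sensitive to distances that fall near a bin edge, and a key match only nominates candidate correspondences; a practical method would follow it with a geometric superposition and a scoring step.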

  17. Structural safety features for superconducting magnets

    International Nuclear Information System (INIS)

    Lehner, J.; Reich, M.; Powell, J.; Bezler, P.; Gardner, D.; Yu, W.; Chang, T.Y.

    1975-01-01

    A survey has been carried out for various potential structural safety problems of superconducting fusion magnets. These areas include: (1) Stresses due to inhomogeneous temperature distributions in magnets where normal regions have been initiated. (2) Stress distributions and yield forces due to cracks and failed regions. (3) Superconducting magnet response due to seismic excitation. These analyses have been carried out using a variety of large capacity finite element computer codes that allow for the evaluation of stresses in elastic or elastic-plastic zones and around singularities in the magnet structure. Thus far, these analyses have been carried out on UWMAK-I type magnet systems

  18. Protein Structure Refinement by Optimization

    DEFF Research Database (Denmark)

    Carlsen, Martin

    on whether the three-dimensional structure of a homologous sequence is known. Whether or not a protein model can be used for industrial purposes depends on the quality of the predicted structure. A model can be used to design a drug when the quality is high. The overall goal of this project is to assess...... that correlates maximally to a native-decoy distance. The main contribution of this thesis is methods developed for analyzing the performance of metrically trained knowledge-based potentials and for optimizing their performance while making them less dependent on the decoy set used to define them. We focus...... being at-least a local minimum of the potential. To address how far the current functional form of the potential is from an ideal potential we present two methods for finding the optimal metrically trained potential that simultaneous has a number of native structures as a local minimum. Our results...

  19. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    Science.gov (United States)

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of the query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually fail at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated the similarity between the query protein and templates, and then assigned the query protein the fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level. An independent assessment on the SCOP_TEST dataset showed consistent
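
    The final assignment step described in this abstract, scoring a query against templates and adopting the fold type of the most similar one, can be sketched as a nearest-template lookup. The use of cosine similarity, the function names and the toy fold labels below are assumptions for illustration; the abstract does not state which similarity measure is used, and the feature vectors would in practice come from the trained network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def assign_fold(query_features, templates):
    """Return (fold_label, similarity) of the most similar template.

    templates is a list of (fold_label, feature_vector) pairs.
    """
    if not templates:
        raise ValueError("at least one template is required")
    scored = (
        (label, cosine_similarity(query_features, feats))
        for label, feats in templates
    )
    return max(scored, key=lambda pair: pair[1])
```

    For example, with templates `[("fold_A", [1.0, 0.0, 0.0]), ("fold_B", [0.0, 1.0, 0.0])]` and query `[0.9, 0.1, 0.0]`, the query is assigned `fold_A`.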

  20. Feature-based motion control for near-repetitive structures

    NARCIS (Netherlands)

    Best, de J.J.T.H.

    2011-01-01

    In many manufacturing processes, production steps are carried out on repetitive structures which consist of identical features placed in a repetitive pattern. In the production of these repetitive structures one or more consecutive steps are carried out on the features to create the final product.

  1. Subsurface structures of buried features in the lunar Procellarum region

    Science.gov (United States)

    Wang, Wenrui; Heki, Kosuke

    2017-07-01

    The Gravity Recovery and Interior Laboratory (GRAIL) mission revealed a number of features showing strong gravity anomalies without prominent topographic signatures in the lunar Procellarum region. These features, located in different geologic units, are considered to have complex subsurface structures reflecting different evolution processes. By using the GRAIL level-1 data, we estimated the free-air and Bouguer gravity anomalies in several selected regions including such intriguing features. With the three-dimensional inversion technique, we recovered subsurface density structures in these regions.

  2. Internet of Things: Structure, Features and Management

    Directory of Open Access Journals (Sweden)

    Aleksandrovičs Vladislavs

    2016-12-01

    Internet of Things (IoT) is a rapidly developing technology today and most likely an everyday thing in the future. Numerous devices, computing machines and built-in sensors connected in a single dynamic network continuously receive and exchange information from the outer environment. Huge data clusters are collected and put to use in handmade applications that scrupulously manage and control given objectives. In this way, an interactive technical infrastructure is created, which can oversee and infiltrate any person’s vital processes. Though each device and technological solution in the IoT may have been known for many years, each architecture is unique and provides new challenges for the network owner. This research aims to investigate IoT general structure and management aspects, with the knowledge of which the authors will try to answer a trivial question: whether it is possible to comprehensively control such a tremendous structure with the current level of technology.

  3. Organizational structure features supporting knowledge management processes

    OpenAIRE

    Claver-Cortés, Enrique; Zaragoza Sáez, Patrocinio del Carmen; Pertusa-Ortega, Eva

    2007-01-01

    Purpose – The idea that knowledge management can be a potential source of competitive advantage has gained strength in the last few years. However, a number of business actions are needed to generate an appropriate environment and infrastructure for knowledge creation, transfer and application. Among these actions there stands out the design of an organizational structure, the link of which with knowledge management is the main concern here. More specifically, the present paper has as its aim...

  4. A feature-based approach to modeling protein–protein interaction hot spots

    Science.gov (United States)

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-01-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π-related interactions, especially π···π interactions. PMID:19273533

  5. The sequence, structure and evolutionary features of HOTAIR in mammals

    Science.gov (United States)

    2011-01-01

    Background: An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results: To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals

  6. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    Science.gov (United States)

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
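
    The performance measures listed above have standard definitions in terms of the binary confusion-matrix counts. The short Python sketch below computes them; the function name and the convention of returning 0.0 when a denominator is zero are choices made for this example. Note that sensitivity and recall are the same quantity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive/negative counts."""

    def safe_div(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)      # also called recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F-measure: harmonic mean of precision and recall.
        "f_measure": safe_div(2 * precision * sensitivity,
                              precision + sensitivity),
    }
```

    For a multi-class problem such as superfamily classification, these would typically be computed per class in a one-vs-rest fashion and then averaged.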

  7. Unique structural features facilitate lizard tail autotomy.

    Science.gov (United States)

    Sanggaard, Kristian W; Danielsen, Carl Chr; Wogensen, Lise; Vinding, Mads S; Rydtoft, Louise M; Mortensen, Martin B; Karring, Henrik; Nielsen, Niels Chr; Wang, Tobias; Thøgersen, Ida B; Enghild, Jan J

    2012-01-01

    Autotomy refers to the voluntary shedding of a body part; a renowned example is tail loss among lizards as a response to attempted predation. Although many aspects of lizard tail autotomy have been studied, the detailed morphology and mechanism remains unclear. In the present study, we showed that tail shedding by the Tokay gecko (Gekko gecko) and the associated extracellular matrix (ECM) rupture were independent of proteolysis. Instead, lizard caudal autotomy relied on biological adhesion facilitated by surface microstructures. Results based on bio-imaging techniques demonstrated that the tail of Gekko gecko was pre-severed at distinct sites and that its structural integrity depended on the adhesion between these segments.

  8. Structural features of the Compact Ignition Tokamak

    International Nuclear Information System (INIS)

    Citrolo, J.; Brown, G.; Rogoff, P.

    1987-01-01

    The Compact Ignition Tokamak (CIT) is undergoing preliminary structural design and definitions. It will be relatively inexpensive with ignition capabilities. During the definition phase it was concluded that the TF coil should be assembled from the laminate copper-Inconel plates since copper alone cannot sustain the expected magnetic and thermal loads. An extensive test program is being initiated to investigate the various materials, and their elastic and inelastic response and to develop the constitutive equations required for the selection of design criteria and for the stress analysis of this device. Finite element analysis nonlinear material capabilities are being used to study, predict and correlate the machine behavior

  9. Unique structural features facilitate lizard tail autotomy.

    Directory of Open Access Journals (Sweden)

    Kristian W Sanggaard

    Autotomy refers to the voluntary shedding of a body part; a renowned example is tail loss among lizards as a response to attempted predation. Although many aspects of lizard tail autotomy have been studied, the detailed morphology and mechanism remain unclear. In the present study, we showed that tail shedding by the Tokay gecko (Gekko gecko) and the associated extracellular matrix (ECM) rupture were independent of proteolysis. Instead, lizard caudal autotomy relied on biological adhesion facilitated by surface microstructures. Results based on bio-imaging techniques demonstrated that the tail of Gekko gecko was pre-severed at distinct sites and that its structural integrity depended on the adhesion between these segments.

  10. SDSL-ESR-based protein structure characterization.

    Science.gov (United States)

    Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A

    2010-03-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.

  11. Structural features that optimize high temperature superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Jorgensen, J.D.; Hinks, D.G.; Chmaissem, O.; Argyriou, D.N.; Mitchell, J.F. [Argonne National Lab., IL (United States)]; Dabrowski, B. [Northern Illinois Univ., De Kalb, IL (United States). Dept. of Physics]

    1996-01-01

    For example, various defects can be introduced into the blocking layer to provide the optimum carrier concentration, but defects that form in or adjacent to the CuO₂ layers will lower T_c and eventually destroy superconductivity. After these requirements are satisfied, the highest T_c's are observed for compounds (such as the HgBa₂Caₙ₋₁CuO₂ₙ₊₂₊ₓ family) that have flat and square CuO₂ planes and long apical Cu-O bonds. This conclusion is confirmed by the study of materials in which the flatness of the CuO₂ plane can be varied in a systematic way. In more recent work, attention has focused on how the structure can be modified, for example, by chemical substitution, to improve flux pinning properties. Two strategies are being investigated: (1) increasing the coupling of pancake vortices to form vortex-lines by shortening or "metallizing" the blocking layer; and (2) the formation of defects that pin flux.

  12. Features of micromorphological structure of medicinal hyssop

    Directory of Open Access Journals (Sweden)

    Lyudmyla A. Kotyuk

    2016-06-01

    Micromorphological peculiarities of the structure of vegetative and generative organs of Hyssopus officinalis were analyzed. The epidermis of H. officinalis reveals diacytic stomata and external outgrowths: glandular and covering trichomes, as well as peltate essential oil glands. Capitate and bent indumentary (covering) trichomes occur on the stem, while on the leaves peltate glands and conical and bent uni- and multicellular trichomes were observed. On the calyx, in the midrib region, there are peltate glands, while the ribs are densely covered with indumentary and glandular trichomes. The corolla's adaxial surface is covered with long indumentary trichomes, with sparse peltate glands occurring on the margins. The highest density of essential oil peltate glands is found on the adaxial surface of the calyx upper lip (15.8±2.54 pcs./mm²) and on the leaf abaxial surface (13.6±2.40 pcs./mm²). Glands with the largest diameter (47.82±2.82 μm) are located on the leaf adaxial surface.

  13. Structural features that optimize high temperature superconductivity

    International Nuclear Information System (INIS)

    Jorgensen, J.D.; Hinks, D.G.; Chmaissem, O.; Argyriou, D.N.; Mitchell, J.F.; Dabrowski, B.

    1996-01-01

    For example, various defects can be introduced into the blocking layer to provide the optimum carrier concentration, but defects that form in or adjacent to the CuO₂ layers will lower T_c and eventually destroy superconductivity. After these requirements are satisfied, the highest T_c's are observed for compounds (such as the HgBa₂Caₙ₋₁CuO₂ₙ₊₂₊ₓ family) that have flat and square CuO₂ planes and long apical Cu-O bonds. This conclusion is confirmed by the study of materials in which the flatness of the CuO₂ plane can be varied in a systematic way. In more recent work, attention has focused on how the structure can be modified, for example, by chemical substitution, to improve flux pinning properties. Two strategies are being investigated: (1) increasing the coupling of pancake vortices to form vortex-lines by shortening or "metallizing" the blocking layer; and (2) the formation of defects that pin flux.

  14. Protein enriched pasta: structure and digestibility of its protein network.

    Science.gov (United States)

    Laleg, Karima; Barron, Cécile; Santé-Lhoutellier, Véronique; Walrand, Stéphane; Micard, Valérie

    2016-02-01

    Wheat (W) pasta was enriched in 6% gluten (G), 35% faba (F) or 5% egg (E) to increase its protein content (13% to 17%). The impact of the enrichment on the multiscale structure of the pasta and on in vitro protein digestibility was studied. Increasing the protein content (W- vs. G-pasta) strengthened pasta structure at molecular and macroscopic scales but reduced its protein digestibility by 3% by forming a higher covalently linked protein network. Greater changes in the macroscopic and molecular structure of the pasta were obtained by varying the nature of protein used for enrichment. Proteins in G- and E-pasta were highly covalently linked (28-32%) resulting in a strong pasta structure. Conversely, F-protein (98% SDS-soluble) altered the pasta structure by diluting gluten and formed a weak protein network (18% covalent link). As a result, protein digestibility in F-pasta was significantly higher (46%) than in E- (44%) and G-pasta (39%). The effect of low (55 °C, LT) vs. very high temperature (90 °C, VHT) drying on the protein network structure and digestibility was shown to cause greater molecular changes than pasta formulation. Whatever the pasta, a general strengthening of its structure, a 33% to 47% increase in covalently linked proteins and a higher β-sheet structure were observed. However, these structural differences were evened out after the pasta was cooked, resulting in identical protein digestibility in LT and VHT pasta. Even after VHT drying, F-pasta had the best amino acid profile with the highest protein digestibility, proof of its nutritional interest.

  15. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review about neural network applications in bioinformatics. Especially the applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  16. Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.

    Directory of Open Access Journals (Sweden)

    Lu-Lu Zheng

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations.
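
    The abstract above reports model quality as a Matthews correlation coefficient (MCC). As a minimal sketch of how that metric is defined for a binary site/non-site predictor, the function below computes it from confusion-matrix counts. The function name and the example counts are illustrative assumptions, not values from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns a value in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is set to 0 when a row or column
        # of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to exercise the function.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_mcc:.4f}")
```

    Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often preferred when modified and unmodified sites are unevenly represented.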

  17. Crystal structures of barley thioredoxin h isoforms HvTrxh1 and HvTrxh2 reveal features involved in protein recognition and possibly in discriminating the isoform specificity

    DEFF Research Database (Denmark)

    Maeda, Kenji; Hägglund, Per; Finnie, Christine

    2008-01-01

    segment of one HvTrxh1 molecule is positioned along a shallow hydrophobic groove at the primary nucleophile Cys40 of another HvTrxh1 molecule. The association mode can serve as a model for the target protein recognition by Trx, as it brings the Met82 C gamma atom (gamma position as a disulfide sulfur......) of the bound loop segment in the proximity of the Cys40 thiol. The interaction involves three characteristic backbone-backbone hydrogen bonds in an antiparallel beta-sheet-like arrangement, similar to the arrangement observed in the structure of an engineered, covalently bound complex between Trx...... and a substrate protein, as reported by Maeda et al. in an earlier paper. The occurrence of an intermolecular salt bridge between Glu80 of the bound loop segment and Arg101 near the hydrophobic groove suggests that charge complementarity plays a role in the specificity of Trx. In HvTrxh2, isoleucine corresponds...

  18. Structure-based barcoding of proteins.

    Science.gov (United States)

    Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin

    2014-01-01

    A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from a 3D coordinate file. The molecular structure of a protein coordinate file from the Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to the same fold family, and across different folds. In addition to this, we have attempted to provide an illustration of (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) modifications in the overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.

  19. Differential role of molten globule and protein folding in distinguishing unique features of botulinum neurotoxin.

    Science.gov (United States)

    Kumar, Raj; Kukreja, Roshan V; Cai, Shuowei; Singh, Bal R

    2014-06-01

    Botulinum neurotoxins (BoNTs) are proteins of great interest not only because of their extreme toxicity but also, paradoxically, for their therapeutic applications. All the known serotypes (A-G) have varying degrees of longevity and potency inside the neuronal cell. Differential chemical modifications such as phosphorylation and ubiquitination have been suggested as possible mechanisms for their longevity, but the molecular basis of the longevity remains unclear. Since the endopeptidase domain (light chain; LC) of the toxin apparently survives inside the neuronal cells for months, it is important to examine the structural features of this domain to understand its resistance to intracellular degradation. Published crystal structures (both botulinum neurotoxins and endopeptidase domain) have not provided an adequate explanation for the intracellular longevity of the domain. Structural features obtained from spectroscopic analysis of LCA and LCB were similar, and a PRIME (PReImminent Molten Globule Enzyme) conformation appears to be responsible for their optimal enzymatic activity at 37°C. LCE, on the other hand, was also optimally active at 37°C, but its active conformation differed from the PRIME conformation of LCA and LCB. This study establishes and confirms our earlier finding that an optimally active conformation of these proteins in the form of PRIME exists for the most poisonous poison, botulinum neurotoxin. There are substantial variations in the structural and functional characteristics of these active molten globule-related structures among the three BoNT endopeptidases examined. These differential conformations of LCs are important in understanding the fundamental structural features of proteins, and their possible connection to intracellular longevity could provide significant clues for devising new countermeasures and effective therapeutics. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    Science.gov (United States)

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
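
    The precalculated-distance idea described above can be sketched with an in-memory SQLite table. This is only an illustration under stated assumptions: the table name, column names, index, and the rows (including the placeholder structure identifiers `demo1` and `demo2`) are hypothetical and are not the schema or data of the Protein Relational Database.

```python
import sqlite3

# Hypothetical schema: one row per precomputed protein-ligand atom pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contact (
        structure_id TEXT,
        protein_atom TEXT,
        residue      TEXT,
        ligand_atom  TEXT,
        distance     REAL
    )
    """
)
# An index on atom identity and distance is what makes lookups fast,
# because no geometry has to be recomputed at query time.
conn.execute(
    "CREATE INDEX idx_contact ON contact (protein_atom, ligand_atom, distance)"
)

# Made-up rows for illustration only.
rows = [
    ("demo1", "OG", "SER 195", "N1", 2.9),
    ("demo1", "NZ", "LYS 72", "O2", 3.4),
    ("demo2", "OG", "SER 530", "N1", 4.8),
]
conn.executemany("INSERT INTO contact VALUES (?, ?, ?, ?, ?)", rows)


def find_contacts(conn, protein_atom, ligand_atom, max_distance):
    """Return (structure_id, residue, distance) rows within max_distance."""
    cur = conn.execute(
        "SELECT structure_id, residue, distance FROM contact "
        "WHERE protein_atom = ? AND ligand_atom = ? AND distance <= ? "
        "ORDER BY distance",
        (protein_atom, ligand_atom, max_distance),
    )
    return cur.fetchall()


hits = find_contacts(conn, "OG", "N1", 3.5)
print(hits)
```

    Storing the distances up front trades disk space for query speed, which matches the abstract's emphasis on fast retrieval by atom identity and distance constraints.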

  1. The interface of protein structure, protein biophysics, and molecular evolution

    Science.gov (United States)

    Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon

    2012-01-01

    The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundation for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID: 22528593

  2. SDSL-ESR-based protein structure characterization

    NARCIS (Netherlands)

    Strancar, J.; Kavalenka, A.A.; Urbancic, I.; Ljubetic, A.; Hemminga, M.A.

    2010-01-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be

  3. Overcoming barriers to membrane protein structure determination.

    Science.gov (United States)

    Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst

    2011-04-01

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.

  4. Structural health monitoring feature design by genetic programming

    International Nuclear Information System (INIS)

    Harvey, Dustin Y; Todd, Michael D

    2014-01-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and other high-capital or life-safety critical structures. Conventional data processing involves pre-processing and extraction of low-dimensional features from in situ time series measurements. The features are then input to a statistical pattern recognition algorithm to perform the relevant classification or regression task necessary to facilitate decisions by the SHM system. Traditional design of signal processing and feature extraction algorithms can be an expensive and time-consuming process requiring extensive system knowledge and domain expertise. Genetic programming, a heuristic program search method from evolutionary computation, was recently adapted by the authors to perform automated, data-driven design of signal processing and feature extraction algorithms for statistical pattern recognition applications. The proposed method, called Autofead, is particularly suitable to handle the challenges inherent in algorithm design for SHM problems where the manifestation of damage in structural response measurements is often unclear or unknown. Autofead mines a training database of response measurements to discover information-rich features specific to the problem at hand. This study provides experimental validation on three SHM applications including ultrasonic damage detection, bearing damage classification for rotating machinery, and vibration-based structural health monitoring. Performance comparisons with common feature choices for each problem area are provided demonstrating the versatility of Autofead to produce significant algorithm improvements on a wide range of problems.

  5. Structural basis for target protein recognition by the protein disulfide reductase thioredoxin

    DEFF Research Database (Denmark)

    Maeda, Kenji; Hägglund, Per; Finnie, Christine

    2006-01-01

    Thioredoxin is ubiquitous and regulates various target proteins through disulfide bond reduction. We report the structure of thioredoxin (HvTrxh2 from barley) in a reaction intermediate complex with a protein substrate, barley alpha-amylase/subtilisin inhibitor (BASI). The crystal structure...... of this mixed disulfide shows a conserved hydrophobic motif in thioredoxin interacting with a sequence of residues from BASI through van der Waals contacts and backbone-backbone hydrogen bonds. The observed structural complementarity suggests that the recognition of features around protein disulfides plays...... a major role in the specificity and protein disulfide reductase activity of thioredoxin. This novel insight into the function of thioredoxin constitutes a basis for comprehensive understanding of its biological role. Moreover, comparison with structurally related proteins shows that thioredoxin shares...

  6. Mapping monomeric threading to protein-protein structure prediction.

    Science.gov (United States)

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have a similar quaternary fold. Maintaining both monomer and dimer structure libraries is, however, laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy, SPRING, to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. Tested on 1838 nonhomologous protein complexes, SPRING recognizes correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD ...... benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
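
    The TM score used as the success criterion above has a simple closed form once a superposition is fixed. The sketch below evaluates the standard TM-score expression for a given list of distances between aligned residue pairs; it does not perform the search over superpositions that real TM-score programs carry out, and the 0.5 Å floor on the scale distance for short chains is an assumption of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the target structure, used both
    for normalisation and for the length-dependent scale d0.
    """
    if target_length <= 15:
        # The d0 formula is undefined or negative for very short chains;
        # using a small positive floor is an assumption of this sketch.
        d0 = 0.5
    else:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical example: 100-residue target, 80 aligned pairs at 2 angstroms.
example_tm = tm_score([2.0] * 80, 100)
print(f"TM-score = {example_tm:.3f}")
```

    Because each aligned pair contributes at most 1 and the sum is divided by the full target length, unaligned residues lower the score, which is what makes a threshold such as 0.5 comparable across proteins of different sizes.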

  7. Metallacyclopentadienes: structural features and coordination in transition metal complexes

    International Nuclear Information System (INIS)

    Dolgushin, Fedor M; Yanovsky, Aleksandr I; Antipin, Mikhail Yu

    2004-01-01

    Results of structural studies of polynuclear transition metal complexes containing the metallacyclopentadiene fragment are overviewed. The structural features of the complexes in relation to the nature of the substituents in the organic moiety of the metallacycles, the nature of the transition metals and their ligand environment are analysed. The main structural characteristics corresponding to different modes of coordination of metallacyclopentadienes to one or two additional metal centres are revealed.

  8. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    Science.gov (United States)

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vitally important role in many biological processes. Prediction of DNA-binding proteins from amino acid sequence is a significant but not yet well-resolved scientific problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences, and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools to measure sizes of complex, highly irregular geometric objects. In order to extract the intrinsic correlation with DNA-binding property from protein sequences, the CGR algorithm, fractal dimension and amino acid composition are applied to formulate the numerical features of protein samples in this paper. Seven groups of features are extracted, which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and Jackknife test. Comparing the results of numerical experiments, the group of amino acid composition and fractal dimension (21-dimension vector) gets the best result: the average accuracy is 81.82% and the average Matthews correlation coefficient (MCC) is 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance. © 2013 The Authors. Published by Elsevier Ltd. All rights reserved.
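
    The best-performing feature group above is described as a 21-dimension vector of amino acid composition plus fractal dimension. As a minimal sketch of how such a vector could be assembled, the code below computes the 20 composition fractions from a sequence and appends a fractal-dimension value supplied by the caller. The function names are illustrative, and how the fractal dimension itself is obtained from the CGR is outside this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values).

    Characters outside the 20 standard one-letter codes are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def feature_vector(sequence, fractal_dimension):
    """Composition (20 values) followed by one fractal-dimension value."""
    return amino_acid_composition(sequence) + [float(fractal_dimension)]


# Hypothetical sequence and fractal-dimension value, for illustration only.
features = feature_vector("MKVLAAGIVGLK", fractal_dimension=1.7)
print(len(features))
```

    A vector like this can be passed to any standard classifier; the abstract uses a support vector machine evaluated by 10-fold cross-validation and a jackknife test.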

  9. PSAIA – Protein Structure and Interaction Analyzer

    Directory of Open Access Journals (Sweden)

    Vlahoviček Kristian

    2008-04-01

    Background: PSAIA (Protein Structure and Interaction Analyzer) was developed to compute geometric parameters for large sets of protein structures in order to predict and investigate protein-protein interaction sites. Results: In addition to most relevant established algorithms, PSAIA offers a new method, PIADA (Protein Interaction Atom Distance Algorithm), for the determination of residue interaction pairs. We found that PIADA produced more satisfactory results than comparable algorithms implemented in PSAIA. Particular advantages of PSAIA include its capacity to combine different methods to detect the locations and types of interactions between residues and its ability, without any further automation steps, to handle large numbers of protein structures and complexes. Generally, the integration of a variety of methods enables PSAIA to offer easier automation of analysis and greater reliability of results. PSAIA can be used either via a graphical user interface or from the command line. Results are generated in either tabular or XML format. Conclusion: In a straightforward fashion and for large sets of protein structures, PSAIA enables the calculation of protein geometric parameters and the determination of location and type for protein-protein interaction sites. XML-formatted output enables easy conversion of results to various formats suitable for statistical analysis. Results from smaller data sets demonstrated the influence of geometry on protein interaction sites. Comprehensive analysis of properties of large data sets led to new information useful in the prediction of protein-protein interaction sites.
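
    The abstract does not spell out how PIADA works, so the following is not PIADA. It is a generic distance-cutoff sketch of what "residue interaction pairs" can mean in practice: two residues from different chains are paired when any of their atoms lie within a chosen cutoff. The data layout, residue labels, coordinates and the 4.0 Å default are all assumptions made for illustration.

```python
import math
from itertools import product


def interacting_residue_pairs(chain_a, chain_b, cutoff=4.0):
    """Pair residues across two chains by minimum inter-atomic distance.

    chain_a, chain_b: dicts mapping a residue label to a non-empty list
    of (x, y, z) atom coordinates in angstroms.
    Returns a list of (residue_a, residue_b, min_distance) tuples.
    """
    pairs = []
    for (res_a, atoms_a), (res_b, atoms_b) in product(
        chain_a.items(), chain_b.items()
    ):
        # Closest approach between any atom of res_a and any atom of res_b.
        min_dist = min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))
        if min_dist <= cutoff:
            pairs.append((res_a, res_b, min_dist))
    return pairs


# Made-up coordinates, for illustration only.
chain_a = {
    "A:LYS10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:GLY11": [(10.0, 10.0, 10.0)],
}
chain_b = {
    "B:ASP45": [(4.0, 0.0, 0.0)],
    "B:LEU46": [(20.0, 0.0, 0.0)],
}
pairs = interacting_residue_pairs(chain_a, chain_b)
print(pairs)
```

    This brute-force version compares every atom pair; for the large structure sets the abstract mentions, a spatial index such as a grid or k-d tree would normally replace the nested loops.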

  10. Automated local bright feature image analysis of nuclear protein distribution identifies changes in tissue phenotype

    International Nuclear Information System (INIS)

    Knowles, David; Sudar, Damir; Bator, Carol; Bissell, Mina

    2006-01-01

    The organization of nuclear proteins is linked to cell and tissue phenotypes. When cells arrest proliferation, undergo apoptosis, or differentiate, the distribution of nuclear proteins changes. Conversely, forced alteration of the distribution of nuclear proteins modifies cell phenotype. Immunostaining and fluorescence microscopy have been critical for such findings. However, there is an increasing need for quantitative analysis of nuclear protein distribution to decipher epigenetic relationships between nuclear structure and cell phenotype, and to unravel the mechanisms linking nuclear structure and function. We have developed imaging methods to quantify the distribution of fluorescently-stained nuclear protein NuMA in different mammary phenotypes obtained using three-dimensional cell culture. Automated image segmentation of DAPI-stained nuclei was generated to isolate thousands of nuclei from three-dimensional confocal images. Prominent features of fluorescently-stained NuMA were detected using a novel local bright feature analysis technique, and their normalized spatial density calculated as a function of the distance from the nuclear perimeter to its center. The results revealed marked changes in the distribution of the density of NuMA bright features as non-neoplastic cells underwent phenotypically normal acinar morphogenesis. In contrast, we did not detect any reorganization of NuMA during the formation of tumor nodules by malignant cells. Importantly, the analysis also discriminated proliferating non-neoplastic cells from proliferating malignant cells, suggesting that these imaging methods are capable of identifying alterations linked not only to the proliferation status but also to the malignant character of cells. We believe that this quantitative analysis will have additional applications for classifying normal and pathological tissues

  11. An Algebro-Topological Description of Protein Domain Structure

    Science.gov (United States)

    Penner, Robert Clark; Knudsen, Michael; Wiuf, Carsten; Andersen, Jørgen Ellegaard

    2011-01-01

    The space of possible protein structures appears vast and continuous, and the relationship between primary, secondary and tertiary structure levels is complex. Protein structure comparison and classification is therefore a difficult but important task since structure is a determinant for molecular interaction and function. We introduce a novel mathematical abstraction based on geometric topology to describe protein domain structure. Using the locations of the backbone atoms and the hydrogen bonds, we build a combinatorial object, a so-called fatgraph. The description is discrete yet gives rise to a 2-dimensional mathematical surface. Thus, each protein domain corresponds to a particular mathematical surface with characteristic topological invariants, such as the genus (number of holes) and the number of boundary components. Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We introduce the notion of robust variables, that is, variables that are robust towards minor changes in the structure/fatgraph, and show that the genus and the number of boundary components are robust. Further, we investigate the distribution of different fatgraph variables and show how only four variables are capable of distinguishing different folds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH. PMID: 21629687
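
    The genus and boundary-component count mentioned above are tied together by the Euler characteristic. As a minimal sketch, assuming a connected fatgraph whose surface is orientable (an assumption of this example; surfaces with twisted bands need a modified formula), the relation V - E = 2 - 2g - r gives the genus g from the number of vertices V, edges E and boundary components r. The function name is illustrative.

```python
def fatgraph_genus(num_vertices, num_edges, num_boundaries):
    """Genus of the orientable surface of a connected fatgraph.

    Uses V - E = 2 - 2g - r, i.e. g = (2 - V + E - r) / 2.
    Raises ValueError if the counts cannot describe such a surface.
    """
    twice_genus = 2 - num_vertices + num_edges - num_boundaries
    if twice_genus < 0 or twice_genus % 2 != 0:
        raise ValueError("counts are inconsistent with an orientable surface")
    return twice_genus // 2


# One vertex with two loops: the cyclic order at the vertex decides whether
# the surface has one boundary (genus 1) or three boundaries (genus 0).
print(fatgraph_genus(1, 2, 1), fatgraph_genus(1, 2, 3))
```

    The example shows why the fatgraph structure matters: the same underlying graph yields different surfaces depending on the cyclic ordering of edges at each vertex, so genus and boundary count carry information beyond plain connectivity.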

  12. Solution NMR structure determination of proteins revisited

    International Nuclear Information System (INIS)

    Billeter, Martin; Wagner, Gerhard; Wuethrich, Kurt

    2008-01-01

    This 'Perspective' bears on the present state of protein structure determination by NMR in solution. The focus is on a comparison of the infrastructure available for NMR structure determination when compared to protein crystal structure determination by X-ray diffraction. The main conclusion emerges that the unique potential of NMR to generate high resolution data also on dynamics, interactions and conformational equilibria has contributed to a lack of standard procedures for structure determination which would be readily amenable to improved efficiency by automation. To spark renewed discussion on the topic of NMR structure determination of proteins, procedural steps with high potential for improvement are identified

  13. Validation-driven protein-structure improvement

    NARCIS (Netherlands)

    Touw, W.G.

    2016-01-01

    High-quality protein structure models are essential for many Life Science applications, such as protein engineering, molecular dynamics, drug design, and homology modelling. The WHAT_CHECK model validation project and the PDB_REDO model optimisation project have shown that many structure models in

  14. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    Science.gov (United States)

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.

  15. Preparation of gluten-free bread using a meso-structured whey protein particle system

    NARCIS (Netherlands)

    Riemsdijk, van L.E.; Goot, van der A.J.; Hamer, R.J.; Boom, R.M.

    2011-01-01

    This article presents a novel method for making gluten-free bread using mesoscopically structured whey protein. The use of the meso-structured protein is based on the hypothesis that the gluten structure present in a developed wheat dough features a particle structure on a mesoscopic length scale

  16. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza

    2017-02-08

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  17. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z.; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  18. Analysis of Conserved Structural Features of Selenoprotein K | Al ...

    African Journals Online (AJOL)

    Selenium plays important roles in human health and these roles may be exerted through its presence in selenoproteins. Among the 25 selenoproteins in human is selenoprotein K (SelK) whose exact function is still unclear. Here, we investigated the conserved structural features of SelK using bioinformatics as an approach ...

  19. Amino acid code of protein secondary structure.

    Science.gov (United States)

    Shestopalov, B V

    2003-01-01

    The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence, the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The basis of the theory is: 1) the name secondary structure is assigned to the conformation, stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i + 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlappings of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required, the codon selection is hierarchic; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data. Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three
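
    Item 3 of this abstract assigns residue pairs at sequence offsets 4, 2 and 1 to helices, strands and coil. The sketch below only enumerates those candidate pairs for a given sequence. It does not implement the 21 codon strength types or the hierarchic codon selection, which the abstract does not specify. The function name, dictionary keys and sample sequence are hypothetical.

```python
# Sequence offsets taken from item 3 of the abstract: (i, i+4), (i, i+2), (i, i+1).
OFFSETS = {"helix": 4, "strand": 2, "coil": 1}


def candidate_codons(sequence: str) -> dict[str, list[tuple[str, str]]]:
    """List every residue pair (i, i+offset) per structure type; no scoring is done."""
    pairs = {}
    for structure, offset in OFFSETS.items():
        pairs[structure] = [
            (sequence[i], sequence[i + offset])
            for i in range(len(sequence) - offset)
        ]
    return pairs


# Hypothetical six-residue sequence, used only to show the pairing pattern.
for structure, found in candidate_codons("ACDEFG").items():
    print(structure, found)
```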

  20. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng; Gao, Xin

    2016-01-01

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated

  1. Automated protein structure calculation from NMR data

    International Nuclear Information System (INIS)

    Williamson, Mike P.; Craven, C. Jeremy

    2009-01-01

    Current software is almost at the stage to permit completely automatic structure determination of small proteins of <15 kDa, from NMR spectra to structure validation with minimal user interaction. This goal is welcome, as it makes structure calculation more objective and therefore more easily validated, without any loss in the quality of the structures generated. Moreover, it releases expert spectroscopists to carry out research that cannot be automated. It should not take much further effort to extend automation to ca 20 kDa. However, there are technological barriers to further automation, of which the biggest are identified as: routines for peak picking; adoption and sharing of a common framework for structure calculation, including the assembly of an automated and trusted package for structure validation; and sample preparation, particularly for larger proteins. These barriers should be the main target for development of methodology for protein structure determination, particularly by structural genomics consortia

  2. Structural anatomy of telomere OB proteins.

    Science.gov (United States)

    Horvath, Martin P

    2011-10-01

    Telomere DNA-binding proteins protect the ends of chromosomes in eukaryotes. A subset of these proteins are constructed with one or more OB folds and bind with G+T-rich single-stranded DNA found at the extreme termini. The resulting DNA-OB protein complex interacts with other telomere components to coordinate critical telomere functions of DNA protection and DNA synthesis. While the first crystal and NMR structures readily explained protection of telomere ends, the picture of how single-stranded DNA becomes available to serve as primer and template for synthesis of new telomere DNA is only recently coming into focus. New structures of telomere OB fold proteins alongside insights from genetic and biochemical experiments have made significant contributions towards understanding how protein-binding OB proteins collaborate with DNA-binding OB proteins to recruit telomerase and DNA polymerase for telomere homeostasis. This review surveys telomere OB protein structures alongside highly comparable structures derived from replication protein A (RPA) components, with the goal of providing a molecular context for understanding telomere OB protein evolution and mechanism of action in protection and synthesis of telomere DNA.

  3. 3D complex: a structural classification of protein complexes.

    Directory of Open Access Journals (Sweden)

    Emmanuel D Levy

    2006-11-01

    Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or fewer, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

  4. Complete fold annotation of the human proteome using a novel structural feature space.

    Science.gov (United States)

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  5. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    -trace. Here we present three different approaches for reconstruction of C-traces from predictable measures. In our first approach [63, 62], the C-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE......) is more robust than standard Monte Carlo search. In the second approach for reconstruction of C-traces, an exact branch and bound algorithm has been developed [67, 65]. The model is discrete and makes use of secondary structure predictions, HSE, CN and radius of gyration. We show how to compute good lower...... bounds for partial structures very fast. Using these lower bounds, we are able to find global minimum structures in a huge conformational space in reasonable time. We show that many of these global minimum structures are of good quality compared to the native structure. Our branch and bound algorithm...

  6. Structural symmetry and protein function.

    Science.gov (United States)

    Goodsell, D S; Olson, A J

    2000-01-01

    The majority of soluble and membrane-bound proteins in modern cells are symmetrical oligomeric complexes with two or more subunits. The evolutionary selection of symmetrical oligomeric complexes is driven by functional, genetic, and physicochemical needs. Large proteins are selected for specific morphological functions, such as formation of rings, containers, and filaments, and for cooperative functions, such as allosteric regulation and multivalent binding. Large proteins are also more stable against denaturation and have a reduced surface area exposed to solvent when compared with many individual, smaller proteins. Large proteins are constructed as oligomers for reasons of error control in synthesis, coding efficiency, and regulation of assembly. Symmetrical oligomers are favored because of stability and finite control of assembly. Several functions limit symmetry, such as interaction with DNA or membranes, and directional motion. Symmetry is broken or modified in many forms: quasisymmetry, in which identical subunits adopt similar but different conformations; pleomorphism, in which identical subunits form different complexes; pseudosymmetry, in which different molecules form approximately symmetrical complexes; and symmetry mismatch, in which oligomers of different symmetries interact along their respective symmetry axes. Asymmetry is also observed at several levels. Nearly all complexes show local asymmetry at the level of side chain conformation. Several complexes have reciprocating mechanisms in which the complex is asymmetric, but, over time, all subunits cycle through the same set of conformations. Global asymmetry is only rarely observed. Evolution of oligomeric complexes may favor the formation of dimers over complexes with higher cyclic symmetry, through a mechanism of prepositioned pairs of interacting residues. However, examples have been found for all of the crystallographic point groups, demonstrating that functional need can drive the evolution of

  7. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by a 3D-Zernike Descriptor (3DZD), which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests for structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points within distance θ of the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
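
    The two-stage retrieval described in this abstract (find the top 10 × k candidates using only the first few descriptor attributes, then select the top k among them) can be sketched as follows. This is a minimal illustration under stated assumptions: a brute-force Euclidean scan stands in for the iDistance and iKernel indexes, which are not reproduced here, and the function names, the prefix length and the toy descriptors are all hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_top_k(query, descriptors, k, prefix_len=4, expand=10):
    """Return indices of (approximately) the k descriptors nearest to the query.

    Stage 1 ranks all descriptors on their first `prefix_len` attributes only
    (a stand-in for the reduced index) and keeps `expand * k` candidates.
    Stage 2 re-ranks those candidates using the full descriptors.
    """
    q_prefix = query[:prefix_len]
    candidates = heapq.nsmallest(
        expand * k,
        range(len(descriptors)),
        key=lambda i: euclidean(q_prefix, descriptors[i][:prefix_len]),
    )
    return heapq.nsmallest(
        k, candidates, key=lambda i: euclidean(query, descriptors[i])
    )


# Toy three-attribute vectors standing in for 3DZDs.
toy = [[0, 0, 0], [1, 1, 1], [5, 5, 5], [0.1, 0, 0]]
print(two_stage_top_k([0, 0, 0], toy, k=2, prefix_len=2))
```

    The result is approximate by design: a true neighbor that ranks outside the first `expand * k` candidates in the reduced space is never examined in the second stage.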

  8. Protein structure: geometry, topology and classification

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, William R.; May, Alex C.W.; Brown, Nigel P.; Aszodi, Andras [Division of Mathematical Biology, National Institute for Medical Research, London (United Kingdom)]

    2001-04-01

    The structural principles of proteins are reviewed and analysed from a geometric perspective with a view to revealing the underlying regularities in their construction. Computer methods for the automatic comparison and classification of these structures are then reviewed with an analysis of the statistical significance of comparing different shapes. Following an analysis of the current state of the classification of proteins, more abstract geometric and topological representations are explored, including the occurrence of knotted topologies. The review concludes with a consideration of the origin of higher-level symmetries in protein structure. (author)

  9. Taking advantage of local structure descriptors to analyze interresidue contacts in protein structures and protein complexes.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude

    2008-11-15

    Interresidue contacts in protein structures and at protein-protein interfaces are classically described by the amino acid types of interacting residues, and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contacts using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows a 3D structure to be described as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.

  10. Accessible surface area of proteins from purely sequence information and the importance of global features

    Science.gov (United States)

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-03-01

    We present a new approach for predicting the accessible surface area of proteins. The novelty of this approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Rather, sequential window information and the global monomer and dimer compositions of the chain are used. We find that much of the lost accuracy due to the elimination of evolutionary information is recouped by the use of global features. Furthermore, this new predictor produces similar results for proteins with or without sequence homologs deposited in the Protein Data Bank, and hence shows generalizability. Finally, these predictions are obtained in a small fraction (1/1000) of the time required to run mutation profile based prediction. All these factors indicate the possible usability of this work in de-novo protein structure prediction and in de-novo protein design using iterative searches. Funded in part by the financial support of the National Institutes of Health through Grants R01GM072014 and R01GM073095, and the National Science Foundation through Grant NSF MCB 1071785.

  11. Fast loop modeling for protein structures

    Science.gov (United States)

    Zhang, Jiong; Nguyen, Son; Shang, Yi; Xu, Dong; Kosztin, Ioan

    2015-03-01

    X-ray crystallography is the main method for determining 3D protein structures. In many cases, however, flexible loop regions of proteins cannot be resolved by this approach. This leads to incomplete structures in the protein data bank, preventing further computational study and analysis of these proteins. For instance, all-atom molecular dynamics (MD) simulation studies of structure-function relationship require complete protein structures. To address this shortcoming, we have developed and implemented an efficient computational method for building missing protein loops. The method is database driven and uses deep learning and multi-dimensional scaling algorithms. We have implemented the method as a simple stand-alone program, which can also be used as a plugin in existing molecular modeling software, e.g., VMD. The quality and stability of the generated structures are assessed and tested via energy scoring functions and by equilibrium MD simulations. The proposed method can also be used in template-based protein structure prediction. Work supported by the National Institutes of Health [R01 GM100701]. Computer time was provided by the University of Missouri Bioinformatics Consortium.

  12. Protein Molecular Structures, Protein SubFractions, and Protein Availability Affected by Heat Processing: A Review

    International Nuclear Information System (INIS)

    Yu, P.

    2007-01-01

    The utilization and availability of protein depended on the types of protein and their specific susceptibility to enzymatic hydrolysis (inhibitory activities) in the gastrointestinal tract and was highly associated with protein molecular structures. Studying internal protein structure and protein subfraction profiles led to an understanding of the components that make up a whole protein. An understanding of the molecular structure of the whole protein was often vital to understanding its digestive behavior and nutritive value in animals. In this review, recently obtained information on protein molecular structural effects of heat processing was reviewed, in relation to protein characteristics affecting digestive behavior and nutrient utilization and availability. The emphasis of this review was on (1) using the newly advanced synchrotron technology (S-FTIR) as a novel approach to reveal protein molecular chemistry affected by heat processing within intact plant tissues; (2) revealing the effects of heat processing on the profile changes of protein subfractions associated with digestive behaviors and kinetics manipulated by heat processing; (3) prediction of the changes of protein availability and supply after heat processing, using the advanced DVE/OEB and NRC-2001 models, and (4) obtaining information on optimal processing conditions of protein as intestinal protein source to achieve target values for potential high net absorbable protein in the small intestine. The information described in this article may give better insight into the mechanisms involved and the intrinsic protein molecular structural changes occurring upon processing.

  13. Structural and Molecular Modeling Features of P2X Receptors

    Directory of Open Access Journals (Sweden)

    Luiz Anastacio Alves

    2014-03-01

    Currently, adenosine 5'-triphosphate (ATP) is recognized as the extracellular messenger that acts through P2 receptors. P2 receptors are divided into two subtypes: P2Y metabotropic receptors and P2X ionotropic receptors, both of which are found in virtually all mammalian cell types studied. Due to the difficulty in studying membrane protein structures by X-ray crystallography or NMR techniques, there is little information about these structures available in the literature. Two structures of the P2X4 receptor in truncated form have been solved by crystallography. Molecular modeling has proven to be an excellent tool for studying ionotropic receptors. Recently, modeling studies carried out on P2X receptors have advanced our knowledge of the P2X receptor structure-function relationships. This review presents a brief history of ion channel structural studies and shows how modeling approaches can be used to address relevant questions about P2X receptors.

  14. Human cancer protein-protein interaction network: a structural perspective.

    Directory of Open Access Journals (Sweden)

    Gozde Kar

    2009-12-01

    Protein-protein interaction networks provide a global picture of cellular function and biological processes. Some proteins act as hub proteins, highly connected to others, whereas some others have few interactions. The dysfunction of some interactions causes many diseases, including cancer. Proteins interact through their interfaces. Therefore, studying the interface properties of cancer-related proteins will help explain their role in the interaction networks. Similar or overlapping binding sites should be used repeatedly in single interface hub proteins, making them promiscuous. Alternatively, multi-interface hub proteins make use of several distinct binding sites to bind to different partners. We propose a methodology to integrate protein interfaces into cancer interaction networks (ciSPIN, cancer structural protein interface network). The interactions in the human protein interaction network are replaced by interfaces, coming from either known or predicted complexes. We provide a detailed analysis of cancer related human protein-protein interfaces and the topological properties of the cancer network. The results reveal that cancer-related proteins have smaller, more planar, more charged and less hydrophobic binding sites than non-cancer proteins, which may indicate low affinity and high specificity of the cancer-related interactions. We also classified the genes in ciSPIN according to phenotypes. Within phenotypes, for breast cancer, colorectal cancer and leukemia, interface properties were found to discriminate cancer from non-cancer interfaces with accuracies of 71%, 67% and 61%, respectively. In addition, cancer-related proteins tend to interact with their partners through distinct interfaces, corresponding mostly to multi-interface hubs, which comprise 56% of cancer-related proteins, and constituting the nodes with higher essentiality in the network (76%). We illustrate the interface related affinity properties of two cancer-related hub

  15. Protein Structure and the Sequential Structure of mRNA

    DEFF Research Database (Denmark)

    Brunak, Søren; Engelbrecht, Jacob

    1996-01-01

    A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from...

  16. Protein structure database search and evolutionary classification.

    Science.gov (United States)

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  17. Modeling protein structures: construction and their applications.

    Science.gov (United States)

    Ring, C S; Cohen, F E

    1993-06-01

    Although no general solution to the protein folding problem exists, the three-dimensional structures of proteins are being successfully predicted when experimentally derived constraints are used in conjunction with heuristic methods. In the case of interleukin-4, mutagenesis data and CD spectroscopy were instrumental in the accurate assignment of secondary structure. In addition, the tertiary structure was highly constrained by six cysteines separated by many residues that formed three disulfide bridges. Although the correct structure was a member of a short list of plausible structures, the "best" structure was the topological enantiomer of the experimentally determined conformation. For many proteases, other experimentally derived structures can be used as templates to identify the secondary structure elements. In a procedure called modeling by homology, the structure of a known protein is used as a scaffold to predict the structure of another related protein. This method has been used to model a serine and a cysteine protease that are important in the schistosome and malarial life cycles, respectively. The model structures were then used to identify putative small molecule enzyme inhibitors computationally. Experiments confirm that some of these nonpeptidic compounds are active at concentrations of less than 10 microM.

  18. Proteins with Novel Structure, Function and Dynamics

    Science.gov (United States)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments at a rate 10^6 above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is held together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on opposite sides of the loop. Such a structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. When the coordinating, charged residue is mutated to alanine, another nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine, a salt bridge involving two other charged residues on opposite sides of the loop keeps the loop in place. These adjustments are facilitated by the high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the door to designing proteins with novel functions, structures and dynamics that have not yet been considered.

  19. Overcoming barriers to membrane protein structure determination

    NARCIS (Netherlands)

    Bill, Roslyn M.; Henderson, Peter J. F.; Iwata, So; Kunji, Edmund R. S.; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G.; Vogel, Horst

    After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new

  20. Predicting protein amidation sites by orchestrating amino acid sequence features

    Science.gov (United States)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications and plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many kinds of diseases. However, traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also the position-related information of the amino acids. An extremely randomized trees algorithm is applied in a supervised fashion to choose the optimal features and to remove redundancy and dependence among components of the feature vector. Finally, a support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89 and an area under the curve of 0.964.
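
    The abstract above reports accuracy alongside the Matthews correlation coefficient (MCC). As a minimal sketch of how the two measures are computed from a binary confusion matrix, the following Python example uses hypothetical labels (1 = amidation site, 0 = non-site); the function names and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical ground truth and predictions (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
mcc = matthews_cc(*counts)
```

    Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported for data sets in which sites and non-sites are unbalanced.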

  1. Structural Basis for Target Protein Recognition by Thioredoxin

    DEFF Research Database (Denmark)

    Maeda, Kenji

    2007-01-01

    Thioredoxin (Trx) is a ubiquitous protein disulfide reductase that possesses two redox active cysteines in the conserved active site sequence motif, Trp-CysN-Gly/Pro-Pro-CysC, situated in the so-called Trx-fold. The lack of insight into the protein substrate recognition mechanism of Trx has to date...... Ser) and a mutant of an in vitro substrate alpha-amylase/subtilisin inhibitor (BASI) (Cys144Ser), as a reaction intermediate-mimic of Trx-catalyzed disulfide reduction. The resultant structure showed a sequence of BASI residues along a conserved hydrophobic groove constituted of three loop segments...... of Trx-fold proteins glutaredoxin and glutathione transferase. This study suggests that the features of main chain conformation as well as charge property around disulfide bonds in protein substrates are important factors for interaction with Trx. Moreover, this study describes a detailed structural...

  2. Fundamental Characteristics of AAA+ Protein Family Structure and Function.

    Science.gov (United States)

    Miller, Justin M; Enemark, Eric J

    2016-01-01

    Many complex cellular events depend on multiprotein complexes known as molecular machines to efficiently couple the energy derived from adenosine triphosphate hydrolysis to the generation of mechanical force. Members of the AAA+ ATPase superfamily (ATPases Associated with various cellular Activities) are critical components of many molecular machines. AAA+ proteins are defined by conserved modules that precisely position the active site elements of two adjacent subunits to catalyze ATP hydrolysis. In many cases, AAA+ proteins form a ring structure that translocates a polymeric substrate through the central channel using specialized loops that project into the central channel. We discuss the major features of AAA+ protein structure and function with an emphasis on pivotal aspects elucidated with archaeal proteins.

  3. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and as a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.
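
    The core idea of linear encoding, turning each residue's backbone torsion angles into one letter so that a structure becomes a text string, can be sketched as follows. This is only an illustration under stated assumptions: SARST organizes its Ramachandran map by nearest-neighbor clustering, whereas this sketch substitutes a plain uniform 4 x 4 grid, and the 16-letter alphabet, the 90-degree cell size and the torsion angles are all hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase[:16]  # hypothetical 16-letter structural code
CELL = 90.0                             # assumed grid cell size in degrees
CELLS_PER_AXIS = 4                      # 360 / 90

def encode_residue(phi, psi):
    """Map one (phi, psi) pair in degrees to the letter of its grid cell."""
    # Shift angles from [-180, 180] to [0, 360]; the modulo wraps +180 onto -180.
    col = int((phi + 180.0) // CELL) % CELLS_PER_AXIS
    row = int((psi + 180.0) // CELL) % CELLS_PER_AXIS
    return ALPHABET[row * CELLS_PER_AXIS + col]

def encode_structure(torsions):
    """Encode a list of (phi, psi) pairs as a one-dimensional text string."""
    return "".join(encode_residue(phi, psi) for phi, psi in torsions)

# Hypothetical torsions: three helix-like residues followed by two strand-like ones.
torsions = [(-60, -45), (-62, -41), (-58, -47), (-120, 130), (-115, 125)]
code = encode_structure(torsions)  # "FFFMM"
```

    Once structures are strings of this kind, any sequence search tool can compare them, provided a substitution matrix suited to the structural alphabet is available.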

  4. Structural conceptual models of water-conducting features at Aespoe

    International Nuclear Information System (INIS)

    Bossart, P.; Mazurek, M.; Hermansson, Jan

    1998-01-01

    Within the framework of the Fracture Classification and Characterization Project (FCC), water conducting features (WCF) in the Aespoe tunnel system and on the surface of Aespoe Island are being characterized over a range of scales. The larger-scale hierarchies of WCF are mostly constituted of fault arrays, i.e. brittle structures that accommodated episodes of shear strain. The smaller-scale WCF (contained within blocks 1 m. Structural evidence indicates that the fractures within the TRUE-1 block constitute an interconnected system with a pronounced anisotropy

  5. A 'periodic table' for protein structures.

    Science.gov (United States)

    Taylor, William R

    2002-04-11

    Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.

  6. Feature selection and nearest centroid classification for protein mass spectrometry

    Directory of Open Access Journals (Sweden)

    Levner Ilya

    2005-03-01

    Background: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. Results: This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. The Student t-test, the Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting-based feature selection that we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross-validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms does not hold across all data sets. Conclusion: This study tested a number of popular feature
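
    To make the filter-plus-classifier pipeline in this record concrete, here is a minimal sketch that ranks features by a t-statistic and then classifies with a nearest centroid rule. It is a simplified illustration rather than the authors' implementation: it uses the Welch form of the t-statistic, omits the cross-validation loops described in the abstract, and the toy intensity values and function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_score(a, b):
    """Absolute Welch t-statistic between two samples of a single feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_features(X, y, k):
    """Filter step: return the indices of the k features with the largest t-score."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((t_score(a, b), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def fit_centroids(X, y, features):
    """Compute one mean vector per class, restricted to the selected features."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in features]
    return centroids

def predict(row, centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    point = [row[j] for j in features]
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

# Hypothetical spectra: 3 intensity bins per sample; only bin 1 separates the classes.
X = [[1.0, 5.0, 0.3], [1.2, 5.2, 0.1], [0.9, 4.9, 0.2],
     [1.1, 9.0, 0.2], [1.0, 9.3, 0.3], [1.3, 8.8, 0.1]]
y = [0, 0, 0, 1, 1, 1]

selected = rank_features(X, y, k=1)
centroids = fit_centroids(X, y, selected)
label_a = predict([1.0, 8.5, 0.2], centroids, selected)
label_b = predict([1.1, 5.5, 0.2], centroids, selected)
```

    In a fair evaluation the ranking step would be repeated inside each training fold, as the abstract's internal leave-one-out loop implies; selecting features on the full data set first would leak information from the test samples.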

  7. Structural and functional features of lysine acetylation of plant and animal tubulins.

    Science.gov (United States)

    Rayevsky, Alexey V; Sharifi, Mohsen; Samofalova, Dariya A; Karpov, Pavel A; Blume, Yaroslav B

    2017-10-10

    The study of the genome and the proteome of different species and representatives of distinct kingdoms, especially detection of proteome via wide-scaled analyses has various challenges and pitfalls. Attempts to combine all available information together and isolate some common features for determination of the pathway and their mechanism of action generally have a highly complicated nature. However, microtubule (MT) monomers are highly conserved protein structures, and microtubules are structurally conserved from Homo sapiens to Arabidopsis thaliana. The interaction of MT elements with microtubule-associated proteins and post-translational modifiers is fully dependent on protein interfaces, and almost all MT modifications are well described except acetylation. Crystallography and interactome data using different approaches were combined to identify conserved proteins important in acetylation of microtubules. Application of computational methods and comparative analysis of binding modes generated a robust predictive model of acetylation of the ϵ-amino group of Lys40 in α-tubulins. In turn, the model discarded some probable mechanisms of interaction between elements of interest. Reconstruction of unresolved protein structures was carried out with modeling by homology to the existing crystal structure (PDBID: 1Z2B) from B. taurus using Swiss-model server, followed by a molecular dynamics simulation. Docking of the human tubulin fragment with Lys40 into the active site of α-tubulin acetyltransferase, reproduces the binding mode of peptidomimetic from X-ray structure (PDBID: 4PK3). © 2017 International Federation for Cell Biology.

  8. An improved classification of G-protein-coupled receptors using sequence-derived features

    Directory of Open Access Journals (Sweden)

    Peng Zhen-Ling

    2010-08-01

    Background: G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method to accurately predict GPCRs from the protein primary sequences. Results: We propose a new method, called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. Principal component analysis is first employed to reduce the dimension of the feature space to 32. Then, the resulting 32-dimensional feature vectors are fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, it is further assigned to a family, subfamily, sub-subfamily and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at the five levels, a non-redundant dataset is carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at the five levels (from the first to the fifth) can reach 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on another two datasets of 167 and 566 GPCRs, respectively, at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods compared. Conclusions: The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid
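
    The evaluation and the level-by-level prediction described in this record can be sketched in a few lines. The example below shows a jackknife (leave-one-out) test and a classifier cascade that stops when the first level predicts a non-GPCR. It rests on loud assumptions: a plain 1-nearest-neighbor rule stands in for the paper's intimate sorting algorithm, 2-dimensional points stand in for the 32-dimensional PCA vectors, only two of the five levels are shown, and the family names and data are hypothetical.

```python
import math

def nearest_label(query, samples, labels):
    """1-nearest-neighbor rule (assumed stand-in for intimate sorting)."""
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]

def jackknife_accuracy(samples, labels):
    """Leave each sample out in turn, predict it from the rest, and score the hits."""
    correct = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_label(samples[i], rest_x, rest_y) == labels[i]:
            correct += 1
    return correct / len(samples)

def classify_cascade(query, level_models):
    """Apply one classifier per level; stop early if level 1 says non-GPCR."""
    path = []
    for samples, labels in level_models:
        label = nearest_label(query, samples, labels)
        path.append(label)
        if label == "non-GPCR":
            break
    return path

# Hypothetical level-1 training data (GPCR versus non-GPCR).
level1_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (3.0, 3.0), (3.1, 2.9), (2.9, 3.2), (1.4, 1.4)]
level1_y = ["non-GPCR"] * 3 + ["GPCR"] * 4

# Hypothetical level-2 training data (family labels for the GPCRs only).
level2_x = [(3.0, 3.0), (3.1, 2.9), (2.9, 3.2), (1.4, 1.4)]
level2_y = ["family A", "family A", "family B", "family B"]

acc = jackknife_accuracy(level1_x, level1_y)  # the outlier (1.4, 1.4) is missed
models = [(level1_x, level1_y), (level2_x, level2_y)]
path_gpcr = classify_cascade((3.05, 2.95), models)
path_other = classify_cascade((0.0, 0.1), models)
```

    Because every sample is held out exactly once, the jackknife estimate does not depend on a random split, which makes per-level accuracies such as those quoted above reproducible for a fixed data set.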

  9. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study

    Directory of Open Access Journals (Sweden)

    A Santos Jose C

    2012-07-01

    Background: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. Results: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. Conclusions: In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.

  10. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study.

    Science.gov (United States)

    A Santos, Jose C; Nassif, Houssam; Page, David; Muggleton, Stephen H; E Sternberg, Michael J

    2012-07-11

    There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.

  11. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

    Science.gov (United States)

    Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

    2015-01-01

    Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  12. Structural analysis of recombinant human protein QM

    International Nuclear Information System (INIS)

    Gualberto, D.C.H.; Fernandes, J.L.; Silva, F.S.; Saraiva, K.W.; Affonso, R.; Pereira, L.M.; Silva, I.D.C.G.

    2012-01-01

    The ribosomal protein QM belongs to a family of ribosomal proteins that is highly conserved from yeast to humans. The presence of the QM protein is necessary for joining the 60S and 40S subunits in a late step of the initiation of mRNA translation. Although the exact extra-ribosomal functions of QM are not yet fully understood, it has been identified as a putative tumor suppressor. This protein was reported to interact with the transcription factor c-Jun and thereby prevent c-Jun from activating genes involved in cellular growth. In this study, the human QM protein was expressed in a bacterial system in soluble form, and its structure was analyzed by circular dichroism and fluorescence. The circular dichroism results showed that this protein has less alpha helix than beta sheet, as described in the literature. The QM protein does not contain a leucine zipper region; however, the zinc ion is necessary for binding of QM to c-Jun. We therefore analyzed the relationship between the removal of zinc ions and the folding of the protein. Preliminary fluorescence results showed a gradual increase in fluorescence with increasing concentrations of EDTA. This suggests that zinc is important for the tertiary structure of the protein. Further studies are under way to better understand these results. (author)

  13. SCOWLP classification: Structural comparison and analysis of protein binding regions

    Directory of Open Access Journals (Sweden)

    Anders Gerd

    2008-01-01

    Background: Detailed information about protein interactions is critical for our understanding of the principles governing protein recognition mechanisms. The structures of many proteins have been experimentally determined in complex with different ligands bound either in the same or different binding regions. Thus, the structural interactome requires the development of tools to classify protein binding regions. A proper classification may provide a general view of the regions that a protein uses to bind others and also facilitate a detailed comparative analysis of the interaction information for specific protein binding regions at the atomic level. Such a classification might be of use for deciphering protein interaction networks, understanding protein function, and rational engineering and design. Description: Protein binding regions (PBRs) might ideally be described as well-defined, separate regions that share no interacting residues with one another. However, PBRs are often irregular and discontinuous and can share a wide range of interacting residues among them. The criteria used to define an individual binding region are often arbitrary and may differ from those used for other binding regions within a protein family. Therefore, the rationale behind protein interface classification should aim to fulfil the requirements of the analysis to be performed. We extract detailed interaction information for protein domains, peptides and interfacial solvent from the SCOWLP database and classify the PBRs of each domain family. For this purpose, we define a similarity index based on the overlap of interacting residues mapped in pair-wise structural alignments. We perform our classification with agglomerative hierarchical clustering using the complete-linkage method. Our classification is calculated at different similarity cut-offs to allow flexibility in the analysis of PBRs, a feature especially interesting for those protein families with conflicting binding regions
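
    The classification procedure described in this record, an overlap-based similarity index followed by complete-linkage agglomerative clustering at a chosen cut-off, can be sketched as below. The abstract does not give the exact formula of the similarity index, so the one used here (shared positions divided by the size of the smaller region) is an assumption, as are the region names and residue positions, which are taken to be already mapped onto a common structural alignment.

```python
def overlap_similarity(a, b):
    """Assumed index: shared interacting positions over the smaller region size."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def complete_linkage(regions, cutoff):
    """Merge clusters while the best complete-linkage similarity reaches the cutoff.

    Under complete linkage, the similarity of two clusters is the similarity of
    their least similar pair of members, so every pair in a merged cluster is
    at least `cutoff` similar.
    """
    clusters = [[name] for name in regions]

    def linkage(c1, c2):
        return min(overlap_similarity(regions[x], regions[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical binding regions as sets of aligned residue positions.
regions = {
    "pbr1": {10, 11, 12, 13},
    "pbr2": {11, 12, 13, 14},
    "pbr3": {13, 14, 15, 40},
    "pbr4": {40, 41, 42},
}

strict = complete_linkage(regions, cutoff=0.5)
loose = complete_linkage(regions, cutoff=0.3)
```

    Running the same clustering at several cut-offs, as in the two calls above, mirrors the flexibility the abstract describes: a strict cut-off keeps partially overlapping regions apart, while a looser one groups them.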

  14. Classification of protein-protein interaction full-text documents using text and citation network features.

    Science.gov (United States)

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

    We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.

  15. Protein Structure Determination Using Chemical Shifts

    DEFF Research Database (Denmark)

    Christensen, Anders Steen

    is determined using only chemical shifts recorded and assigned through automated processes. The CA-RMSD to the experimental X-ray structure for this structure is 1.1 Å. Additionally, the method is combined with very sparse NOE-restraints and evolutionary distance restraints and tested on several protein structures >100...

  16. On characterization of anisotropic plant protein structures

    NARCIS (Netherlands)

    Krintiras, G.A.; Göbel, J.; Bouwman, W.G.; Goot, van der A.J.; Stefanidis, G.D.

    2014-01-01

    In this paper, a set of complementary techniques was used to characterize surface and bulk structures of an anisotropic Soy Protein Isolate (SPI)–vital wheat gluten blend after it was subjected to heat and simple shear flow in a Couette Cell. The structured biopolymer blend can form a basis for a

  17. Hidden Structural Codes in Protein Intrinsic Disorder.

    Science.gov (United States)

    Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

    2017-10-17

    Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.

  18. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    Energy Technology Data Exchange (ETDEWEB)

    Cao, Haibo [Iowa State Univ., Ames, IA (United States)

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include the domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of the protein contact matrix. They believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
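
    The abstract above refers to the dominant eigenvector of a protein contact matrix. As a rough illustration only, the sketch below finds that eigenvector by plain power iteration. It is a generic textbook technique, not the dissertation's threading method, and the 5-residue contact matrix and the function name `dominant_eigenvector` are hypothetical.

```python
import math

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration for a symmetric, non-negative matrix.

    Returns (eigenvalue estimate, unit-length eigenvector estimate).
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        # Renormalise; the norm converges to the dominant eigenvalue.
        vec = [x / norm for x in product]
        value = norm
    return value, vec

# Hypothetical contact matrix (1 = residues in contact); not data from the source.
contacts = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
value, vec = dominant_eigenvector(contacts)
most_connected = max(range(len(vec)), key=lambda i: vec[i])
```

    In this toy matrix the largest component belongs to the residue with the most contacts, which is the kind of structural signal the abstract says is correlated with sequence.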

  19. Protein structure recognition: From eigenvector analysis to structural threading method

    Science.gov (United States)

    Cao, Haibo

    In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include the domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.

  20. Protein Structure Recognition: From Eigenvector Analysis to Structural Threading Method

    International Nuclear Information System (INIS)

    Haibo Cao

    2003-01-01

    In this work, they try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. They found a strong correlation between amino acid sequences and the corresponding native structure of the protein. Applications of this correlation discussed in this dissertation include the domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, they give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, they try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in the eigenvector study of the protein contact matrix. They believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, they discuss a threading method based on the correlation between amino acid sequences and the dominant eigenvector of the structure contact matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, they list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.

  1. Improved protein structure reconstruction using secondary structures, contacts at higher distance thresholds, and non-contacts.

    Science.gov (United States)

    Adhikari, Badri; Cheng, Jianlin

    2017-08-29

    Residue-residue contacts are key features for accurate de novo protein structure prediction. For the optimal utilization of these predicted contacts in folding proteins accurately, it is important to study the challenges of reconstructing protein structures using true contacts. Because contact-guided protein modeling approach is valuable for predicting the folds of proteins that do not have structural templates, it is necessary for reconstruction studies to focus on hard-to-predict protein structures. Using a data set consisting of 496 structural domains released in recent CASP experiments and a dataset of 150 representative protein structures, in this work, we discuss three techniques to improve the reconstruction accuracy using true contacts - adding secondary structures, increasing contact distance thresholds, and adding non-contacts. We find that reconstruction using secondary structures and contacts can deliver accuracy higher than using full contact maps. Similarly, we demonstrate that non-contacts can improve reconstruction accuracy not only when the used non-contacts are true but also when they are predicted. On the dataset consisting of 150 proteins, we find that by simply using low ranked predicted contacts as non-contacts and adding them as additional restraints, can increase the reconstruction accuracy by 5% when the reconstructed models are evaluated using TM-score. Our findings suggest that secondary structures are invaluable companions of contacts for accurate reconstruction. Confirming some earlier findings, we also find that larger distance thresholds are useful for folding many protein structures which cannot be folded using the standard definition of contacts. Our findings also suggest that for more accurate reconstruction using predicted contacts it is useful to predict contacts at higher distance thresholds (beyond 8 Å) and predict non-contacts.
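
    To make the notions of a contact, a distance threshold, and a non-contact concrete, here is a minimal sketch. It assumes one representative coordinate per residue (for example a Cβ position) and treats any pair closer than the threshold as a contact. The coordinates and the helper names `contact_map` and `non_contacts` are hypothetical and are not the authors' reconstruction pipeline.

```python
import math
from itertools import combinations

def contact_map(coords, threshold=8.0):
    """Return residue index pairs (i, j), i < j, closer than `threshold` angstroms."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) < threshold
    }

def non_contacts(coords, threshold=8.0):
    """Return every residue pair that is NOT a contact at the given threshold."""
    all_pairs = set(combinations(range(len(coords)), 2))
    return all_pairs - contact_map(coords, threshold)

# Hypothetical coordinates: five residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]

standard = contact_map(coords)        # 8 angstrom threshold
relaxed = contact_map(coords, 12.0)   # a higher distance threshold
far_pairs = non_contacts(coords)      # usable as additional restraints
```

    Raising the threshold from 8 Å to 12 Å adds the pairs three residues apart in this toy chain. Real pipelines usually also ignore pairs that are close in sequence, which this sketch does not do.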

  2. An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction

    Directory of Open Access Journals (Sweden)

    Thahir Mohamed

    2012-11-01

    Background: Machine learning approaches for classification learn the pattern of the feature space of different classes, or learn a boundary that separates the feature space into different classes. The features of the data instances are usually available, and it is only the class labels of the instances that are unavailable. For example, to classify text documents into different topic categories, the words in the documents are features and they are readily available, whereas the topic is what is predicted. However, in some domains obtaining features may be resource-intensive, so not all features may be available. An example is protein-protein interaction (PPI) prediction, where not only are the labels ('interacting' or 'non-interacting') unavailable, but so are some of the features. It may be possible to obtain at least some of the missing features by carrying out a few experiments as permitted by the available resources. If only a few experiments can be carried out to acquire missing features, which proteins should be studied and which features of those proteins should be determined? From the perspective of machine learning for PPI prediction, it would be desirable to acquire those features that, when used in training the classifier, improve its accuracy the most. That is, the utility of the feature acquisition is measured in terms of how much the acquired features contribute to improving the accuracy of the classifier. Active feature acquisition (AFA) is a strategy to preselect such instance-feature combinations (i.e. protein and experiment combinations) for maximum utility. The goal of AFA is the creation of an optimal training set that would result in the best classifier, not the determination of the best classification model itself. Results: We present a heuristic method for active feature acquisition to calculate the utility of acquiring a missing feature. This heuristic takes into account the change in

  3. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.

  4. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Hazrati, Mehrnaz Khodam; Kalies, Kai-Uwe; Martinetz, Thomas

    2011-01-01

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.
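
    The 5-fold cross-validation mentioned in the abstract can be pictured with a small sketch. The code below only shows how a balanced set of 300 positive and 300 negative examples might be split into five class-balanced folds. The function name `stratified_folds`, the seed, and the stratification choice are assumptions for illustration; the abstract does not say how BLProt's folds were built, and no classifier is trained here.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds, keeping class proportions equal per fold."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# 1 = bioluminescent, 0 = non-bioluminescent (set sizes taken from the abstract).
labels = [1] * 300 + [0] * 300
folds = stratified_folds(labels)

for test_fold in folds:
    held_out = set(test_fold)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_indices and scored on test_fold here.
```

    Each fold serves once as the held-out set, and the reported training accuracy would be the average over the five rounds.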

  5. Deep Convolutional Neural Networks: Structure, Feature Extraction and Training

    Directory of Open Access Journals (Sweden)

    Namatēvs Ivars

    2017-12-01

    Deep convolutional neural networks (CNNs) are aimed at processing data that have a known network-like topology. They are widely used to recognise objects in images and diagnose patterns in time series data as well as in sensor data classification. The aim of the paper is to present theoretical and practical aspects of deep CNNs in terms of the convolution operation, typical layers and basic methods to be used for training and learning. Some practical applications are included for signal and image classification. Finally, the paper describes the proposed block structure of a CNN for classifying crucial features from 3D sensor data.
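
    As a reminder of what the convolution operation in a CNN layer computes, here is a minimal one-dimensional sketch followed by a ReLU activation. It is not taken from the paper; the signal, the kernel, and the function names are hypothetical, and, as in most deep learning libraries, the kernel is applied without flipping (strictly a cross-correlation).

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` with no padding and return the dot products."""
    width = len(kernel)
    return [
        sum(signal[start + offset] * kernel[offset] for offset in range(width))
        for start in range(len(signal) - width + 1)
    ]

def relu(values):
    """Replace negative activations with zero."""
    return [max(0, v) for v in values]

# Hypothetical step signal and a two-tap kernel that responds to rising edges.
signal = [0, 0, 1, 1, 1, 0, 0]
kernel = [-1, 1]

feature_map = conv1d_valid(signal, kernel)
activated = relu(feature_map)
```

    The feature map is positive where the signal rises and negative where it falls; the ReLU keeps only the rising edge.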

  6. Structural deformation upon protein-protein interaction: a structural alphabet approach.

    Science.gov (United States)

    Martin, Juliette; Regad, Leslie; Lecornet, Hélène; Camproux, Anne-Claude

    2008-02-28

    In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
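
    The "fraction of structural letters modified upon binding" can be read as a position-by-position comparison of two equal-length letter strings. The sketch below assumes exactly that reading; the example strings use ordinary capital letters as stand-ins for the 27-letter alphabet and are hypothetical, as are the function names.

```python
from collections import Counter

def fraction_modified(unbound, bound):
    """Fraction of positions whose structural letter differs between the two forms."""
    if len(unbound) != len(bound) or not unbound:
        raise ValueError("sequences must be non-empty and of equal length")
    changed = sum(1 for a, b in zip(unbound, bound) if a != b)
    return changed / len(unbound)

def substitution_counts(unbound, bound):
    """Count observed (unbound letter, bound letter) changes; raw input for a substitution matrix."""
    return Counter((a, b) for a, b in zip(unbound, bound) if a != b)

# Hypothetical structural-letter strings for the same protein, unbound and bound.
unbound = "AABCDDEFGH"
bound = "AABCEDEFKH"

changed_fraction = fraction_modified(unbound, bound)
changes = substitution_counts(unbound, bound)
```

    Running the same comparison on a control set of repeated structures of one protein gives the baseline the abstract uses to separate induced fit from experimental error and flexibility.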

  7. Structural deformation upon protein-protein interaction: A structural alphabet approach

    Directory of Open Access Journals (Sweden)

    Lecornet Hélène

    2008-02-01

    Background: In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results: In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion: Our study provides qualitative information about induced fit. These results could be of help for flexible docking.

  8. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be.

    KAUST Repository

    Schaefer, Christian

    2010-01-16

    MOTIVATION: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. RESULTS: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely.

  9. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be.

    KAUST Repository

    Schaefer, Christian; Schlessinger, Avner; Rost, Burkhard

    2010-01-01

    MOTIVATION: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. RESULTS: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely.
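
    The in silico mutation step described above can be sketched as repeated random point substitutions, with sequence identity to the native sequence tracked after each step. The uniform choice among the 20 standard amino acids, the example sequence, and the function names are assumptions; the abstract does not specify the mutation model, and the secondary structure and disorder predictors it relies on are not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, n_mutations, rng):
    """Apply `n_mutations` random point substitutions (assumed uniform model)."""
    residues = list(sequence)
    for _ in range(n_mutations):
        position = rng.randrange(len(residues))
        alternatives = [aa for aa in AMINO_ACIDS if aa != residues[position]]
        residues[position] = rng.choice(alternatives)
    return "".join(residues)

def identity(seq_a, seq_b):
    """Fraction of aligned positions carrying the same residue."""
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)

rng = random.Random(42)
native = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue sequence

current = native
trajectory = []
for step in range(10):
    current = mutate(current, 1, rng)
    # In the study, a predictor would be run on `current` at this point.
    trajectory.append(identity(native, current))
```

    Plotting predicted secondary structure or disorder content against `trajectory` would give the kind of change-versus-sequence-divergence curve the abstract describes.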

  10. Ultrafast protein structure-based virtual screening with Panther

    Science.gov (United States)

    Niinivehmas, Sanna P.; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T.

    2015-10-01

    Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes [...] The Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. The Panther code is available on our web pages (http://www.jyu.fi/panther) free of charge after registration.

  11. Structure and Function of Caltrin (calcium transport inhibitor) Proteins

    Directory of Open Access Journals (Sweden)

    Ernesto Javier Grasso

    2017-12-01

    Caltrin (calcium transport inhibitor) is a family of small, basic proteins of the mammalian seminal plasma that bind to sperm cells during ejaculation and inhibit extracellular Ca2+ uptake, preventing premature acrosomal exocytosis and hyperactivation when sperm cells ascend through the female reproductive tract. The binding of caltrin proteins to specific areas of the sperm surface suggests the existence of caltrin receptors, or precise protein-phospholipid arrangements in the sperm membrane, distributed in the regions where Ca2+ influx may take place. However, the molecular mechanisms of recognition and interaction between caltrin and spermatozoa have not been elucidated. Therefore, the aim of this article is to describe in depth the known structural features and functional properties of caltrin proteins, to find out how they may interact with the sperm membranes to control the intracellular signaling that triggers physiological events required for fertilization.

  12. Beta-structures in fibrous proteins.

    Science.gov (United States)

    Kajava, Andrey V; Squire, John M; Parry, David A D

    2006-01-01

    The beta-form of protein folding, one of the earliest protein structures to be defined, was originally observed in studies of silks. It was then seen in early studies of synthetic polypeptides and, of course, is now known to be present in a variety of guises as an essential component of globular protein structures. However, in the last decade or so it has become clear that the beta-conformation of chains is present not only in many of the amyloid structures associated with, for example, Alzheimer's Disease, but also in the prion structures associated with the spongiform encephalopathies. Furthermore, X-ray crystallography studies have revealed the high incidence of the beta-fibrous proteins among virulence factors of pathogenic bacteria and viruses. Here we describe the basic forms of the beta-fold, summarize the many different new forms of beta-structural fibrous arrangements that have been discovered, and review advances in structural studies of amyloid and prion fibrils. These and other issues are described in detail in later chapters.

  13. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    Science.gov (United States)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
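
    Q3 accuracy, as quoted above, is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below computes it for a pair of label strings; the example strings and the function name are hypothetical and no predictor is involved. Q8 accuracy follows the same formula with eight-state labels.

```python
def q3_accuracy(predicted, observed):
    """Per-residue accuracy over three-state secondary structure labels (H, E, C)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

# Hypothetical labels for a 10-residue fragment.
observed = "HHHHEEEECC"
predicted = "HHHCEEEECC"

accuracy = q3_accuracy(predicted, observed)
```

    Here one of ten residues is wrong, so the fragment scores 0.9; benchmark figures such as those in the abstract are this quantity pooled over whole test sets.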

  14. Fibrous Protein Structures: Hierarchy, History and Heroes.

    Science.gov (United States)

    Squire, John M; Parry, David A D

    2017-01-01

    During the 1930s and 1940s the technique of X-ray diffraction was applied widely by William Astbury and his colleagues to a number of naturally-occurring fibrous materials. On the basis of the diffraction patterns obtained, he observed that the structure of each of the fibres was dominated by one of a small number of different types of molecular conformation. One group of fibres, known as the k-m-e-f group of proteins (keratin - myosin - epidermin - fibrinogen), gave rise to diffraction characteristics that became known as the α-pattern. Others, such as those from a number of silks, gave rise to a different pattern - the β-pattern, while connective tissues yielded a third unique set of diffraction characteristics. At the time of Astbury's work, the structures of these materials were unknown, though the spacings of the main X-ray reflections gave an idea of the axial repeats and the lateral packing distances. In a breakthrough in the early 1950s, the basic structures of all of these fibrous proteins were determined. It was found that the long protein chains, composed of strings of amino acids, could be folded up in a systematic manner to generate a limited number of structures that were consistent with the X-ray data. The most important of these were known as the α-helix, the β-sheet, and the collagen triple helix. These studies provided information about the basic building blocks of all proteins, both fibrous and globular. They did not, however, provide detailed information about how these molecules packed together in three dimensions to generate the fibres found in vivo. A number of possible packing arrangements were subsequently deduced from the X-ray diffraction and other data, but it is only in the last few years, through the continued improvements of electron microscopy, that the packing details within some fibrous proteins can now be seen directly. Here we outline briefly some of the milestones in fibrous protein structure determination, the role of the

  15. Systematic comparison of crystal and NMR protein structures deposited in the protein data bank.

    Science.gov (United States)

    Sikic, Kresimir; Tomic, Sanja; Carugo, Oliviero

    2010-09-03

    Nearly all the macromolecular three-dimensional structures deposited in the Protein Data Bank were determined by either crystallographic (X-ray) or Nuclear Magnetic Resonance (NMR) spectroscopic methods. This paper reports a systematic comparison of the crystallographic and NMR results deposited in the files of the Protein Data Bank, in order to find out to what extent this information can be aggregated in bioinformatics. A non-redundant data set containing 109 NMR - X-ray structure pairs of nearly identical proteins was derived from the Protein Data Bank. A series of comparisons were performed by focusing the attention towards both global features and local details. It was observed that: (1) the RMSD values between NMR and crystal structures range from about 1.5 Å to about 2.5 Å; (2) the correlation between conformational deviations and residue type reveals that hydrophobic amino acids are more similar in crystal and NMR structures than hydrophilic amino acids; (3) the correlation between solvent accessibility of the residues and their conformational variability in solid state and in solution is relatively modest (correlation coefficient = 0.462); (4) beta strands on average match better between NMR and crystal structures than helices and loops; (5) conformational differences between loops are independent of crystal packing interactions in the solid state; (6) very seldom, side chains buried in the protein interior are observed to adopt different orientations in the solid state and in solution.
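
    The RMSD figures quoted above are root-mean-square deviations between matched atom positions in two structures. The sketch below computes that quantity for two coordinate lists, assuming the structures have already been optimally superposed and that atoms are matched by list position. The superposition step itself (for example the Kabsch algorithm) is deliberately left out, and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed, equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in angstroms: the second set is displaced by 1.5 along y.
crystal = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
nmr_model = [(0.0, 1.5, 0.0), (3.8, 1.5, 0.0), (7.6, 1.5, 0.0)]

deviation = rmsd(crystal, nmr_model)
```

    Because every atom is displaced by the same 1.5 Å, the result is 1.5 Å; without prior superposition a rigid shift like this would inflate the value, which is why alignment normally comes first.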

  16. A Kernel for Protein Secondary Structure Prediction

    OpenAIRE

    Guermeur , Yann; Lifchitz , Alain; Vert , Régis

    2004-01-01

    http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10338&mode=toc; International audience; Multi-class support vector machines have already proved efficient in protein secondary structure prediction as ensemble methods, to combine the outputs of sets of classifiers based on different principles. In this chapter, their implementation as basic prediction methods, processing the primary structure or the profile of multiple alignments, is investigated. A kernel devoted to the task is in...

  17. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    Science.gov (United States)

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    The identification of hot spots, a small subset of protein interface residues that accounts for the majority of binding free energy, is becoming more important for research on drug design and cancer development. Building on our previous methods (APIS and KFC2), we propose here a novel hot spot prediction method. For each residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional features and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of the minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When tested on an independent test set, our method showed the highest F1-score (0.70) and MCC (0.46) compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
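    The two performance measures quoted in this abstract can be computed from a binary confusion matrix. The sketch below uses the standard definitions of the F1-score and the Matthews correlation coefficient (MCC); the function name `f1_and_mcc` and the example counts are illustrative assumptions, not values from the paper.

```python
import math

def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Degenerate cases return 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Hypothetical counts: 8 hot spots found, 2 missed, 2 false alarms.
f1, mcc = f1_and_mcc(tp=8, fp=2, tn=8, fn=2)
print(round(f1, 2), round(mcc, 2))
```

    MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when hot spots are a small minority of interface residues.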

  18. 3D bioprinting of structural proteins.

    Science.gov (United States)

    Włodarczyk-Biegun, Małgorzata K; Del Campo, Aránzazu

    2017-07-01

    3D bioprinting is a booming method to obtain scaffolds of different materials with predesigned and customized morphologies and geometries. In this review we focus on the experimental strategies and recent achievements in the bioprinting of major structural proteins (collagen, silk, fibrin), as a particularly interesting technology to reconstruct the biochemical and biophysical composition and hierarchical morphology of natural scaffolds. The flexibility in molecular design offered by structural proteins, combined with the flexibility in mixing, deposition, and mechanical processing inherent to bioprinting technologies, enables the fabrication of highly functional scaffolds and tissue mimics with a degree of complexity and organization which has only just started to be explored. Here we describe the printing parameters and physical (mechanical) properties of bioinks based on structural proteins, including the biological function of the printed scaffolds. We describe applied printing techniques and cross-linking methods, highlighting the modifications implemented to improve scaffold properties. The used cell types, cell viability, and possible construct applications are also reported. We envision that the application of printing technologies to structural proteins will enable unprecedented control over their supramolecular organization, conferring printed scaffolds biological properties and functions close to natural systems. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Functions and structures of eukaryotic recombination proteins

    International Nuclear Information System (INIS)

    Ogawa, Tomoko

    1994-01-01

    We have found that Rad51 and RecA proteins form strikingly similar structures together with dsDNA and ATP. Their right-handed helical nucleoprotein filaments extend the B-form DNA double helix to 1.5 times its length and wind the helix. The similarity and uniqueness of their structures must reflect functional homologies between these proteins. Therefore, it is highly probable that similar recombination proteins are present in various organisms of different evolutionary states. We have succeeded in cloning RAD51 genes from human, mouse, chicken and fission yeast, and found that the homologues are widely distributed in eukaryotes. The HsRad51 and MmRad51 or ChRad51 proteins consist of 339 amino acids differing only by 4 or 12 amino acids, respectively, and are highly homologous to both yeast proteins, but less so to Dmc1. All of these proteins are homologous to the region from residues 33 to 240 of RecA, which was named the "homologous core". The homologous core is likely to be responsible for functions common to all of them, such as the formation of the helical nucleoprotein filament that is considered to be involved in homologous pairing in the recombination reaction. The mouse gene is transcribed at a high level in thymus, spleen, testis, and ovary, at a lower level in brain and at a still lower level in some other tissues. It is transcribed efficiently in recombination-active tissues. A clear functional difference of Rad51 homologues from RecA was suggested by the failure of heterologous genes to complement the deficiency of Scrad51 mutants. This failure seems to reflect the absence of a compatible partner, such as ScRad52 protein in the case of ScRad51 protein, between different species. Thus, these discoveries serve as a starting point for understanding the fundamentals of gene targeting in mammalian cells and in gene therapy. (J.P.N.)

  20. Design, construction and operation features of high-rise structures

    Science.gov (United States)

    Mylnik, Alexey; Mylnik, Vladimir; Zubeeva, Elena; Mukhamedzhanova, Olga

    2018-03-01

    The article considers design, construction and operation features of high-rise facilities. The analysis of various situations, that come from improper designing, construction and operation of unique facilities, is carried out. The integrated approach is suggested, when the problems of choosing acceptable constructional solutions related to the functional purpose, architectural solutions, methods of manufacturing and installation, operating conditions for unique buildings and structures are being tackled. A number of main causes for the emergency destruction of objects under construction and operation is considered. A number of measures are proposed on the basis of factor classification in order to efficiently prevent the situations, when various negative options of design loads and emergency impacts occur.

  1. Structural and sequence features of two residue turns in beta-hairpins.

    Science.gov (United States)

    Madan, Bharat; Seo, Sung Yong; Lee, Sun-Gu

    2014-09-01

    Beta-turns in beta-hairpins have been implicated as important sites in protein folding. In particular, two residue β-turns, the most abundant connecting elements in beta-hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two residue turns in beta-hairpins with a large data set. For this, 3977 beta-turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta-hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two-residue turns, and their distributions were examined. From this study, we could identify code-like sequence motifs for the two residue beta-turn types. Finally, structural and sequence properties of beta-strands in the beta-hairpins were analyzed, which revealed that the beta-strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta-hairpin turn structures and sequences. © 2014 Wiley Periodicals, Inc.

  2. Detection and analysis of unusual features in the structural model and structure-factor data of a birch pollen allergen

    International Nuclear Information System (INIS)

    Rupp, Bernhard

    2012-01-01

    The structure factors deposited with PDB entry 3k78 show properties inconsistent with experimentally observed diffraction data, and without uncertainty represent calculated structure factors. The refinement of the model against these structure factors leads to an isomorphous structure different from the deposited model with an implausibly small R value (0.019). Physically improbable features in the model of the birch pollen structure Bet v 1d are faithfully reproduced in electron density generated with the deposited structure factors, but these structure factors themselves exhibit properties that are characteristic of data calculated from a simple model and are inconsistent with the data and error model obtained through experimental measurements. The abnormal refinement is compared with normal refinement of an isomorphous variant structure of Bet v 1l. A variety of analytical tools, including the application of Diederichs plots, Rσ plots and bulk-solvent analysis, are discussed as promising aids in validation. The examination of the Bet v 1d structure also cautions against the practice of indicating poorly defined protein chain residues through zero occupancies. The recommendation to preserve diffraction images is amplified.

  3. Leaderless Transcripts and Small Proteins Are Common Features of the Mycobacterial Translational Landscape.

    Directory of Open Access Journals (Sweden)

    Scarlet S Shell

    2015-11-01

    RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5' untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5' end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5' ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine-Dalgarno site in the 5' UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression.

  4. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    Science.gov (United States)

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
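    The knowledge-base lookup described in this abstract can be sketched as follows. This is only a toy illustration under stated assumptions: the 40-bit fingerprint (20 one-hot bits for the central residue plus 20 presence bits for neighboring residue types), the class name `KnowledgeBase`, and the 0.5 threshold are all hypothetical and are not INSPiRE's actual encoding or parameters.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_neighborhood(center, neighbors):
    """Hypothetical fingerprint: one-hot bits for the central residue
    followed by presence bits for the residue types among its neighbors."""
    center_bits = "".join("1" if aa == center else "0" for aa in AMINO_ACIDS)
    present = set(neighbors)
    neighbor_bits = "".join("1" if aa in present else "0" for aa in AMINO_ACIDS)
    return center_bits + neighbor_bits

class KnowledgeBase:
    """Counts how often each fingerprint was seen as interface / non-interface."""

    def __init__(self):
        self.counts = {}  # fingerprint -> [interface_count, non_interface_count]

    def add(self, center, neighbors, is_interface):
        key = encode_neighborhood(center, neighbors)
        entry = self.counts.setdefault(key, [0, 0])
        entry[0 if is_interface else 1] += 1

    def predict(self, center, neighbors, threshold=0.5):
        """Return True/False, or None when the fingerprint was never seen."""
        entry = self.counts.get(encode_neighborhood(center, neighbors))
        if entry is None:
            return None
        interface, non_interface = entry
        return interface / (interface + non_interface) >= threshold

kb = KnowledgeBase()
kb.add("L", "AVG", True)
kb.add("L", "AVG", True)
kb.add("L", "GVA", False)          # same fingerprint, observed as non-interface
print(kb.predict("L", "VAG"))      # seen 2 of 3 times as interface
print(kb.predict("K", "AA"))       # unseen neighborhood
```

    A frequency-based label of this kind depends heavily on how the neighborhood is defined, which is presumably why the authors evaluated different neighborhood types and sizes.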

  5. 3DProIN: Protein-Protein Interaction Networks and Structure Visualization.

    Science.gov (United States)

    Li, Hui; Liu, Chunmei

    2014-06-14

    3DProIN is a computational tool to visualize protein-protein interaction networks in both two-dimensional (2D) and three-dimensional (3D) views. It models protein-protein interactions in a graph and explores the biologically relevant features of the tertiary structures of each protein in the network. Properties such as color, shape and name of each node (protein) of the network can be edited in either 2D or 3D views. 3DProIN is implemented using 3D Java and C programming languages. An internet crawl technique is also used to dynamically retrieve and parse protein interactions from the Protein Data Bank (PDB). It is a Java applet component that is embedded in the web page and it can be used on different platforms including Linux, Mac and Windows using web browsers such as Firefox, Internet Explorer, Chrome and Safari. It was also converted into a Mac app and submitted to the App Store as a free app. Mac users can also download the app from our website. 3DProIN is available for academic research at http://bicompute.appspot.com.

  6. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features

    Directory of Open Access Journals (Sweden)

    Zhang Ya-Nan

    2012-05-01

    Background: Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide and plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between protein and ATP is important for understanding the functionality of the proteins and the mechanisms of protein-ATP complexes. Results: In this paper, we propose a novel framework for predicting the proteins' functional residues, through which they can bind with ATP molecules. The new prediction protocol is achieved by combining sequence evolutionary information, bi-profile sampling of multi-view sequential features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can only represent part of the target's knowledge and that multiple sources of descriptors can be complementary. Conclusions: Prediction performances evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets consisting of 168 and 227 non-homologous ATP-binding proteins, respectively, demonstrate the efficacy of the proposed protocol. Our experimental results also reveal that the residue structural characteristics of real protein-ATP binding sites are significantly different from those of normal ones; for example, the binding residues do not show high solvent accessibility propensities, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, results from testing multiple ratios between positive and negative samples show that performance is affected by imbalanced training datasets. Increasing the dataset scale is also shown to be useful for improving prediction performance.

  7. Discrete Haar transform and protein structure.

    Science.gov (United States)

    Morosetti, S

    1997-12-01

    The discrete Haar transform of the sequence of the backbone dihedral angles (phi and psi) was performed over a set of high-resolution X-ray protein structures from the Brookhaven Protein Data Bank. Afterwards, new dihedral angles were calculated by the inverse transform, using a growing number of Haar functions, from the lower to the higher degree. New structures were obtained using these dihedral angles, with standard values for bond lengths and angles, and with omega = 0 degree. The reconstructed structures were compared with the experimental ones, and analyzed by visual inspection and statistical analysis. When half of the Haar coefficients were used, none of the reconstructed structures had yet collapsed into a tertiary fold, but most of the secondary motifs were already formed. These results indicate a substantial separation of structural information in the space of the Haar transform, with the secondary structural information mainly present in the Haar coefficients of lower degrees, and the tertiary information present in the higher-degree coefficients. Because of this separation, the representation of the folded structures in the space of the Haar transform seems a promising candidate to address the problem of premature convergence in genetic algorithms.
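    The procedure described in this abstract (forward transform, truncation to the lower-degree coefficients, inverse transform) can be sketched as below. This is a minimal orthonormal Haar implementation for a sequence whose length is a power of two; the function names are hypothetical, and the sketch treats angles as plain numbers, ignoring their periodicity, which the original work may have handled differently.

```python
import math

_S = 1.0 / math.sqrt(2.0)

def haar_forward(values):
    """Orthonormal discrete Haar transform (length must be a power of two).

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    coeffs = list(values)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) * _S for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) * _S for i in range(half)]
        coeffs[:n] = avg + diff
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) * _S, (a - d) * _S]
        out[:2 * n] = merged
        n *= 2
    return out

def reconstruct_low_degree(values, keep):
    """Zero all but the first `keep` (lowest-degree) coefficients and invert."""
    coeffs = haar_forward(values)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return haar_inverse(truncated)

# Toy "dihedral angle" sequence; keeping half of the coefficients
# replaces each adjacent pair of values by its average.
print(reconstruct_low_degree([1.0, 2.0, 3.0, 4.0], keep=2))
```

    Keeping only the leading coefficients yields a smoothed angle sequence, which is consistent with the abstract's observation that coarse (secondary-structure) information sits in the lower-degree coefficients.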

  8. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    Science.gov (United States)

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to identify them. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
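    To make the monogram and bigram features concrete, here is a minimal sketch that derives them from a profile matrix with one row per sequence position and one column per amino acid. The definitions used (column averages for monograms, averaged products of consecutive rows for bigrams) are the ones commonly used for profile-based features and are assumed here; the abstract does not spell out HMMBinder's exact normalization, and the function names are hypothetical.

```python
def monogram(profile):
    """Average of each profile column: one feature per amino-acid column."""
    length, width = len(profile), len(profile[0])
    return [sum(row[a] for row in profile) / length for a in range(width)]

def bigram(profile):
    """For each ordered column pair (a, b), average over positions i of
    profile[i][a] * profile[i + 1][b]; returns width * width features."""
    length, width = len(profile), len(profile[0])
    features = []
    for a in range(width):
        for b in range(width):
            total = sum(profile[i][a] * profile[i + 1][b] for i in range(length - 1))
            features.append(total / length)
    return features

# Toy profile with 3 positions and only 2 columns (a real one would have 20).
toy_profile = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
feature_vector = monogram(toy_profile) + bigram(toy_profile)
print(len(feature_vector))  # 2 + 4 features for the toy profile
```

    With 20 columns this yields a fixed-length vector of 20 + 400 features regardless of protein length, which is what makes such features convenient inputs for an SVM.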

  9. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features

    Directory of Open Access Journals (Sweden)

    Rianon Zaman

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to identify them. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.

  10. Automated Protein Structure Modeling with SWISS-MODEL Workspace and the Protein Model Portal

    OpenAIRE

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of appl...

  11. Phylogenetic continuum indicates "galaxies" in the protein universe: preliminary results on the natural group structures of proteins.

    Science.gov (United States)

    Ladunga, I

    1992-04-01

    The markedly nonuniform, even systematic distribution of sequences in the protein "universe" has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two χ²-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.

  12. Structural basis for precursor protein-directed ribosomal peptide macrocyclization

    Science.gov (United States)

    Li, Kunhua; Condurso, Heather L.; Li, Gengnan; Ding, Yousong; Bruner, Steven D.

    2016-01-01

    Macrocyclization is a common feature of natural product biosynthetic pathways including the diverse family of ribosomal peptides. Microviridins are architecturally complex cyanobacterial ribosomal peptides whose members target proteases with potent reversible inhibition. The product structure is constructed by three macrocyclizations catalyzed sequentially by two members of the ATP-grasp family, a unique strategy for ribosomal peptide macrocyclization. Here, we describe the detailed structural basis for the enzyme-catalyzed macrocyclizations in the microviridin J pathway of Microcystis aeruginosa. The macrocyclases, MdnC and MdnB, interact with a conserved α-helix of the precursor peptide using a novel precursor peptide recognition mechanism. The results provide insight into the unique protein/protein interactions key to the chemistry, suggest an origin of the natural combinatorial synthesis of microviridin peptides and provide a framework for future engineering efforts to generate designed compounds. PMID:27669417

  13. Structural basis for precursor protein-directed ribosomal peptide macrocyclization.

    Science.gov (United States)

    Li, Kunhua; Condurso, Heather L; Li, Gengnan; Ding, Yousong; Bruner, Steven D

    2016-11-01

    Macrocyclization is a common feature of natural product biosynthetic pathways including the diverse family of ribosomal peptides. Microviridins are architecturally complex cyanobacterial ribosomal peptides that target proteases with potent reversible inhibition. The product structure is constructed via three macrocyclizations catalyzed sequentially by two members of the ATP-grasp family, a unique strategy for ribosomal peptide macrocyclization. Here we describe in detail the structural basis for the enzyme-catalyzed macrocyclizations in the microviridin J pathway of Microcystis aeruginosa. The macrocyclases MdnC and MdnB interact with a conserved α-helix of the precursor peptide using a novel precursor-peptide recognition mechanism. The results provide insight into the unique protein-protein interactions that are key to the chemistry, suggest an origin for the natural combinatorial synthesis of microviridin peptides, and provide a framework for future engineering efforts to generate designed compounds.

  14. Wetting of nonconserved residue-backbones: A feature indicative of aggregation associated regions of proteins.

    Science.gov (United States)

    Pradhan, Mohan R; Pal, Arumay; Hu, Zhongqiao; Kannan, Srinivasaraghavan; Chee Keong, Kwoh; Lane, David P; Verma, Chandra S

    2016-02-01

    Aggregation is an irreversible form of protein complexation and often toxic to cells. The process entails partial or major unfolding that is largely driven by hydration. We model the role of hydration in aggregation using "Dehydrons." "Dehydrons" are unsatisfied backbone hydrogen bonds in proteins that seek shielding from water molecules by associating with ligands or proteins. We find that the residues at aggregation interfaces have hydrated backbones, and in contrast to other forms of protein-protein interactions, are under less evolutionary pressure to be conserved. Combining evolutionary conservation of residues and extent of backbone hydration allows us to distinguish regions on proteins associated with aggregation (non-conserved dehydron-residues) from other interaction interfaces (conserved dehydron-residues). This novel feature can complement the existing strategies used to investigate protein aggregation/complexation. © 2015 Wiley Periodicals, Inc.

  15. Identification of protein features encoded by alternative exons using Exon Ontology.

    Science.gov (United States)

    Tranchevent, Léon-Charles; Aubé, Fabien; Dulaurier, Louis; Benoit-Pilven, Clara; Rey, Amandine; Poret, Arnaud; Chautard, Emilie; Mortada, Hussein; Desmet, François-Olivier; Chakrama, Fatima Zahra; Moreno-Garcia, Maira Alejandra; Goillot, Evelyne; Janczarski, Stéphane; Mortreux, Franck; Bourgeois, Cyril F; Auboeuf, Didier

    2017-06-01

    Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named "Exon Ontology," based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information. © 2017 Tranchevent et al.; Published by Cold Spring Harbor Laboratory Press.

  16. Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression

    Directory of Open Access Journals (Sweden)

    Perera Rajika L

    2004-12-01

    Background: In the search for generic expression strategies for mammalian protein families, several bacterial expression vectors were examined for their ability to promote high yields of soluble protein. Proteins studied included cell surface receptors (Ephrins and Eph receptors, CD44), kinases (EGFR-cytoplasmic domain, CDK2 and 4), proteases (MMP1, CASP2), signal transduction proteins (GRB2, RAF1, HRAS) and transcription factors (GATA2, Fli1, Trp53, Mdm2, JUN, FOS, MAD, MAX). Over 400 experiments were performed in which expression of 30 full-length proteins and protein domains was evaluated with 6 different N-terminal and 8 C-terminal fusion partners. Expression of an additional set of 95 mammalian proteins was also performed to test the conclusions of this study. Results: Several protein features correlated with soluble protein expression yield, including molecular weight and the number of contiguous hydrophobic residues and low complexity regions. There was no relationship between successful expression and protein pI, grand average of hydropathicity (GRAVY), or sub-cellular location. Only small globular cytoplasmic proteins with an average molecular weight of 23 kDa did not require a solubility-enhancing tag for high-level soluble expression. Thioredoxin (Trx) and maltose binding protein (MBP) were the best N-terminal protein fusions to promote soluble expression, but MBP was most effective as a C-terminal fusion. 63 of 95 mammalian proteins expressed at soluble levels of greater than 1 mg/l as N-terminal H10-MBP fusions, and those that failed possessed, on average, a higher molecular weight and a greater number of contiguous hydrophobic amino acids and low complexity regions. Conclusions: By analysis of the protein features identified here, this study will help predict which mammalian proteins and domains can be successfully expressed in E. coli as soluble product and also which are best targeted for a eukaryotic expression system. In some cases
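    Two of the sequence features named in this abstract are easy to compute directly. The sketch below calculates GRAVY as the mean Kyte-Doolittle hydropathy of a sequence and the length of the longest run of contiguous hydrophobic residues. The set of residues treated as hydrophobic ("AVILMFWC") and the function names are assumptions for illustration; the study's own definition of a hydrophobic stretch is not given in the abstract.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def longest_hydrophobic_run(sequence, hydrophobic="AVILMFWC"):
    """Length of the longest stretch of contiguous hydrophobic residues.

    The default residue set is an assumption, not the study's definition.
    """
    best = current = 0
    for aa in sequence:
        current = current + 1 if aa in hydrophobic else 0
        best = max(best, current)
    return best

toy_sequence = "KAVILKKLL"  # hypothetical peptide
print(round(gravy(toy_sequence), 3), longest_hydrophobic_run(toy_sequence))
```

    Note that, according to the abstract, GRAVY itself showed no relationship with soluble expression, whereas the number of contiguous hydrophobic residues did.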

  17. Defining an essence of structure determining residue contacts in proteins.

    Science.gov (United States)

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling", that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy.
This "structural essence" opens new avenues in the

  18. Efficient identification of critical residues based only on protein structure by network analysis.

    Directory of Open Access Journals (Sweden)

    Michael P Cusack

    2007-05-01

    Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively to protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.
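
    The idea of "most traversed residues" in a structure-derived network can be illustrated with a minimal sketch. It assumes, purely for illustration, that residues within a distance cutoff are linked and that "traversal" is measured by shortest-path betweenness (Brandes' algorithm); the abstract does not state the authors' actual graph construction or scoring, and the 8 Å cutoff and function names are hypothetical.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=8.0):
    """Link residues whose representative atoms lie within `cutoff` (assumed, in angstroms)."""
    adj = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def betweenness(adj):
    """Shortest-path betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy example: three collinear pseudo-residues spaced 5 angstroms apart.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ranking = betweenness(contact_graph(toy))
```

    In this toy chain only the middle residue lies on a shortest path between two others, so it receives the highest score. Repeating the calculation over several conformations and combining the scores would be one way to reflect the conformational diversity the abstract mentions, though how the authors do so is not described here.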

  19. Structural interface parameters are discriminatory in recognising near-native poses of protein-protein interactions.

    Directory of Open Access Journals (Sweden)

    Sony Malhotra

    Interactions at the molecular level in the cellular environment play a crucial role in maintaining the physiological functioning of the cell. These molecular interactions exist at varied levels, viz. protein-protein interactions, protein-nucleic acid interactions or protein-small molecule interactions. These interactions and their mechanisms are currently intensively studied areas in the field. Molecular interactions can also be studied computationally using an approach called molecular docking. Molecular docking employs search algorithms to predict the possible conformations for interacting partners and then calculates interaction energies. However, docking proposes a number of solutions as different docked poses and hence poses a serious challenge: identifying the native (or near-native) structures from the pool of these docked poses. Here, we propose a rigorous scoring scheme called DockScore which can be used to rank the docked poses and identify the best docked pose out of the many proposed by the docking algorithm employed. The scoring identifies the optimal interactions between the two protein partners utilising various features of the putative interface like area, short contacts, conservation, spatial clustering and the presence of positively charged and hydrophobic residues. DockScore was first trained on a set of 30 protein-protein complexes to determine the weights for different parameters. Subsequently, we tested the scoring scheme on 30 different protein-protein complexes, and native or near-native structures were assigned the top rank from a pool of docked poses in 26 of the tested cases. We tested the ability of DockScore to discriminate likely dimer interactions that differ substantially within a homologous family and also demonstrate that DockScore can distinguish the correct pose for all 10 recent CAPRI targets.

  20. PCNA Structure and Interactions with Partner Proteins

    KAUST Repository

    Oke, Muse; Zaher, Manal S.; Hamdan, Samir

    2018-01-01

    Proliferating cell nuclear antigen (PCNA) consists of three identical monomers that topologically encircle double-stranded DNA. PCNA stimulates the processivity of DNA polymerase δ and, to a lesser extent, the intrinsically highly processive DNA polymerase ε. It also functions as a platform that recruits and coordinates the activities of a large number of DNA processing proteins. Emerging structural and biochemical studies suggest that the nature of PCNA-partner protein interactions is complex. A hydrophobic groove at the front side of PCNA serves as a primary docking site for the consensus PIP box motifs present in many PCNA-binding partners. Sequences that immediately flank the PIP box motif or regions that are distant from it could also interact with the hydrophobic groove and other regions of PCNA. Posttranslational modifications on the backside of PCNA could add another dimension to its interaction with partner proteins. An encounter of PCNA with different DNA structures might also be involved in coordinating its interactions. Finally, the ability of PCNA to bind up to three proteins while topologically linked to DNA suggests that it would be a versatile toolbox in many different DNA processing reactions.

  1. PCNA Structure and Interactions with Partner Proteins

    KAUST Repository

    Oke, Muse

    2018-01-29

    Proliferating cell nuclear antigen (PCNA) consists of three identical monomers that topologically encircle double-stranded DNA. PCNA stimulates the processivity of DNA polymerase δ and, to a lesser extent, the intrinsically highly processive DNA polymerase ε. It also functions as a platform that recruits and coordinates the activities of a large number of DNA processing proteins. Emerging structural and biochemical studies suggest that the nature of PCNA-partner protein interactions is complex. A hydrophobic groove at the front side of PCNA serves as a primary docking site for the consensus PIP box motifs present in many PCNA-binding partners. Sequences that immediately flank the PIP box motif or regions that are distant from it could also interact with the hydrophobic groove and other regions of PCNA. Posttranslational modifications on the backside of PCNA could add another dimension to its interaction with partner proteins. An encounter of PCNA with different DNA structures might also be involved in coordinating its interactions. Finally, the ability of PCNA to bind up to three proteins while topologically linked to DNA suggests that it would be a versatile toolbox in many different DNA processing reactions.

  2. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary......-forward neural network with one hidden layer on a data set identical to the one used in earlier work....

  3. Protein-mediated surface structuring in biomembranes

    Directory of Open Access Journals (Sweden)

    Maggio B.

    2005-01-01

    The lipids and proteins of biomembranes exhibit highly dissimilar conformations, geometrical shapes, amphipathicity, and thermodynamic properties which constrain their two-dimensional molecular packing, electrostatics, and interaction preferences. This causes inevitable development of large local tensions that frequently relax into phase or compositional immiscibility along lateral and transverse planes of the membrane. On the other hand, these effects constitute the very codes that mediate molecular and structural changes determining and controlling the possibilities for enzymatic activity, apposition and recombination in biomembranes. The presence of proteins constitutes a major perturbing factor for the membrane sculpturing both in terms of its surface topography and dynamics. We will focus on some results from our group within this context and summarize some recent evidence for the active involvement of extrinsic (myelin basic protein), integral (Folch-Lees proteolipid protein) and amphitropic (c-Fos and c-Jun) proteins, as well as a membrane-active amphitropic phosphohydrolytic enzyme (neutral sphingomyelinase), in the process of lateral segregation and dynamics of phase domains, sculpturing of the surface topography, and the bi-directional modulation of the membrane biochemical reactivity.

  4. Mapping the structural and dynamical features of kinesin motor domains.

    Directory of Open Access Journals (Sweden)

    Guido Scarabelli

    Kinesin motor proteins drive intracellular transport by coupling ATP hydrolysis to conformational changes that mediate directed movement along microtubules. Characterizing these distinct conformations and their interconversion mechanism is essential to determining an atomic-level model of kinesin action. Here we report a comprehensive principal component analysis of 114 experimental structures along with the results of conventional and accelerated molecular dynamics simulations that together map the structural dynamics of the kinesin motor domain. All experimental structures were found to reside in one of three distinct conformational clusters (ATP-like, ADP-like and Eg5 inhibitor-bound). These groups differ in the orientation of key functional elements, most notably the microtubule binding α4-α5, loop8 subdomain and α2b-β4-β6-β7 motor domain tip. Group membership was found not to correlate with the nature of the bound nucleotide in a given structure. However, groupings were coincident with distinct neck-linker orientations. Accelerated molecular dynamics simulations of ATP, ADP and nucleotide free Eg5 indicate that all three nucleotide states could sample the major crystallographically observed conformations. Differences in the dynamic coupling of distal sites were also evident. In multiple ATP bound simulations, the neck-linker, loop8 and the α4-α5 subdomain display correlated motions that are absent in ADP bound simulations. Further dissection of these couplings provides evidence for a network of dynamic communication between the active site, microtubule-binding interface and neck-linker via loop7 and loop13. Additional simulations indicate that the mutations G325A and G326A in loop13 reduce the flexibility of these regions and disrupt their couplings. Our combined results indicate that the reported ATP and ADP-like conformations of kinesin are intrinsically accessible regardless of nucleotide state and support a model where neck

  5. Structure in galactic soft X-ray features

    International Nuclear Information System (INIS)

    Davelaar, J.

    1979-01-01

    Observations are described of the soft X-ray background in a part of the northern hemisphere in the energy range 0.06 - 3.0 keV. The X-ray instruments, placed onboard a sounding rocket, are a one-dimensional focusing collector with multi-cell proportional counters in the focal plane and eight large area counters on deployable panels. A description of the instruments and their preflight calibration is given. Precautions were taken to prevent UV sensitivity of the X-ray instruments. The observation program, which consisted of a number of pre-programmed slow scans, is outlined. The spectral data on the soft X-ray background in these and previous observations showed that at least two components of different temperature are present. A low temperature component of approximately (3-10)×10^5 K is found all over the sky. Components of higher temperature, approximately 3×10^6 K, are found in regions of soft X-ray enhancement. The North Polar Spur has been observed in two scans at the galactic latitudes b=25° and b=75°. The X-ray ridge structure is found to be strongly energy dependent. The low energy data ( 0 reveals two separate emission features on the ridge, both probably of finite extensions (approximately equal to 0.5°). A wider X-ray ridge (approximately equal to 10°) is observed above 0.4 keV. (Auth.)

  6. Integrative approaches to the prediction of protein functions based on the feature selection

    Directory of Open Access Journals (Sweden)

    Lee Hyunju

    2009-12-01

    Background: Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue. Results: We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including: protein-domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR), and a kernel based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. 
Furthermore, we observe that highly contributing data sets can be similar among

  7. Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences.

    Science.gov (United States)

    Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

    2015-03-07

    Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from annotating transcription binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.

  8. Unique Structural Features of Influenza Virus H15 Hemagglutinin

    Energy Technology Data Exchange (ETDEWEB)

    Tzarum, Netanel; McBride, Ryan; Nycholat, Corwin M.; Peng, Wenjie; Paulson, James C.; Wilson, Ian A. (Scripps)

    2017-04-12

    Influenza A H15 viruses are members of a subgroup (H7-H10-H15) of group 2 hemagglutinin (HA) subtypes that include H7N9 and H10N8 viruses that were isolated from humans during 2013. The isolation of avian H15 viruses is, however, quite rare and, until recently, geographically restricted to wild shorebirds and waterfowl in Australia. The HAs of H15 viruses contain an insertion in the 150-loop (loop beginning at position 150) of the receptor-binding site common to this subgroup and a unique insertion in the 260-loop compared to any other subtype. Here, we show that the H15 HA has a high preference for avian receptor analogs by glycan array analyses. The H15 HA crystal structure reveals that it is structurally closest to H7N9 HA, but the head domain of the H15 trimer is wider than all other HAs due to a tilt and opening of the HA1 subunits of the head domain. The extended 150-loop of the H15 HA retains the conserved conformation as in H7 and H10 HAs. Furthermore, the elongated 260-loop increases the exposed HA surface and can contribute to antigenic variation in H15 HAs. Since avian-origin H15 HA viruses have been shown to cause enhanced disease in mammalian models, further characterization and immune surveillance of H15 viruses are warranted.

    IMPORTANCE: In the last 2 decades, an apparent increase has been reported for cases of human infection by emerging avian influenza A virus subtypes, including H7N9 and H10N8 viruses isolated during 2013. H15 is the other member of the subgroup of influenza A virus group 2 hemagglutinins (HAs) that also include H7 and H10. H15 viruses have been restricted to Australia, but recent isolation of H15 viruses in western Siberia suggests that they could be spread more globally via the avian flyways that converge and emanate from this region. Here we report on characterization of the three-dimensional structure and receptor specificity of the H15 hemagglutinin, revealing distinct features and specificities that can

  9. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    Science.gov (United States)

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for the research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information of the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method can achieve accuracy (ACC) of 82.4% and Matthew's correlation coefficient (MCC) of 0.482 in Alanine Scanning Energetics Database, and ACC of 77.6% and MCC of 0.429 in Binding Interface Database. In a comparison study, performance of our RF model exceeds other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spots prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot.
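
    For readers unfamiliar with the two reported metrics, the following minimal sketch computes accuracy (ACC) and Matthews correlation coefficient (MCC) from a binary confusion matrix. The counts in the example are invented for illustration and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Hypothetical counts: 40 true hot spots found, 40 non-hot-spots rejected,
# 10 false alarms, 10 missed hot spots.
acc = accuracy(40, 40, 10, 10)
mcc = matthews_cc(40, 40, 10, 10)
```

    MCC uses all four cells of the confusion matrix, which is why it is often reported next to ACC when hot spots are much rarer than non-hot-spot residues.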

  10. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    Directory of Open Access Journals (Sweden)

    Huilin Wang

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization

  11. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    Science.gov (United States)

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  12. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    Science.gov (United States)

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
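
    As a rough illustration of the search problem described above, the sketch below scores a conformation in the two-dimensional HP lattice model (energy of -1 for each pair of H residues that are lattice neighbours but not consecutive in the chain) and performs one steepest-descent step on that energy over single-move changes. The move encoding and the `steepest_step` operator are assumptions made for this example; they are not the problem-specific operators or the diversification stage of the paper.

```python
# Absolute moves on a square lattice (an assumed encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves: str):
    """Turn a move string into lattice coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_energy(sequence: str, moves: str):
    """Return the HP energy, or None if the chain overlaps itself."""
    coords = fold(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        return None  # not a self-avoiding walk
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def steepest_step(sequence: str, moves: str):
    """Try every single-move change and keep the valid one with the lowest energy."""
    best_moves, best_energy = moves, hp_energy(sequence, moves)
    for i in range(len(moves)):
        for move in MOVES:
            if move == moves[i]:
                continue
            candidate = moves[:i] + move + moves[i + 1:]
            energy = hp_energy(sequence, candidate)
            if energy is not None and (best_energy is None or energy < best_energy):
                best_moves, best_energy = candidate, energy
    return best_moves, best_energy


improved = steepest_step("HPPH", "RUU")
```

    Starting from the open conformation "RUU", the only single-move change that brings the two H residues into contact is the one returned. A step like this stalls as soon as no single change lowers the energy, which is the kind of local optimum an explicit diversification stage is meant to escape.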

  13. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  14. FASTERp: A Feature Array Search Tool for Estimating Resemblance of Protein Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Macklin, Derek; Egan, Rob; Wang, Zhong

    2014-03-14

    Metagenome sequencing efforts have provided a large pool of billions of genes for identifying enzymes with desirable biochemical traits. However, homology search with billions of genes in a rapidly growing database has become increasingly computationally impractical. Here we present our pilot efforts to develop a novel alignment-free algorithm for homology search. Specifically, we represent individual proteins as feature vectors that denote the presence or absence of short kmers in the protein sequence. Similarity between feature vectors is then computed using the Tanimoto score, a distance metric that can be rapidly computed on bit string representations of feature vectors. Preliminary results indicate good correlation with optimal alignment algorithms (Spearman r of 0.87, ~1,000,000 proteins from Pfam), as well as with heuristic algorithms such as BLAST (Spearman r of 0.86, ~1,000,000 proteins). Furthermore, a prototype of FASTERp implemented in Python runs approximately four times faster than BLAST on a small scale dataset (~1000 proteins). We are optimizing and scaling to improve FASTERp to enable rapid homology searches against billion-protein databases, thereby enabling more comprehensive gene annotation efforts.
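
    The representation described above can be sketched in a few lines. This is an illustrative reconstruction, not the FASTERp prototype: the k-mer length of 2, the 20-letter alphabet and the function names are assumptions, and a Python integer stands in for the bit string.

```python
from functools import lru_cache
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


@lru_cache(maxsize=None)
def kmer_index(k: int) -> dict:
    """Assign every possible k-mer a fixed bit position (built once per k)."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_bits(sequence: str, k: int = 2) -> int:
    """Presence/absence vector of k-mers, packed into one integer."""
    index = kmer_index(k)
    bits = 0
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        if position is not None:  # skip k-mers containing non-standard letters
            bits |= 1 << position
    return bits


def tanimoto(a: int, b: int) -> float:
    """Shared set bits divided by total set bits across both vectors."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


# "ACDE" has 2-mers {AC, CD, DE}; "ACDF" has {AC, CD, DF}: 2 shared out of 4.
score = tanimoto(kmer_bits("ACDE"), kmer_bits("ACDF"))
```

    Because the comparison reduces to bitwise AND, OR and bit counting, it needs no alignment, which is the property the abstract relies on for speed.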

  15. Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids.

    Science.gov (United States)

    Raicar, Gaurav; Saini, Harsh; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok

    2016-08-07

    Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required - feature extraction step and classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features, to show an improvement in the recognition and prediction performance on benchmark datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Classification of proteins: available structural space for molecular modeling.

    Science.gov (United States)

    Andreeva, Antonina

    2012-01-01

    The wealth of available protein structural data provides an unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.

  17. Protein crystal structure analysis using synchrotron radiation at atomic resolution

    International Nuclear Information System (INIS)

    Nonaka, Takamasa

    1999-01-01

    We can now obtain a detailed picture of a protein, allowing the identification of individual atoms, by interpreting the diffraction of X-rays from a protein crystal at atomic resolution, 1.2 Å or better. As of this writing, about 45 unique protein structures beyond 1.2 Å resolution have been deposited in the Protein Data Bank. This review provides a simplified overview of how protein crystallographers use such diffraction data to solve, refine, and validate protein structures. (author)

  18. The Structure of Neurexin 1α Reveals Features Promoting a Role as Synaptic Organizer

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Fang; Venugopal, Vandavasi; Murray, Beverly; Rudenko, Gabby (Michigan)

    2014-10-02

    α-Neurexins are essential synaptic adhesion molecules implicated in autism spectrum disorder and schizophrenia. The α-neurexin extracellular domain consists of six LNS domains interspersed by three EGF-like repeats and interacts with many different proteins in the synaptic cleft. To understand how α-neurexins might function as synaptic organizers, we solved the structure of the neurexin 1α extracellular domain (n1α) to 2.65 Å. The L-shaped molecule can be divided into a flexible repeat I (LNS1-EGF-A-LNS2), a rigid horseshoe-shaped repeat II (LNS3-EGF-B-LNS4) with structural similarity to so-called reelin repeats, and an extended repeat III (LNS5-EGF-B-LNS6) with controlled flexibility. A 2.95 Å structure of n1α carrying splice insert SS3 in LNS4 reveals that SS3 protrudes as a loop and does not alter the rigid arrangement of repeat II. The global architecture imposed by conserved structural features enables α-neurexins to recruit and organize proteins in distinct and variable ways, influenced by splicing, thereby promoting synaptic function.

  19. Structural classification of proteins using texture descriptors extracted from the cellular automata image.

    Science.gov (United States)

    Kavianpour, Hamidreza; Vasighi, Mahdi

    2017-02-01

    Nowadays, knowledge about cellular attributes of proteins has an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding of protein functionality and folding patterns. Computational methods and intelligent systems can have an important role in performing structural classification of proteins. Most protein sequences are saved in databanks as characters and strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features which are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.
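
    The pipeline described above (binary encoding, then a cellular automata image) can be sketched as follows. This is an illustration under stated assumptions only: the hydrophobic residue grouping, the one-bit-per-residue encoding, the periodic boundary and the default rule number are all hypothetical choices, since the abstract does not give the encoding or the automaton rule actually used.

```python
# Hypothetical reduced alphabet: residues treated as hydrophobic map to 1.
HYDROPHOBIC = set("AVLIMFWC")


def encode(seq):
    """Map a protein sequence to a list of bits (1 = hydrophobic)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq]


def step(cells, rule):
    """Apply one step of an elementary (Wolfram) cellular automaton.

    `rule` is the usual rule number 0..255; the boundary is periodic.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right
        out.append((rule >> idx) & 1)
    return out


def ca_image(seq, rule=90, steps=4):
    """Return the automaton history as rows of bits (row 0 = encoding)."""
    rows = [encode(seq)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


for row in ca_image("AKGS", rule=90, steps=3):
    print("".join("#" if bit else "." for bit in row))
```

    Texture descriptors would then be computed on the resulting bit matrix; that step is omitted here because it depends on the image-analysis method chosen.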

  20. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    Science.gov (United States)

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.

  1. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    Science.gov (United States)

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although much effort has been made, predicting protein structural class solely from protein sequences remains a challenging problem. Feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme that calculates the word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For an accurate classification of the structural class of a protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of a Multi-Agent system into the Ada-Boost algorithm. Extensive experiments were carried out to test and compare the proposed method using four low-homology benchmark datasets. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of the existing methods. The source code and dataset are available on request.
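
    As a rough illustration of word-frequency and word-position features, the sketch below counts overlapping words of length k in a string and records their mean start index. The function names and the exact definitions (overlapping windows, mean of zero-based start positions) are assumptions; the abstract does not define the features precisely.

```python
from collections import Counter, defaultdict


def word_frequencies(seq, k=2):
    """Relative frequency of each overlapping length-k word in `seq`."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not words:
        return {}
    counts = Counter(words)
    return {word: n / len(words) for word, n in counts.items()}


def mean_word_positions(seq, k=2):
    """Mean zero-based start index of each overlapping length-k word."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {word: sum(p) / len(p) for word, p in positions.items()}


# Works on any string: amino acids, a reduced alphabet, or
# secondary-structure symbols (here a made-up H/E string).
print(word_frequencies("HHHE"))
print(mean_word_positions("HHHE"))
```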

  2. Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection.

    Science.gov (United States)

    Kipli, Kuryati; Kouzani, Abbas Z

    2015-07-01

    Accurate detection of depression at an individual level using structural magnetic resonance imaging (sMRI) remains a challenge. Brain volumetric changes at a structural level appear to have importance in depression biomarkers studies. An automated algorithm is developed to select brain sMRI volumetric features for the detection of depression. A feature selection (FS) algorithm called degree of contribution (DoC) is developed for selection of sMRI volumetric features. This algorithm uses an ensemble approach to determine the degree of contribution in detection of major depressive disorder. The DoC is the score of feature importance used for feature ranking. The algorithm involves four stages: feature ranking, subset generation, subset evaluation, and DoC analysis. The performance of DoC is evaluated on the Duke University Multi-site Imaging Research in the Analysis of Depression sMRI dataset. The dataset consists of 115 brain sMRI scans of 88 healthy controls and 27 depressed subjects. Forty-four sMRI volumetric features are used in the evaluation. The DoC score of forty-four features was determined as the accuracy threshold (Acc_Thresh) was varied. The DoC performance was compared with that of four existing FS algorithms. At all defined Acc_Threshs, DoC outperformed the four examined FS algorithms for the average classification score and the maximum classification score. DoC has a good ability to generate reduced-size subsets of important features that could yield high classification accuracy. Based on the DoC score, the most discriminant volumetric features are those from the left-brain region.

  3. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....
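
    A minimal sketch of classifying sequence segments with Markov models is given below. It trains one first-order chain per class and picks the class with the highest log-likelihood. This is a generic baseline, not the authors' model: it does not use the directional information or the combinatorial post-processing mentioned above, and the training segments are made-up toy strings.

```python
import math


def train_markov(segments, alphabet, pseudo=1.0):
    """Estimate first-order transition probabilities with pseudocounts."""
    counts = {a: {b: pseudo for b in alphabet} for a in alphabet}
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def log_likelihood(model, seg):
    """Log-probability of the transitions in `seg` under `model`."""
    return sum(math.log(model[a][b]) for a, b in zip(seg, seg[1:]))


def classify(models, seg):
    """Return the label whose model gives `seg` the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], seg))


ALPHABET = "AILV"  # toy alphabet, not the 20 amino acids
models = {
    "helix": train_markov(["AAAA", "AALA"], ALPHABET),  # made-up data
    "sheet": train_markov(["VVVV", "VIVI"], ALPHABET),  # made-up data
}
print(classify(models, "AAL"), classify(models, "VIV"))
```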

  4. GIS: a comprehensive source for protein structure similarities.

    Science.gov (United States)

    Guerler, Aysam; Knapp, Ernst-Walter

    2010-07-01

    A web service for analysis of protein structures that are sequentially or non-sequentially similar was generated. Recently, the non-sequential structure alignment algorithm GANGSTA+ was introduced. GANGSTA+ can detect non-sequential structural analogs for proteins stated to possess novel folds. Since GANGSTA+ ignores the polypeptide chain connectivity of secondary structure elements (i.e. alpha-helices and beta-strands), it is able to detect structural similarities also between proteins whose sequences were reshuffled during evolution. GANGSTA+ was applied in an all-against-all comparison on the ASTRAL40 database (SCOP version 1.75), which consists of >10,000 protein domains yielding about 55 × 10^6 possible protein structure alignments. Here, we provide the resulting protein structure alignments as a public web-based service, named GANGSTA+ Internet Services (GIS). We also allow users to browse the ASTRAL40 database of protein structures with GANGSTA+ relative to an externally given protein structure using different constraints to select specific results. GIS allows us to analyze protein structure families according to the SCOP classification scheme. Additionally, users can upload their own protein structures for pairwise protein structure comparison, alignment against all protein structures of the ASTRAL40 database (SCOP version 1.75) or symmetry analysis. GIS is publicly available at http://agknapp.chemie.fu-berlin.de/gplus.

  5. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Background: Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results: Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions: Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  6. Effect of processing on structural features of anodic aluminum oxides

    Science.gov (United States)

    Erdogan, Pembe; Birol, Yucel

    2012-09-01

    Morphological features of the anodic aluminum oxide (AAO) templates fabricated by electrochemical oxidation under different processing conditions were investigated. The selection of the polishing parameters does not appear to be critical as long as the aluminum substrate is polished adequately prior to the anodization process. AAO layers with a highly ordered pore distribution are obtained after anodizing in 0.6 M oxalic acid at 20 °C under 40 V for 5 minutes suggesting that the desired pore features are attained once an oxide layer develops on the surface. While the pore features are not affected much, the thickness of the AAO template increases with increasing anodization treatment time. Pore features are better and the AAO growth rate is higher at 20 °C than at 5 °C; higher under 45 V than under 40 V; higher with 0.6 M than with 0.3 M oxalic acid.

  7. Automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

    Science.gov (United States)

    Bordoli, Lorenza; Schwede, Torsten

    2012-01-01

    Comparative protein structure modeling is a computational approach to build three-dimensional structural models for proteins using experimental structures of related protein family members as templates. Regular blind assessments of modeling accuracy have demonstrated that comparative protein structure modeling is currently the most reliable technique to model protein structures. Homology models are often sufficiently accurate to substitute for experimental structures in a wide variety of applications. Since the usefulness of a model for a specific application is determined by its accuracy, model quality estimation is an essential component of protein structure prediction. Comparative protein modeling has become a routine approach in many areas of life science research since fully automated modeling systems also allow nonexperts to build reliable models. In this chapter, we describe practical approaches for automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal.

  8. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  9. An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach.

    Directory of Open Access Journals (Sweden)

    Zhila Esna Ashari

    Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell's machinery for their own benefit, which can result in serious illness or death of the host. For this reason recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Having good predictions for effectors will help to focus experimental validations and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for the purpose of predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens for four genera with a sufficient number of known effectors, Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between remaining features were removed, and dimensional reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model. The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of

  10. Alpha complexes in protein structure prediction

    DEFF Research Database (Denmark)

    Winter, Pawel; Fonseca, Rasmus

    2015-01-01

    Reducing the computational effort and increasing the accuracy of potential energy functions is of utmost importance in modeling biological systems, for instance in protein structure prediction, docking or design. Evaluating interactions between nonbonded atoms is the bottleneck of such computations......-complexes from scratch for every configuration encountered during the search for the native structure would make this approach hopelessly slow. However, it is argued that kinetic α-complexes can be used to reduce the computational effort of determining the potential energy when "moving" from one configuration...... to a neighboring one. As a consequence, relatively expensive (initial) construction of an α-complex is expected to be compensated by subsequent fast kinetic updates during the search process. Computational results presented in this paper are limited. However, they suggest that the applicability of a...

  11. Course 12: Proteins: Structural, Thermodynamic and Kinetic Aspects

    Science.gov (United States)

    Finkelstein, A. V.

    1 Introduction 2 Overview of protein architectures and discussion of physical background of their natural selection 2.1 Protein structures 2.2 Physical selection of protein structures 3 Thermodynamic aspects of protein folding 3.1 Reversible denaturation of protein structures 3.2 What do denatured proteins look like? 3.3 Why denaturation of a globular protein is the first-order phase transition 3.4 "Gap" in energy spectrum: The main characteristic that distinguishes protein chains from random polymers 4 Kinetic aspects of protein folding 4.1 Protein folding in vivo 4.2 Protein folding in vitro (in the test-tube) 4.3 Theory of protein folding rates and solution of the Levinthal paradox

  12. Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.

    Science.gov (United States)

    Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong

    2009-07-01

    Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate determination of protein structural class by computational methods is very important in protein science. One of the keys to a computational method is accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information on sequence order in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.

  13. Light-harvesting features revealed by the structure of plant Photosystem I

    CERN Document Server

    Ben-Shem, A; Nelson, N; 10.1023/B:PRES.0000036881.23512.42

    2004-01-01

    Oxygenic photosynthesis is driven by two multi-subunit membrane protein complexes, Photosystem I and Photosystem II. In plants and green algae, both complexes are composed of two moieties: a reaction center (RC), where light-induced charge translocation occurs, and a peripheral antenna that absorbs light and funnels its energy to the reaction center. The peripheral antenna of PS I (LHC I) is composed of four gene products (Lhca 1-4) that are unique among the chlorophyll a/b binding proteins in their pronounced long-wavelength absorbance and in their assembly into dimers. The recently determined structure of plant Photosystem I provides the first relatively high-resolution structural model of a super-complex containing a reaction center and its peripheral antenna. We describe some of the structural features responsible for the unique properties of LHC I and discuss the advantages of the particular LHC I dimerization mode over monomeric or trimeric forms. In addition, we delineate some of the interactions betw...

  14. The RCSB protein data bank: integrative view of protein, gene and 3D structural information.

    Science.gov (United States)

    Rose, Peter W; Prlić, Andreas; Altunkaya, Ali; Bi, Chunxiao; Bradley, Anthony R; Christie, Cole H; Costanzo, Luigi Di; Duarte, Jose M; Dutta, Shuchismita; Feng, Zukang; Green, Rachel Kramer; Goodsell, David S; Hudson, Brian; Kalro, Tara; Lowe, Robert; Peisach, Ezra; Randle, Christopher; Rose, Alexander S; Shao, Chenghua; Tao, Yi-Ping; Valasatava, Yana; Voigt, Maria; Westbrook, John D; Woo, Jesse; Yang, Huangwang; Young, Jasmine Y; Zardecki, Christine; Berman, Helen M; Burley, Stephen K

    2017-01-04

    The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the user experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Relevance of echo-structure and texture features

    DEFF Research Database (Denmark)

    Karemore, Gopal; Mullick, Jhinuk Basu; KV, Dr. Rajagopal

    2010-01-01

    Aim: Echostructure is an essential parameter for the evaluation of circumscribed lesions and can be described as a texture feature on ultrasound images. Present study evaluates the possibility of distinguishing between benign and malignant breast tumors using various texture features. Materials...... and Methods: 58 cases of breast tumor (29 each from benign and malignant) were documented under standardized conditions using a linear array machine and 7.5 MHz transducer. In each sonographic image, ROI of tumor was marked and then subjected to the evaluation of tumor status using five parameters of second...... performance ROC= 0.78(pbenign and malignant tumors. It also reveals that when evaluating images of a breast tumor...

  16. Structural determination of intact proteins using mass spectrometry

    Science.gov (United States)

    Kruppa, Gary [San Francisco, CA]; Schoeniger, Joseph S [Oakland, CA]; Young, Malin M [Livermore, CA]

    2008-05-06

    The present invention relates to novel methods of determining the sequence and structure of proteins. Specifically, the present invention allows for the analysis of intact proteins within a mass spectrometer. Therefore, preparatory separations need not be performed prior to introducing a protein sample into the mass spectrometer. Also disclosed herein are new instrumental developments for enhancing the signal from the desired modified proteins, methods for producing controlled protein fragments in the mass spectrometer, eliminating complex microseparations, and protein preparatory chemical steps necessary for cross-linking based protein structure determination. Additionally, the preferred method of the present invention involves the determination of protein structures utilizing a top-down analysis of protein structures to search for covalent modifications. In the preferred method, intact proteins are ionized and fragmented within the mass spectrometer.

  17. Protein 8-class secondary structure prediction using conditional neural fields.

    Science.gov (United States)

    Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo

    2011-10-01

    Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
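
    The Q8 and Q3 figures quoted above are per-residue accuracies: the percentage of residues whose predicted state equals the observed state. A minimal sketch follows. The function names are hypothetical, the example strings are made up, and the 8-to-3 reduction shown (H, G, I to H; E, B to E; everything else to C) is one common convention among several, not necessarily the one used in this paper.

```python
def q_accuracy(predicted, observed):
    """Percentage of positions where the predicted state matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# One common (assumed) reduction from 8 DSSP-style states to 3 states.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_3(states):
    """Collapse 8-state labels to helix (H), strand (E) or coil (C)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


pred = "HHHHEETT"  # made-up prediction
obs = "HHGHEETS"   # made-up reference assignment
print(q_accuracy(pred, obs))                            # Q8
print(q_accuracy(reduce_to_3(pred), reduce_to_3(obs)))  # Q3
```

    Because the reduction merges states, Q3 on the same prediction is never lower than Q8, which is one reason 8-class prediction is reported as the harder task.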

  18. The calcium binding properties and structure prediction of the Hax-1 protein.

    Science.gov (United States)

    Balcerak, Anna; Rowinski, Sebastian; Szafron, Lukasz M; Grzybowska, Ewa A

    2017-01-01

    Hax-1 is a protein involved in regulation of different cellular processes, but its properties and exact mechanisms of action remain unknown. In this work, using purified, recombinant Hax-1 and by applying an in vitro autoradiography assay we have shown that this protein binds Ca2+. Additionally, we performed structure prediction analysis which shows that Hax-1 displays definitive structural features, such as two α-helices, short β-strands and four disordered segments.

  19. Features for Exploiting Black-Box Optimization Problem Structure

    DEFF Research Database (Denmark)

    Tierney, Kevin; Malitsky, Yuri; Abell, Tinus

    2013-01-01

    landscape of BBO problems and show how an algorithm portfolio approach can exploit these general, problem-independent features and outperform the utilization of any single minimization search strategy. We test our methodology on data from the GECCO Workshop on BBO Benchmarking 2012, which contains 21...

  20. Cell array-based intracellular localization screening reveals novel functional features of human chromosome 21 proteins

    Directory of Open Access Journals (Sweden)

    Kahlem Pascal

    2006-06-01

    Background: Trisomy of human chromosome 21 (Chr21) results in Down's syndrome, a complex developmental and neurodegenerative disease. Molecular analysis of Down's syndrome, however, poses a particular challenge, because the aneuploid region of Chr21 contains many genes of unknown function. Subcellular localization of human Chr21 proteins may contribute to further understanding of the functions and regulatory mechanisms of the genes that code for these proteins. Following this idea, we used a transfected-cell array technique to perform a rapid and cost-effective analysis of the intracellular distribution of Chr21 proteins. Results: We chose 89 genes that were distributed over the majority of 21q, ranging from RBM11 (14.5 Mb) to MCM3AP (46.6 Mb), with part of them expressed aberrantly in the Down's syndrome mouse model. Open reading frames of these genes were cloned into a mammalian expression vector with an amino-terminal His6 tag. All of the constructs were arrayed on glass slides and reverse transfected into HEK293T cells for protein expression. Co-localization detection using a set of organelle markers was carried out for each Chr21 protein. Here, we report the subcellular localization properties of 52 proteins. For 34 of these proteins, their localization is described for the first time. Furthermore, the alteration in cell morphology and growth as a result of protein over-expression for claudin-8 and claudin-14 genes has been characterized. Conclusion: The cell array-based protein expression and detection approach is a cost-effective platform for large-scale functional analyses, including protein subcellular localization and cell phenotype screening. The results from this study reveal novel functional features of human Chr21 proteins, which should contribute to further understanding of the molecular pathology of Down's syndrome.

  1. Process Features in Writing: Internal Structure and Incremental Value over Product Features. Research Report. ETS RR-15-27

    Science.gov (United States)

    Zhang, Mo; Deane, Paul

    2015-01-01

    In educational measurement contexts, essays have been evaluated and formative feedback has been given based on the end product. In this study, we used a large sample collected from middle school students in the United States to investigate the factor structure of the writing process features gathered from keystroke logs and the association of that…

  2. Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review

    Science.gov (United States)

    Zhang, Xue; Acencio, Marcio Luis; Lemke, Ney

    2016-01-01

    Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins will result in lethality or infertility. The identification of essential genes is very important not only for understanding the minimal requirements for survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genomes data and high-throughput experimental data, many computational methods for identifying essential proteins are proposed, which are useful complements to experimental methods. In this review, we show the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research. PMID:27014079
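    As a rough illustration of what a "network topological feature" can look like, the sketch below computes normalized degree centrality on a toy interaction network. The function name, the protein labels, and the edge list are hypothetical placeholders, and degree centrality is only one simple feature of this kind; nothing here is taken from the reviewed methods themselves.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for each node of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical toy network; the protein names are placeholders.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
scores = degree_centrality(toy_edges)
# Highly connected nodes come first; they would be candidate "essential" proteins
# under the simple assumption that hubs tend to be essential.
ranked = sorted(scores, key=scores.get, reverse=True)
```

    In a real pipeline such a score would be one input feature among many for a machine-learning classifier, rather than a prediction on its own.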

  3. Protein structure similarity from principle component correlation analysis

    Directory of Open Access Journals (Sweden)

    Chou James

    2006-01-01

    Full Text Available Abstract Background Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. Results We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. Conclusion The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum
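    The RMSD baseline mentioned in this abstract can be sketched in a few lines. The example assumes the two coordinate sets are already optimally superimposed and matched atom by atom; the superposition step itself is not shown, and the function name and sample coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical two-atom structures that differ only in one z coordinate.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

    Because every atom pair contributes to the same average, a single large local deviation inflates the value, which is consistent with the abstract's remark that RMSD works best for nearly identical structures.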

  4. [Structure analysis of disease-related proteins using vibrational spectroscopy].

    Science.gov (United States)

    Hiramatsu, Hirotsugu

    2014-01-01

    Analyses of the structure and properties of identified pathogenic proteins are important for elucidating the molecular basis of diseases and in drug discovery research. Vibrational spectroscopy has advantages over other techniques in terms of sensitivity of detection of structural changes. Spectral analysis, however, is complicated because the spectrum involves a substantial amount of information. This article includes examples of structural analysis of disease-related proteins using vibrational spectroscopy in combination with additional techniques that facilitate data acquisition and analysis. Residue-specific conformation analysis of an amyloid fibril was conducted using IR absorption spectroscopy in combination with (13)C-isotope labeling, linear dichroism measurement, and analysis of amide I band features. We reveal a pH-dependent property of the interacting segment of an amyloidogenic protein, β2-microglobulin, which causes dialysis-related amyloidosis. We also reveal the molecular mechanisms underlying pH-dependent sugar-binding activity of human galectin-1, which is involved in cell adhesion, using spectroscopic techniques including UV resonance Raman spectroscopy. The decreased activity at acidic pH was attributed to a conformational change in the sugar-binding pocket caused by protonation of His52 (pKa 6.3) and the cation-π interaction between Trp68 and the protonated His44 (pKa 5.7). In addition, we show that the peak positions of the Raman bands of the C4=C5 stretching mode at approximately 1600 cm(-1) and the Nπ-C2-Nτ bending mode at approximately 1405 cm(-1) serve as markers of the His side-chain structure. The Raman signal was enhanced 12 fold using a vertical flow apparatus.

  5. Genome, secretome and glucose transport highlight unique features of the protein production host Pichia pastoris

    Directory of Open Access Journals (Sweden)

    Mattanovich Diethard

    2009-06-01

    Full Text Available Abstract Background Pichia pastoris is widely used as a production platform for heterologous proteins and model organism for organelle proliferation. Without a published genome sequence available, strain and process development relied mainly on analogies to other, well studied yeasts like Saccharomyces cerevisiae. Results To investigate specific features of growth and protein secretion, we have sequenced the 9.4 Mb genome of the type strain DSMZ 70382 and analyzed the secretome and the sugar transporters. The computationally predicted secretome consists of 88 ORFs. When grown on glucose, only 20 proteins were actually secreted at detectable levels. These data highlight one major feature of P. pastoris, namely the low contamination of heterologous proteins with host cell protein, when applying glucose based expression systems. Putative sugar transporters were identified and compared to those of related yeast species. The genome comprises 2 homologs to S. cerevisiae low affinity transporters and 2 to high affinity transporters of other Crabtree negative yeasts. Contrary to other yeasts, P. pastoris possesses 4 H+/glycerol transporters. Conclusion This work highlights significant advantages of using the P. pastoris system with glucose based expression and fermentation strategies. As only few proteins and no proteases are actually secreted on glucose, it becomes evident that cell lysis is the relevant cause of proteolytic degradation of secreted proteins. The endowment with hexose transporters, dominantly of the high affinity type, limits glucose uptake rates and thus overflow metabolism as observed in S. cerevisiae. The presence of 4 genes for glycerol transporters explains the high specific growth rates on this substrate and underlines the suitability of a glycerol/glucose based fermentation strategy. Furthermore, we present an open access web based genome browser http://www.pichiagenome.org.

  6. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural class by using a nonlinear prediction method. No deterministic structures are found in these protein sequences, and this implies that they behave as random sequences. We also give an explanation for the controversial results obtained in previous investigations.

  7. The structure of a cholesterol-trapping protein

    Science.gov (United States)

    Contact: Dan Krotz, dakrotz@lbl.gov. Berkeley Lab Science Beat. ... Institute researchers determined the three-dimensional structure of a protein that controls cholesterol level in the bloodstream. Knowing the structure of the protein, a cellular receptor that ensnares

  8. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures.

    Science.gov (United States)

    Kinjo, Akira R; Nakamura, Haruki

    2013-01-01

    Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found in interaction interfaces in general, indicating any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.

  9. Learning about the internal structure of categories through classification and feature inference.

    Science.gov (United States)

    Jee, Benjamin D; Wiley, Jennifer

    2014-01-01

    Previous research on category learning has found that classification tasks produce representations that are skewed toward diagnostic feature dimensions, whereas feature inference tasks lead to richer representations of within-category structure. Yet, prior studies often measure category knowledge through tasks that involve identifying only the typical features of a category. This neglects an important aspect of a category's internal structure: how typical and atypical features are distributed within a category. The present experiments tested the hypothesis that inference learning results in richer knowledge of internal category structure than classification learning. We introduced several new measures to probe learners' representations of within-category structure. Experiment 1 found that participants in the inference condition learned and used a wider range of feature dimensions than classification learners. Classification learners, however, were more sensitive to the presence of atypical features within categories. Experiment 2 provided converging evidence that classification learners were more likely to incorporate atypical features into their representations. Inference learners were less likely to encode atypical category features, even in a "partial inference" condition that focused learners' attention on the feature dimensions relevant to classification. Overall, these results are contrary to the hypothesis that inference learning produces superior knowledge of within-category structure. Although inference learning promoted representations that included a broad range of category-typical features, classification learning promoted greater sensitivity to the distribution of typical and atypical features within categories.

  10. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment

  11. Structural Features and Healthy Properties of Polysaccharides Occurring in Mushrooms

    Directory of Open Access Journals (Sweden)

    Eva Guillamón

    2012-12-01

    Full Text Available Polysaccharides from mushrooms have attracted a great deal of attention due to the many health benefits they have demonstrated, such as immunomodulation, anticancer activity, prevention and treatment of cardiovascular diseases, antiviral and antimicrobial effects, among others. Isolation and purification of polysaccharides commonly involve several steps, and different techniques are currently available in order to increase extraction yield and purity. Studies have demonstrated that the molecular structure and arrangement significantly influence the biological activity; therefore, there is a wide range of analytical techniques for the elucidation of chemical structures. Different polysaccharides have been isolated from mushrooms, most of them consisting of β-linked glucans, such as lentinan from Lentinus edodes, pleuran from Pleurotus species, schizophyllan from Schizophyllum commune, calocyban from Calocybe indica, or ganoderan and ganopoly from Ganoderma lucidum. This article reviews the main methods of polysaccharide isolation and structural characterization, as well as some of the most important polysaccharides isolated from mushrooms and the health benefits they provide.

  12. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-01-01

    operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching

  13. An estimated 5% of new protein structures solved today represent a new Pfam family

    International Nuclear Information System (INIS)

    Mistry, Jaina; Kloppmann, Edda; Rost, Burkhard; Punta, Marco

    2013-01-01

    This study uses the Pfam database to show that the sequence redundancy of protein structures deposited in the PDB is increasing. The possible reasons behind this trend are discussed. High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed

  14. K-nearest uphill clustering in the protein structure space

    KAUST Repository

    Cui, Xuefeng

    2016-08-26

    The protein structure classification problem, which is to assign a protein structure to a cluster of similar proteins, is one of the most fundamental problems in the construction and application of the protein structure space. Early manually curated protein structure classifications (e.g., SCOP and CATH) have been very successful, but have recently suffered from slow updating because of the increased throughput of newly solved protein structures. Thus, fully automatic methods to cluster proteins in the protein structure space have been designed and developed. In this study, we observed that the SCOP superfamilies are highly consistent with clustering trees representing hierarchical clustering procedures, but the tree cutting is very challenging and becomes the bottleneck of clustering accuracy. To overcome this challenge, we proposed a novel density-based K-nearest uphill clustering method that effectively eliminates noisy pairwise protein structure similarities and identifies density peaks as cluster centers. Specifically, the density peaks are identified based on K-nearest uphills (i.e., proteins with higher densities) and K-nearest neighbors. To our knowledge, this is the first attempt to apply and develop density-based clustering methods in the protein structure space. Our results show that our density-based clustering method outperforms the state-of-the-art clustering methods previously applied to the problem. Moreover, we observed that computational methods and human experts could produce highly similar clusters at high precision values, while computational methods also suggest splitting some large superfamilies into smaller clusters. © 2016 Elsevier B.V.
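    The general idea of linking each item to a denser neighbor and treating items with no denser neighbor as cluster centers can be sketched as follows. This is a simplified, hypothetical variant written for illustration, not the authors' algorithm: the density estimate (inverse mean distance to the K nearest neighbors), the use of plain Euclidean points instead of structure similarities, and all names are assumptions.

```python
import math


def knn_uphill_clusters(points, k):
    """Label each point with the index of the density peak it climbs to."""
    n = len(points)
    if k < 1 or n < 2:
        raise ValueError("need k >= 1 and at least two points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # K nearest neighbors of each point (excluding itself), closest first.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # Assumed density estimate: inverse of the mean distance to the K neighbors.
    density = [1.0 / (sum(dist[i][j] for j in knn[i]) / len(knn[i]) + 1e-12)
               for i in range(n)]
    # "Uphill" pointer: the closest K-neighbor with strictly higher density.
    # A point with no denser neighbor points to itself and acts as a peak.
    uphill = []
    for i in range(n):
        higher = [j for j in knn[i] if density[j] > density[i]]
        uphill.append(higher[0] if higher else i)
    # Follow pointers until a peak is reached; density strictly increases
    # along the path, so the walk always terminates.
    labels = []
    for i in range(n):
        j = i
        while uphill[j] != j:
            j = uphill[j]
        labels.append(j)
    return labels


# Two well-separated hypothetical groups of 2D points.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
              (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
toy_labels = knn_uphill_clusters(toy_points, k=2)
```

    One design consequence of this scheme is that no tree-cutting threshold is needed: the number of clusters follows from how many points have no denser neighbor among their K nearest.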

  15. Evaluation of feature detection algorithms for structure from motion

    CSIR Research Space (South Africa)

    Govender, N

    2009-11-01

    Full Text Available Natasha Govender, Mobile Intelligent Autonomous Systems, CSIR, Pretoria. Email: ngovender@csir.co.za. Abstract: Structure from motion is a widely used technique in computer vision to perform 3D reconstruction. The 3D...

  16. Structure-function correlations of pulmonary surfactant protein SP-B and the saposin-like family of proteins.

    Science.gov (United States)

    Olmeda, Bárbara; García-Álvarez, Begoña; Pérez-Gil, Jesús

    2013-03-01

    Pulmonary surfactant is a lipid-protein complex secreted by the respiratory epithelium of mammalian lungs, which plays an essential role in stabilising the alveolar surface and so reducing the work of breathing. The surfactant protein SP-B is part of this complex, and is strictly required for the assembly of pulmonary surfactant and its extracellular development to form stable surface-active films at the air-liquid alveolar interface, making the lack of SP-B incompatible with life. In spite of its physiological importance, a model for the structure and the mechanism of action of SP-B is still needed. The sequence of SP-B is homologous to that of the saposin-like family of proteins, which are membrane-interacting polypeptides with apparently diverging activities, from the co-lipase action of saposins to facilitate the degradation of sphingolipids in the lysosomes to the cytolytic actions of some antibiotic proteins, such as NK-lysin and granulysin or the amoebapore of Entamoeba histolytica. Numerous studies on the interactions of these proteins with membranes have still not explained how a similar sequence and a potentially related fold can sustain such apparently different activities. In the present review, we have summarised the most relevant features of the structure, lipid-protein and protein-protein interactions of SP-B and the saposin-like family of proteins, as a basis to propose an integrated model and a common mechanistic framework of the apparent functional versatility of the saposin fold.

  17. Features analysis for identification of date and party hubs in protein interaction network of Saccharomyces Cerevisiae

    Directory of Open Access Journals (Sweden)

    Araabi Babak N

    2010-12-01

    Full Text Available Abstract Background It has been understood that biological networks have modular organizations which are the sources of their observed complexity. Analysis of networks and motifs has shown that two types of hubs, party hubs and date hubs, are responsible for this complexity. Party hubs are local coordinators because of their high co-expressions with their partners, whereas date hubs display low co-expressions and are assumed as global connectors. However there is no mutual agreement on these concepts in related literature with different studies reporting their results on different data sets. We investigated whether there is a relation between the biological features of Saccharomyces Cerevisiae's proteins and their roles as non-hubs, intermediately connected, party hubs, and date hubs. We propose a classifier that separates these four classes. Results We extracted different biological characteristics including amino acid sequences, domain contents, repeated domains, functional categories, biological processes, cellular compartments, disordered regions, and position specific scoring matrix from various sources. Several classifiers are examined and the best feature-sets based on average correct classification rate and correlation coefficients of the results are selected. We show that fusion of five feature-sets including domains, Position Specific Scoring Matrix-400, cellular compartments level one, and composition pairs with two and one gaps provide the best discrimination with an average correct classification rate of 77%. Conclusions We study a variety of known biological feature-sets of the proteins and show that there is a relation between domains, Position Specific Scoring Matrix-400, cellular compartments level one, composition pairs with two and one gaps of Saccharomyces Cerevisiae's proteins, and their roles in the protein interaction network as non-hubs, intermediately connected, party hubs and date hubs. This study also confirms the
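    The party/date distinction described in this abstract rests on how strongly a hub is co-expressed with its interaction partners. A minimal sketch of that definition follows; it assumes expression profiles are given as equal-length numeric lists, uses the average Pearson correlation as the co-expression measure, and picks a cutoff of 0.5 purely for illustration. It does not reproduce the feature-based classifier the study proposes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_hub(hub_profile, partner_profiles, threshold=0.5):
    """Return ("party" or "date", average co-expression with partners).

    The threshold is an assumed, illustrative cutoff.
    """
    average = sum(pearson(hub_profile, p) for p in partner_profiles) / len(partner_profiles)
    return ("party" if average >= threshold else "date"), average


# Hypothetical expression profiles over four conditions.
hub = [1.0, 2.0, 3.0, 4.0]
party_kind, party_avg = classify_hub(hub, [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 5.0]])
date_kind, date_avg = classify_hub(hub, [[4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]])
```

    In the first call both partners rise and fall with the hub, so the average is high; in the second the two partners cancel out, which mimics a hub that meets different partners under different conditions.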

  18. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    Science.gov (United States)

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  19. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  20. Ultrasonic and structural features of some borosilicate glasses ...

    Indian Academy of Sciences (India)

    Therefore, the glass structure becomes contracted and compacted, which decreases its molar volume and increases its rigidity. This concept was asserted from the increase in the ultrasonic velocity, Debye temperature and elastic moduli with the increase of SiO2 content. The present compositional dependence of the elastic ...

  1. Unusual Features of Crystal Structures of Some Simple Copper Compounds

    Science.gov (United States)

    Douglas, Bodie

    2009-01-01

    Some simple copper compounds have unusual crystal structures. Cu₃N is cubic with N atoms at centers of octahedra formed by 6 Cu atoms. Cu₂O (cuprite) is also cubic; O atoms are in tetrahedra formed by 4 Cu atoms. These tetrahedra are linked by sharing vertices forming two independent networks without linkages between them.…

  2. Band structure features of nonlinear optical yttrium aluminium borate crystal

    Czech Academy of Sciences Publication Activity Database

    Reshak, Ali H; Auluck, S.; Majchrowski, A.; Kityk, I. V.

    2008-01-01

    Vol. 10, No. 10 (2008), pp. 1445-1448. ISSN 1293-2558. Institutional research plan: CEZ:AV0Z60870520. Keywords: Electronic structure; DFF; FPLAPW; LDA. Subject RIV: BO - Biophysics. Impact factor: 1.742, year: 2008

  3. Improving the chances of successful protein structure determination with a random forest classifier

    Energy Technology Data Exchange (ETDEWEB)

    Jahandideh, Samad [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); Jaroszewski, Lukasz; Godzik, Adam, E-mail: adam@burnham.org [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); University of California, San Diego, La Jolla, California (United States)

    2014-03-01

    Using an extended set of protein features calculated separately for protein surface and interior, a new version of XtalPred based on a random forest classifier achieves a significant improvement in predicting the success of structure determination from the primary amino-acid sequence. Obtaining diffraction quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five ‘crystallization classes’ based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface are tested. The new XtalPred-RF (random forest) achieves significant improvement of the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted

  4. Using an alignment of fragment strings for comparing protein structures

    DEFF Research Database (Denmark)

    Friedberg, Iddo; Harder, Tim; Kolodny, Rachel

    2007-01-01

    . RESULTS: Here we describe the use of a particular structure fragment library, denoted here as KL-strings, for the 1D representation of protein structure. Using KL-strings, we develop an infrastructure for comparing protein structures with a 1D representation. This study focuses on the added value gained...

  5. Rheology and structure of milk protein gels

    NARCIS (Netherlands)

    Vliet, van T.; Lakemond, C.M.M.; Visschers, R.W.

    2004-01-01

    Recent studies on gel formation and rheology of milk gels are reviewed. A distinction is made between gels formed by aggregated casein, gels of 'pure' whey proteins and gels in which both casein and whey proteins contribute to their properties. For casein/whey protein mixtures, it has been shown

  6. A hidden markov model derived structural alphabet for proteins.

    Science.gov (United States)

    Camproux, A C; Gautier, R; Tufféry, P

    2004-06-04

    Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.
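    The decomposition into overlapping four-residue fragments that this abstract describes can be illustrated with a sliding window. The sketch only shows the windowing step; residue labels stand in for the backbone geometry the actual model works on, and the function name and sample chain are hypothetical.

```python
def overlapping_fragments(residues, length=4):
    """Split a chain into overlapping windows that advance one residue at a time."""
    if length < 1:
        raise ValueError("fragment length must be positive")
    # A chain of N residues yields N - length + 1 fragments; consecutive
    # fragments share length - 1 residues.
    return [tuple(residues[i:i + length])
            for i in range(len(residues) - length + 1)]


# Hypothetical six-residue chain written with one-letter codes.
fragments = overlapping_fragments("ACDEFG")
```

    Because neighboring fragments overlap in three positions, the state assigned to one fragment constrains which states can follow it, which is the kind of dependence a hidden Markov model is suited to capture.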

  7. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
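
    The map/reduce split described in this record can be sketched in a few lines of plain Python. This is only a single-process illustration of the programming model, not the Hadoop service itself: `toy_score`, `map_phase` and `reduce_phase` are hypothetical names, and the placeholder score (negative RMSD without superposition) stands in for the alignment and refinement algorithms, which the abstract does not specify.

```python
from math import sqrt


def toy_score(a, b):
    """Placeholder similarity: negative RMSD over paired points, no superposition."""
    n = min(len(a), len(b))
    if n == 0:
        return float("-inf")
    sq = sum((p - q) ** 2 for x, y in zip(a, b) for p, q in zip(x, y))
    return -sqrt(sq / n)


def map_phase(query, database):
    """Map step: emit (query_id, (target_id, score)) for every target structure."""
    qid, qcoords = query
    for tid, tcoords in database.items():
        yield qid, (tid, toy_score(qcoords, tcoords))


def reduce_phase(pairs):
    """Reduce step: keep the best-scoring target for each query id."""
    best = {}
    for qid, (tid, score) in pairs:
        if qid not in best or score > best[qid][1]:
            best[qid] = (tid, score)
    return best


# Hypothetical coordinates: three points per "structure".
database = {
    "t1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "t2": [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 6.0, 0.0)],
}
query = ("q1", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)])

best = reduce_phase(map_phase(query, database))
print(best["q1"][0])
```

    Each call to `map_phase` is independent of the others, so in a real deployment the map tasks could be spread across workers, which is consistent with the scaling behaviour the abstract reports.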

  8. Concordance of visual and structural features between siblings with albinism.

    Science.gov (United States)

    Heinmiller, Laura J; Holleschau, Ann; Summers, C Gail

    2016-02-01

    To evaluate similarities and differences in visual function and ocular structure between siblings with albinism. The medical records of all siblings diagnosed with albinism were retrospectively reviewed. Comparisons were made using examination at oldest age for younger sibling and examination closest to that age for older siblings. A total of 111 patients from 54 families were studied. Mean age was 12.9 years (range, 2 months to 44.2 years). Mean difference in ages between sibling pair examinations was 11.5 months (range, 0-87 months). Of 45 families, best-corrected visual acuity was equal in 9 (20%), within 1/2 octave in 9 (20%), >1/2 but albinism should be counseled with due caution because visual function is often disparate despite similar structural findings. Copyright © 2016 American Association for Pediatric Ophthalmology and Strabismus. Published by Elsevier Inc. All rights reserved.

  9. Structural features in icosahedral Al63Cu25Fe12

    International Nuclear Information System (INIS)

    Howell, R.H.; Solal, F.; Turchi, P.E.A.; Berger, C.; Calvayrac, Y.

    1991-01-01

    Since the discovery of a quasicrystalline phase in Al-Mn alloys, a substantial amount of work has been done to understand the structural and physical properties of this new class of materials. More recently the discovery of a thermodynamically stable icosahedral phase in AlCuFe presents the opportunity to study pure quasicrystalline phases of high structural quality by eliminating known defects, especially phason disorder, by conventional heat treatment. In particular it was shown that annealing treatments of as-quenched samples resulted in a dramatic reduction in the width of the diffraction peaks associated with the elimination of as-quenched defects, present in other quasicrystals. Positron annihilation lifetime measurements have a high sensitivity to intrinsic defects and positron annihilation radiation angular correlation measurements are well suited to measurements of electronic structure in systems where the defect effects do not dominate. We have measured positron annihilation lifetime and angular correlations on quasicrystalline samples of Al63Cu25Fe12 in the pure icosahedral phase

  10. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    Science.gov (United States)

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  11. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

    2011-06-20

    One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
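
    The idea of looking for seven-letter words that are over-represented in one group of simplified structures can be sketched as follows. The sketch assumes that each structure has already been encoded as a string of structural letters, and it uses a plain frequency ratio as the enrichment measure; that ratio is a stand-in chosen for brevity, not the pattern statistics used by the authors. The function names and the example strings are hypothetical.

```python
from collections import Counter


def words(seq, k=7):
    """All overlapping words of length k in a structural-letter string."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def enrichment(group, background, k=7):
    """Relative frequency of each word in `group` divided by that in `background`.

    `background` is expected to include the group itself, so that every
    word seen in the group also has a non-zero background count.
    """
    g = Counter(w for s in group for w in words(s, k))
    b = Counter(w for s in background for w in words(s, k))
    g_total, b_total = sum(g.values()), sum(b.values())
    return {w: (c / g_total) / (b[w] / b_total) for w, c in g.items() if b[w]}


# Hypothetical encoded loops: one "superfamily" and a larger background set.
group = ["ABCDEFGH"]
background = group + ["ZZZZZZZZZ"]

ratios = enrichment(group, background)
print(sorted(ratios.items()))
```

    A ratio above 1 means the word is more frequent in the group than in the background. With short sequences and small counts such ratios are noisy, which is presumably why the record refers to statistics adapted to short sequences.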

  12. The adsorption features between insecticidal crystal protein and nano-Mg(OH)2.

    Science.gov (United States)

    Pan, Xiaohong; Xu, Zhangyan; Zheng, Yilin; Huang, Tengzhou; Li, Lan; Chen, Zhi; Rao, Wenhua; Chen, Saili; Hong, Xianxian; Guan, Xiong

    2017-12-01

    Nano-Mg(OH)2, with low biological toxicity, is an ideal nano-carrier for insecticidal protein to improve the bioactivity. In this work, the adsorption features of insecticidal protein by nano-Mg(OH)2 have been studied. The adsorption capacity could reach as high as 136 mg g⁻¹, and the adsorption isotherm had been fitted with Langmuir and Freundlich models. Moreover, the adsorption kinetics followed a pseudo-first or -second order rate model, and the adsorption was a spontaneous and exothermic process. However, high temperatures are not suitable for adsorption, which implies that the temperature would be a critical factor during the adsorption process. In addition, FT-IR confirmed that the protein was adsorbed on the nano-Mg(OH)2, zeta potential analysis suggested that insecticidal protein was loaded onto the nano-Mg(OH)2 not by electrostatic adsorption but maybe by intermolecular forces, and the circular dichroism spectrum of Cry11Aa protein changed after loading onto nano-Mg(OH)2. The study supplies adsorption information for Cry11Aa and nano-Mg(OH)2, which would be useful in the practical application of nano-Mg(OH)2 as a nano-carrier.
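
    For readers unfamiliar with the models named in this abstract, the sketch below writes out the standard textbook forms of the Langmuir and Freundlich isotherms and of the pseudo-first-order and pseudo-second-order rate laws. The parameter names are conventional, and the numerical values in the usage lines are illustrative assumptions, not fitted values from the study.

```python
import math


def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: q_e = q_max * K_L * C_e / (1 + K_L * C_e)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)


def freundlich(c_e, k_f, n):
    """Freundlich isotherm: q_e = K_F * C_e ** (1 / n)."""
    return k_f * c_e ** (1.0 / n)


def pseudo_first_order(t, q_e, k1):
    """Pseudo-first-order kinetics: q_t = q_e * (1 - exp(-k1 * t))."""
    return q_e * (1.0 - math.exp(-k1 * t))


def pseudo_second_order(t, q_e, k2):
    """Pseudo-second-order kinetics: q_t = k2 * q_e**2 * t / (1 + k2 * q_e * t)."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)


# Illustrative values only: a capacity of 136 mg/g with a made-up affinity constant.
half_coverage = langmuir(c_e=2.0, q_max=136.0, k_l=0.5)
print(half_coverage)
```

    In the Langmuir form the uptake saturates at `q_max`, whereas the Freundlich form keeps rising with concentration, so comparing the two fits is one common way to judge whether adsorption behaves like monolayer coverage.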

  13. Unveiling network-based functional features through integration of gene expression into protein networks.

    Science.gov (United States)

    Jalili, Mahdi; Gebhardt, Tom; Wolkenhauer, Olaf; Salehzadeh-Yazdi, Ali

    2018-06-01

    Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches, integrating gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease related genes, gene prioritization, clustering protein interactions, developing the modules, extract active subnetworks and static protein complexes or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Compare local pocket and global protein structure models by small structure patterns

    KAUST Repository

    Cui, Xuefeng; Kuwahara, Hiroyuki; Li, Shuai Cheng; Gao, Xin

    2015-01-01

    Researchers proposed several criteria to assess the quality of predicted protein structures because it is one of the essential tasks in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competitions. Popular criteria

  15. Structural–Functional Features of the Thyrotropin Receptor: A Class A G-Protein-Coupled Receptor at Work

    Directory of Open Access Journals (Sweden)

    Gerd Krause

    2017-04-01

    The thyroid-stimulating hormone receptor (TSHR) is a member of the glycoprotein hormone receptors, a sub-group of class A G-protein-coupled receptors (GPCRs). TSHR and its endogenous ligand thyrotropin (TSH) are of essential importance for growth and function of the thyroid gland and proper function of the TSH/TSHR system is pivotal for production and release of thyroid hormones. This receptor is also important with respect to pathophysiology, such as autoimmune (including ophthalmopathy) or non-autoimmune thyroid dysfunctions and cancer development. Pharmacological interventions directly targeting the TSHR should provide benefits to disease treatment compared to currently available therapies of dysfunctions associated with the TSHR or the thyroid gland. Upon TSHR activation, the molecular events conveying conformational changes from the extra- to the intracellular side of the cell across the membrane comprise reception, conversion, and amplification of the signal. These steps are highly dependent on structural features of this receptor and its intermolecular interaction partners, e.g., TSH, antibodies, small molecules, G-proteins, or arrestin. For better understanding of signal transduction, pathogenic mechanisms such as autoantibody action and mutational modifications or for developing new pharmacological strategies, it is essential to combine available structural data with functional information to generate homology models of the entire receptor. Although so far these insights are fragmental, in the past few decades essential contributions have been made to investigate in-depth the involved determinants, such as by structure determination via X-ray crystallography. This review summarizes available knowledge (as of December 2016) concerning the TSHR protein structure, associated functional aspects, and based on these insights we suggest several receptor complex models. Moreover, distinct TSHR properties will be highlighted in comparison to other

  16. Structural–Functional Features of the Thyrotropin Receptor: A Class A G-Protein-Coupled Receptor at Work

    Science.gov (United States)

    Kleinau, Gunnar; Worth, Catherine L.; Kreuchwig, Annika; Biebermann, Heike; Marcinkowski, Patrick; Scheerer, Patrick; Krause, Gerd

    2017-01-01

    The thyroid-stimulating hormone receptor (TSHR) is a member of the glycoprotein hormone receptors, a sub-group of class A G-protein-coupled receptors (GPCRs). TSHR and its endogenous ligand thyrotropin (TSH) are of essential importance for growth and function of the thyroid gland and proper function of the TSH/TSHR system is pivotal for production and release of thyroid hormones. This receptor is also important with respect to pathophysiology, such as autoimmune (including ophthalmopathy) or non-autoimmune thyroid dysfunctions and cancer development. Pharmacological interventions directly targeting the TSHR should provide benefits to disease treatment compared to currently available therapies of dysfunctions associated with the TSHR or the thyroid gland. Upon TSHR activation, the molecular events conveying conformational changes from the extra- to the intracellular side of the cell across the membrane comprise reception, conversion, and amplification of the signal. These steps are highly dependent on structural features of this receptor and its intermolecular interaction partners, e.g., TSH, antibodies, small molecules, G-proteins, or arrestin. For better understanding of signal transduction, pathogenic mechanisms such as autoantibody action and mutational modifications or for developing new pharmacological strategies, it is essential to combine available structural data with functional information to generate homology models of the entire receptor. Although so far these insights are fragmental, in the past few decades essential contributions have been made to investigate in-depth the involved determinants, such as by structure determination via X-ray crystallography. This review summarizes available knowledge (as of December 2016) concerning the TSHR protein structure, associated functional aspects, and based on these insights we suggest several receptor complex models. Moreover, distinct TSHR properties will be highlighted in comparison to other class A GPCRs to

  17. PDB2CD visualises dynamics within protein structures.

    Science.gov (United States)

    Janes, Robert W

    2017-10-01

    Proteins tend to have defined conformations, a key factor in enabling their function. Atomic resolution structures of proteins are predominantly obtained by either solution nuclear magnetic resonance (NMR) or crystal structure methods. However, when considering a protein whose structure has been determined by both these approaches, on many occasions, the resultant conformations are subtly different, as illustrated by the examples in this study. The solution NMR approach invariably results in a cluster of structures whose conformations satisfy the distance boundaries imposed by the data collected; it might be argued that this is evidence of the dynamics of proteins when in solution. In crystal structures, the proteins are often in an energy minimum state which can result in an increase in the extent of regular secondary structure present relative to the solution state depicted by NMR, because the more dynamic ends of alpha helices and beta strands can become ordered at the lower temperatures. This study examines a novel way to display the differences in conformations within an NMR ensemble and between these and a crystal structure of a protein. Circular dichroism (CD) spectroscopy can be used to characterise protein structures in solution. Using the new bioinformatics tool, PDB2CD, which generates CD spectra from atomic resolution protein structures, the differences between, and possible dynamic range of, conformations adopted by a protein can be visualised.

  18. Molecular structures and metabolic characteristics of protein in brown and yellow flaxseed with altered nutrient traits.

    Science.gov (United States)

    Khan, Nazir Ahmad; Booker, Helen; Yu, Peiqiang

    2014-07-16

    The objectives of this study were to investigate the chemical profiles; crude protein (CP) subfractions; ruminal CP degradation characteristics and intestinal digestibility of rumen undegraded protein (RUP); and protein molecular structures using molecular spectroscopy of newly developed yellow-seeded flax (Linum usitatissimum L.). Seeds from two yellow flaxseed breeding lines and two brown flaxseed varieties were evaluated. The yellow-seeded lines had higher (P RUP (29.2 vs 35.1% CP) than that in the brown-seeded varieties. However, the total supply of digestible RUP was not significantly different between the two seed types. Regression equations based on protein molecular structural features gave relatively good estimation for the contents of CP (R² = 0.87), soluble CP (R² = 0.92), RUP (R² = 0.97), and intestinal digestibility of RUP (R² = 0.71). In conclusion, molecular spectroscopy can be used to rapidly characterize feed protein molecular structures and predict their nutritive value.

  19. Features of structural response of mechanically loaded crystallites to irradiation

    Energy Technology Data Exchange (ETDEWEB)

    Korchuganov, Aleksandr V., E-mail: avkor@ispms.ru [Institute of Strength Physics and Materials Science SB RAS, Tomsk, 634055 (Russian Federation); National Research Tomsk State University, Tomsk, 634050 (Russian Federation)

    2015-10-27

    A molecular dynamics method is employed to investigate the origin and evolution of plastic deformation in elastically deformed iron and vanadium crystallites due to atomic displacement cascades. Elastic stress states of crystallites result from different degrees of specimen deformation. Crystallites are deformed under constant-volume conditions. Atomic displacement cascades with the primary knock-on atom energy up to 50 keV are generated in loaded specimens. It is shown that irradiation may cause not only the Frenkel pair formation but also large-scale structural rearrangements outside the irradiated area, which prove to be similar to rearrangements proceeding by the twinning mechanism in mechanically loaded specimens.

  20. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    Science.gov (United States)

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  1. Structure of an essential bacterial protein YeaZ (TM0874) from Thermotoga maritima at 2.5 Å resolution

    International Nuclear Information System (INIS)

    Xu, Qingping; McMullan, Daniel; Jaroszewski, Lukasz; Krishna, S. Sri; Elsliger, Marc-André; Yeh, Andrew P.; Abdubek, Polat; Astakhova, Tamara; Axelrod, Herbert L.; Carlton, Dennis; Chiu, Hsiu-Ju; Clayton, Thomas; Duan, Lian; Feuerhelm, Julie; Grant, Joanna; Han, Gye Won; Jin, Kevin K.; Klock, Heath E.; Knuth, Mark W.; Miller, Mitchell D.; Morse, Andrew T.; Nigoghossian, Edward; Okach, Linda; Oommachen, Silvya; Paulsen, Jessica; Reyes, Ron; Rife, Christopher L.; Bedem, Henry van den; Hodgson, Keith O.; Wooley, John; Deacon, Ashley M.; Godzik, Adam; Lesley, Scott A.; Wilson, Ian A.

    2009-01-01

    The crystal structure of an essential bacterial protein, YeaZ, from T. maritima identifies an interface that potentially mediates protein–protein interaction. YeaZ is involved in a protein network that is essential for bacteria. The crystal structure of YeaZ from Thermotoga maritima was determined to 2.5 Å resolution. Although this protein belongs to a family of ancient actin-like ATPases, it appears that it has lost the ability to bind ATP since it lacks some key structural features that are important for interaction with ATP. A conserved surface was identified, supporting its role in the formation of protein complexes

  2. Dengue-2 Structural Proteins Associate with Human Proteins to Produce a Coagulation and Innate Immune Response Biased Interactome

    Directory of Open Access Journals (Sweden)

    Soares Luis RB

    2011-01-01

    Background: Dengue virus infection is a public health threat to hundreds of millions of individuals in the tropical regions of the globe. Although dengue infection usually manifests itself in its mildest, though often debilitating, clinical form, dengue fever, life-threatening complications commonly arise in the form of hemorrhagic shock and encephalitis. The etiological basis for the virus-induced pathology in general, and the different clinical manifestations in particular, are not well understood. We reasoned that a detailed knowledge of the global biological processes affected by virus entry into a cell might help shed new light on this long-standing problem. Methods: A bacterial two-hybrid screen using DENV2 structural proteins as bait was performed, and the results were used to feed a manually curated, global dengue-human protein interaction network. Gene ontology and pathway enrichment, along with network topology and microarray meta-analysis, were used to generate hypotheses regarding dengue disease biology. Results: Combining bioinformatic tools with two-hybrid technology, we screened human cDNA libraries to catalogue proteins physically interacting with the DENV2 virus structural proteins, Env, cap and PrM. We identified 31 interacting human proteins representing distinct biological processes that are closely related to the major clinical diagnostic feature of dengue infection: haemostatic imbalance. In addition, we found dengue-binding human proteins involved with additional key aspects, previously described as fundamental for virus entry into cells and the innate immune response to infection. Construction of a DENV2-human global protein interaction network revealed interesting biological properties suggested by simple network topology analysis. Conclusions: Our experimental strategy revealed that dengue structural proteins interact with human protein targets involved in the maintenance of blood coagulation and innate anti

  3. Relation between native ensembles and experimental structures of proteins

    DEFF Research Database (Denmark)

    Best, R. B.; Lindorff-Larsen, Kresten; DePristo, M. A.

    2006-01-01

    Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of "high-sequence similarity Protein Data Bank" (HSP) structures and consider the extent to which such ensembles represent the structural...... Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest...... heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein...

  4. Structure, hardness and fracture features of nanostructural materials

    International Nuclear Information System (INIS)

    Noskova, N.I.; Korznikov, A.V.; Idrisova, S.R.

    2000-01-01

    A study is made into nanocrystalline metals Cu and Mo, nanocrystalline intermetallic compound Ni3Al produced using severe plastic deformation; nanophase alloys Fe73.5Cu1Nb3Si1.35B9 and Pd81Cu7Si12 produced by crystallization from amorphous state as well as nanophase materials TiN and Al2O3 produced by nano powder compacting in the temperature range of 273-573 K. Methods of transmission and scanning electron microscopy, X-ray diffraction analysis, mechanical testing and microhardness measurement are applied to study structure, internal elastic stress, phase composition, hardness, strength and plastic properties, surface fracture mode of nanostructural materials [ru]

  5. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation.

    Science.gov (United States)

    Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng

    2009-04-21

    In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two data sets were used widely, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence homology being 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as feature representation of protein sequences. Based on such feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with step-by-step procedure are 65.8% and 64.2% for 1189 and 25PDB data sets, respectively. With one-against-others procedure used widely, we compare our method with five other existing methods. Especially, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is less than that used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising to perform the prediction of protein structural classes.
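
    A chaos game representation turns a symbol sequence into a trajectory inside the unit square, and the two coordinate sequences can then be read as time series. The sketch below shows the basic midpoint iteration for a four-letter alphabet. It rests on several assumptions: the abstract does not say how amino acids are mapped onto the symbols, the corner assignment used here is just one common convention, and treating the x and y coordinates as the "two kinds of time series" is a guess at what the authors mean.

```python
# Assumed corner assignment for a four-letter alphabet (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_series(seq):
    """Return the x and y coordinate series of the chaos game walk for `seq`."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    xs, ys = [], []
    for sym in seq:
        cx, cy = CORNERS[sym]
        # Move halfway from the current point towards the corner of this symbol.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = cgr_series("AG")
print(xs, ys)
```

    Each point depends on the whole prefix of the sequence, with recent symbols weighted most heavily, so the two series carry order information that a simple composition count would miss.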

  6. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    Science.gov (United States)

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
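
    The four performance figures quoted in this abstract are standard functions of the confusion-matrix counts. The sketch below computes them for a hypothetical set of counts; the numbers are made up for illustration and are not the DNABP evaluation data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical counts for a small test set.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
print(metrics)
```

    Unlike accuracy, the Matthews correlation coefficient uses all four counts symmetrically, which makes it less sensitive to class imbalance; that is one reason it is often reported alongside sensitivity and specificity.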

  7. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    Science.gov (United States)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  8. A systematic identification of species-specific protein succinylation sites using joint element features information

    Directory of Open Access Journals (Sweden)

    Hasan MM

    2017-08-01

    Md Mehedi Hasan,1 Mst Shamima Khatun,2 Md Nurul Haque Mollah,2 Cao Yong,3 Dianjing Guo1 1School of Life Sciences and the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territory, Hong Kong, People’s Republic of China; 2Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh; 3Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, People’s Republic of China Abstract: Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding of the molecular mechanism and potential roles of lysine succinylation. However, even in well-studied systems, a majority of the succinylation sites remain undetected because the traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. An in silico approach, on the other hand, is potentially an alternative strategy to predict succinylation substrates. In this paper, a novel computational predictor SuccinSite2.0 was developed for predicting generic and species-specific protein succinylation sites. This predictor takes the composition of profile-based amino acid and orthogonal binary features, which were used to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other currently existing implementations on a complementarily independent dataset. Furthermore, the important features that make visible contributions to species-specific and cross-species-specific prediction of protein succinylation site were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly

  9. Some features of borophosphatic catalysts structure with silicate bond

    International Nuclear Information System (INIS)

    Kubasov, A.A.; Kitaev, L.E.; Topchieva, K.V.; Gonchakova, N.N.

    1979-01-01

    The structure of borophosphate catalysts was studied by IR spectroscopy. Silica gel and diatomite brick were used as binders (carriers). To clarify how the carrier affects the borophosphate structure, spectra were obtained for boric and phosphoric acids deposited at 10 wt % on SiO2, and for an industrial H3PO4/SiO2 hydration catalyst with a higher P2O5 content. When the 10% H3BO3/SiO2 sample was calcined in vacuum, the intensity of the 932 cm-1 band, which can be assigned to B-O-Si vibrations, increased with temperature. In the region of the fundamental P-O and Si-O vibrations, the spectrum of the 10% H3PO4/SiO2 sample heated to 600 °C in air differed only slightly from the initial SiO2 spectrum. In the spectrum of the sample with the higher P2O5 content, bands were detected in the 500-800 cm-1 range after thermal steam treatment at 300 °C, indicating interaction of the phosphoric acid with the silica gel. The state of adsorbed water can be judged from the change in optical density of the 1630 cm-1 band during stepwise thermal vacuum treatment of the borophosphate. Water was found to be removed from the sample surface in the range of 200-300 °C. Thus, when borophosphate catalysts containing SiO2 are calcined, the borophosphate interacts with the binder components, forming B-O-Si and P-O-Si bonds. Water is removed from the surfaces of these catalysts at a lower temperature than from individual borophosphate, which points to a certain release of electron-acceptor properties as a result of introducing the binder component. Thus, introducing the binder component not only increases the mechanical strength and hydrolytic stability of borophosphates but also modifies their surface.

  10. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  11. Deamidation of asparagine and glutamine residues in proteins and peptides: structural determinants and analytical methodology

    NARCIS (Netherlands)

    Bischoff, Rainer; Kolbe, H.V.

    1994-01-01

    Non-enzymatic deamidation of asparagine and glutamine residues in proteins and peptides is reviewed by first outlining the well-described reaction mechanism involving cyclic imide intermediates, followed by a discussion of structural features which influence the reaction rate. The second and major

  12. Current strategies for protein production and purification enabling membrane protein structural biology.

    Science.gov (United States)

    Pandey, Aditya; Shin, Kyungsoo; Patterson, Robin E; Liu, Xiang-Qin; Rainey, Jan K

    2016-12-01

    Membrane proteins are still heavily under-represented in the protein data bank (PDB), owing to multiple bottlenecks. The typical low abundance of membrane proteins in their natural hosts makes it necessary to overexpress these proteins either in heterologous systems or through in vitro translation/cell-free expression. Heterologous expression of proteins, in turn, leads to multiple obstacles, owing to the unpredictability of compatibility of the target protein for expression in a given host. The highly hydrophobic and (or) amphipathic nature of membrane proteins also leads to challenges in producing a homogeneous, stable, and pure sample for structural studies. Circumventing these hurdles has become possible through the introduction of novel protein production protocols; efficient protein isolation and sample preparation methods; and, improvement in hardware and software for structural characterization. Combined, these advances have made the past 10-15 years very exciting and eventful for the field of membrane protein structural biology, with an exponential growth in the number of solved membrane protein structures. In this review, we focus on both the advances and diversity of protein production and purification methods that have allowed this growth in structural knowledge of membrane proteins through X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM).

  13. Structural and functional features of central nervous system lymphatic vessels.

    Science.gov (United States)

    Louveau, Antoine; Smirnov, Igor; Keyes, Timothy J; Eccles, Jacob D; Rouhani, Sherin J; Peske, J David; Derecki, Noel C; Castle, David; Mandell, James W; Lee, Kevin S; Harris, Tajie H; Kipnis, Jonathan

    2015-07-16

    One of the characteristics of the central nervous system is the lack of a classical lymphatic drainage system. Although it is now accepted that the central nervous system undergoes constant immune surveillance that takes place within the meningeal compartment, the mechanisms governing the entrance and exit of immune cells from the central nervous system remain poorly understood. In searching for T-cell gateways into and out of the meninges, we discovered functional lymphatic vessels lining the dural sinuses. These structures express all of the molecular hallmarks of lymphatic endothelial cells, are able to carry both fluid and immune cells from the cerebrospinal fluid, and are connected to the deep cervical lymph nodes. The unique location of these vessels may have impeded their discovery to date, thereby contributing to the long-held concept of the absence of lymphatic vasculature in the central nervous system. The discovery of the central nervous system lymphatic system may call for a reassessment of basic assumptions in neuroimmunology and sheds new light on the aetiology of neuroinflammatory and neurodegenerative diseases associated with immune system dysfunction.

  14. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  15. Association between C-reactive protein and features of the metabolic syndrome

    DEFF Research Database (Denmark)

    Fröhlich, M; Imhof, A; Berg, Gabriele

    2000-01-01

    OBJECTIVE: To assess the association of circulating levels of C-reactive protein, a sensitive systemic marker of inflammation, with different components of the metabolic syndrome. RESEARCH DESIGN AND METHODS: Total cholesterol (TC), HDL cholesterol, triglycerides, uric acid, BMI, and prevalence...... concentrations in subjects grouped according to the presence of 0-1, 2-3, and ≥4 features of the metabolic syndrome were 1.11, 1.27, and 2.16 mg/l, respectively, with a statistically highly significant trend (P metabolic syndrome...

  16. Ion pairs in non-redundant protein structures

    Indian Academy of Sciences (India)

    Ion pairs contribute to several functions including the activity of catalytic triads, fusion of viral membranes, stability in thermophilic proteins and solvent–protein interactions. Furthermore, they have the ability to affect the stability of protein structures and are also a part of the forces that act to hold monomers together.

  17. Comparative sequence and structural analyses of G-protein-coupled receptor crystal structures and implications for molecular models.

    Directory of Open Access Journals (Sweden)

    Catherine L Worth

    Full Text Available BACKGROUND: Up until recently the only available experimental (high resolution) structure of a G-protein-coupled receptor (GPCR) was that of bovine rhodopsin. In the past few years the determination of GPCR structures has accelerated with three new receptors, as well as squid rhodopsin, being successfully crystallized. All share a common molecular architecture of seven transmembrane helices and can therefore serve as templates for building molecular models of homologous GPCRs. However, despite the common general architecture of these structures key differences do exist between them. The choice of which experimental GPCR structure(s) to use for building a comparative model of a particular GPCR is unclear and without detailed structural and sequence analyses, could be arbitrary. The aim of this study is therefore to perform a systematic and detailed analysis of sequence-structure relationships of known GPCR structures. METHODOLOGY: We analyzed in detail conserved and unique sequence motifs and structural features in experimentally-determined GPCR structures. Deeper insight into specific and important structural features of GPCRs as well as valuable information for template selection has been gained. Using key features a workflow has been formulated for identifying the most appropriate template(s) for building homology models of GPCRs of unknown structure. This workflow was applied to a set of 14 human family A GPCRs suggesting for each the most appropriate template(s) for building a comparative molecular model. CONCLUSIONS: The available crystal structures represent only a subset of all possible structural variation in family A GPCRs. Some GPCRs have structural features that are distributed over different crystal structures or which are not present in the templates suggesting that homology models should be built using multiple templates. This study provides a systematic analysis of GPCR crystal structures and a consistent method for identifying

  18. Comparative sequence and structural analyses of G-protein-coupled receptor crystal structures and implications for molecular models.

    Science.gov (United States)

    Worth, Catherine L; Kleinau, Gunnar; Krause, Gerd

    2009-09-16

    Up until recently the only available experimental (high resolution) structure of a G-protein-coupled receptor (GPCR) was that of bovine rhodopsin. In the past few years the determination of GPCR structures has accelerated with three new receptors, as well as squid rhodopsin, being successfully crystallized. All share a common molecular architecture of seven transmembrane helices and can therefore serve as templates for building molecular models of homologous GPCRs. However, despite the common general architecture of these structures key differences do exist between them. The choice of which experimental GPCR structure(s) to use for building a comparative model of a particular GPCR is unclear and without detailed structural and sequence analyses, could be arbitrary. The aim of this study is therefore to perform a systematic and detailed analysis of sequence-structure relationships of known GPCR structures. We analyzed in detail conserved and unique sequence motifs and structural features in experimentally-determined GPCR structures. Deeper insight into specific and important structural features of GPCRs as well as valuable information for template selection has been gained. Using key features a workflow has been formulated for identifying the most appropriate template(s) for building homology models of GPCRs of unknown structure. This workflow was applied to a set of 14 human family A GPCRs suggesting for each the most appropriate template(s) for building a comparative molecular model. The available crystal structures represent only a subset of all possible structural variation in family A GPCRs. Some GPCRs have structural features that are distributed over different crystal structures or which are not present in the templates suggesting that homology models should be built using multiple templates. This study provides a systematic analysis of GPCR crystal structures and a consistent method for identifying suitable templates for GPCR homology modelling that will

  19. The structure and function of endophilin proteins

    DEFF Research Database (Denmark)

    Kjaerulff, Ole; Brodin, Lennart; Jung, Anita

    2011-01-01

    Members of the BAR domain protein superfamily are essential elements of cellular traffic. Endophilins are among the best studied BAR domain proteins. They have a prominent function in synaptic vesicle endocytosis (SVE), receptor trafficking and apoptosis, and in other processes that require...

  20. Identification of Protein Pupylation Sites Using Bi-Profile Bayes Feature Extraction and Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Xiaowei Zhao

    2013-01-01

    Full Text Available Pupylation, one of the most important posttranslational modifications of proteins, typically takes place when prokaryotic ubiquitin-like protein (Pup) is attached to specific lysine residues on a target protein. Identification of pupylation substrates and their corresponding sites will facilitate the understanding of the molecular mechanism of pupylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, a new bioinformatics tool named EnsemblePup was developed that uses an ensemble of support vector machine classifiers to predict pupylation sites. The highlight of EnsemblePup is its use of Bi-profile Bayes feature extraction as the encoding scheme. The performance of EnsemblePup was measured as a sensitivity of 79.49%, a specificity of 82.35%, an accuracy of 85.43%, and a Matthews correlation coefficient of 0.617 using 5-fold cross-validation on the training dataset. When compared with other existing methods on a benchmark dataset, EnsemblePup provided better predictive performance, with a sensitivity of 80.00%, a specificity of 83.33%, an accuracy of 82.00%, and a Matthews correlation coefficient of 0.629. The experimental results suggest that EnsemblePup might be useful for identifying and annotating potential pupylation sites in proteins of interest. A web server for predicting pupylation sites was developed.
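    The four performance measures quoted in this abstract are standard functions of a binary confusion matrix. As a reminder of how they relate, here is a small sketch using the textbook definitions. The function name is an assumption, any counts passed to it are hypothetical rather than taken from the paper, and the sketch assumes both classes are present so that no denominator other than the MCC one can be zero.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity, accuracy and MCC for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

    Unlike accuracy, the Matthews correlation coefficient stays informative when positive and negative sites are heavily unbalanced, which is why it is often reported for modification-site predictors.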

  1. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Full Text Available Abstract Background Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. 
    Conclusion We have developed a novel and useful approach for
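    The polyglutamine example in this abstract can be illustrated with an amino acid regular expression of the general kind the authors describe. The sketch below is only an illustration: the pattern, the minimum run length of five, and the function name are assumptions, not expressions evolved by the authors' genetic programming system.

```python
import re

# Hypothetical motif: a run of at least five consecutive glutamines (Q).
POLY_Q = re.compile(r"Q{5,}")


def polyq_runs(sequence: str) -> list[tuple[int, int]]:
    """Return (start, length) for every polyglutamine run in a protein sequence."""
    return [(m.start(), m.end() - m.start()) for m in POLY_Q.finditer(sequence)]
```

    For instance, `polyq_runs("MAQQQQQQLKQQ")` reports a single run of six glutamines starting at index 2, while the trailing pair of glutamines falls below the threshold and is ignored.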

  2. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, in which thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for computational biologists. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from the BLAST results. A web server, StarPDB, has been developed for structural annotation of a protein based on its similarity to known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in the PDB. The server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary-structure module allows users to predict regular secondary structure states for each residue in a protein. The Accessible surface area module predicts the exposed or buried residues in a protein. The Tight-turns module is designed to predict tight turns, such as beta-turns, in a protein. The DNA-RNA module was developed for predicting DNA- and RNA-interacting residues in a protein. Similarly, the Ligand module allows one to predict ligand-, metal- and nucleotide-interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users understand the structure and function of protein residues. The web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  3. Evolutionary Implications of Metal Binding Features in Different Species’ Prion Protein: An Inorganic Point of View

    Directory of Open Access Journals (Sweden)

    Diego La Mendola

    2014-05-01

    Full Text Available Prion disorders are a group of fatal neurodegenerative conditions of mammals. The key molecular event in the pathogenesis of such diseases is the conformational conversion of prion protein, PrPC, into a misfolded form rich in β-sheet structure, PrPSc, but the detailed mechanistic aspects of prion protein conversion remain enigmatic. There is uncertainty about the precise physiological function of PrPC in healthy individuals. Several lines of evidence support the notion of its role in copper homeostasis. PrPC binds Cu2+ mainly through a domain composed of four to five repeats of eight amino acids. In addition to mammals, PrP homologues have also been identified in birds, reptiles, amphibians and fish. The globular domain of the protein is retained in the different species, suggesting that the protein carries out an essential common function. However, the comparison of amino acid sequences indicates that prion protein has evolved differently in each vertebrate class. The primary sequences are strongly conserved in each group, but these exhibit a low similarity with those of mammals. The N-terminal domain of different prions shows tandem amino acid repeats with an increasing number of histidine residues going from amphibians to mammals. The difference in the sequence affects the number of copper binding sites, the affinity and the coordination environment of metal ions, suggesting that the involvement of prion in metal homeostasis may be a specific characteristic of mammalian prion protein. In this review, we describe the similarities and the differences in the metal binding of different species’ prion protein, as revealed by studies carried out on the entire protein and related peptide fragments.

  4. Compressive behavior of pervious concretes and a quantification of the influence of random pore structure features

    International Nuclear Information System (INIS)

    Deo, Omkar; Neithalath, Narayanan

    2010-01-01

    Research highlights: → Identified the relevant pore structure features of pervious concretes, provided methodologies to extract those, and quantified the influence of these features on compressive response. → A model for stress-strain relationship of pervious concretes, and relationship between model parameters and parameters of the stress-strain relationship developed. → Statistical model for compressive strength as a function of pore structure features; and a stochastic model for the sensitivity of pore structure features in strength prediction. - Abstract: Properties of a random porous material such as pervious concrete are strongly dependent on its pore structure features, porosity being an important one among them. This study deals with developing an understanding of the material structure-compressive response relationships in pervious concretes. Several pervious concrete mixtures with different pore structure features are proportioned and subjected to static compression tests. The pore structure features such as pore area fractions, pore sizes, mean free spacing of the pores, specific surface area, and the three-dimensional pore distribution density are extracted using image analysis methods. The compressive stress-strain response of pervious concretes, a model to predict the stress-strain response, and its relationship to several of the pore structure features are outlined. Larger aggregate sizes and increase in paste volume fractions are observed to result in increased compressive strengths. The compressive response is found to be influenced by the pore sizes, their distributions and spacing. A statistical model is used to relate the compressive strength to the relevant pore structure features, which is then used as a base model in a Monte-Carlo simulation to evaluate the sensitivity of the predicted compressive strength to the model terms.

  5. DroidEnsemble: Detecting Android Malicious Applications with Ensemble of String and Structural Static Features

    KAUST Repository

    Wang, Wei

    2018-05-11

    The Android platform dominates the operating systems of mobile devices. However, the dramatic increase in Android malicious applications (malapps) has caused serious software failures in the Android system and poses a great threat to users. The effective detection of Android malapps has thus become an emerging yet crucial issue. Characterizing the behaviors of Android applications (apps) is essential to detecting malapps. Most existing work on detecting Android malapps is based mainly on string static features such as permissions and API usage extracted from apps. There is also work on detecting Android malapps with structural features, such as the Control Flow Graph (CFG) and Data Flow Graph (DFG). As Android malapps have become increasingly polymorphic and sophisticated, using only one type of static feature may result in false negatives. In this work, we propose DroidEnsemble, which takes advantage of both string features and structural features to systematically and comprehensively characterize the static behaviors of Android apps and thus build a more accurate model for the detection of Android malapps. We extract each app’s string features, including permissions, hardware features, filter intents, restricted API calls, used permissions, code patterns, as well as structural features like the function call graph. We then use three machine learning algorithms, namely, Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Random Forest (RF), to evaluate the performance of these two types of features and of their ensemble. In the experiments, we evaluate our methods and models with 1386 benign apps and 1296 malapps. Extensive experimental results demonstrate the effectiveness of DroidEnsemble. It achieves a detection accuracy of 95.8% with only string features and of 90.68% with only structural features. DroidEnsemble reaches a detection accuracy of 98.4% with the ensemble of both types of features, reducing 9 false positives and 12 false

  6. Chinese wine classification system based on micrograph using combination of shape and structure features

    Science.gov (United States)

    Wan, Yi

    2011-06-01

    Chinese wines can be classified or graded by their micrographs. Micrographs of Chinese wines show floccules, sticks and granules of varying shape and size. Because different wines have different microstructures and micrographs, we study the classification of Chinese wines based on their micrographs. The shape and structure of the wine particles in the microstructure are the most important features for recognizing and classifying wines. We therefore introduce a feature extraction method that efficiently describes the structure and region shape of a micrograph. First, the micrographs are enhanced using total variation denoising and segmented using a modified Otsu's method based on the Rayleigh distribution. Then features are extracted using the proposed method, based on area, perimeter and traditional shape features. A total of 26 features of eight kinds are selected. Finally, a Chinese wine classification system based on micrographs is presented, using a combination of shape and structure features and a BP neural network. We compare the recognition results for different choices of features (traditional shape features or the proposed features). The experimental results show that a better classification rate is achieved using the combined features proposed in this paper.
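    The segmentation step in this abstract builds on Otsu's method. The sketch below shows only the standard Otsu threshold, which picks the grey level that maximizes the between-class variance; it is not the Rayleigh-based modification the paper describes, whose details are not given here. The function name and the flat list-of-integers input format are assumptions made for illustration.

```python
def otsu_threshold(pixels: list[int], levels: int = 256) -> int:
    """Return the grey level t that best separates pixels <= t from pixels > t.

    Standard Otsu: maximize w0 * w1 * (mu0 - mu1)^2 over all thresholds.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of their intensities
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break           # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

    Pixels above the returned threshold would form the foreground mask from which region features such as area and perimeter can then be measured.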

  7. A feature-based approach to modeling protein-DNA interactions.

    Directory of Open Access Journals (Sweden)

    Eilon Sharon

    Full Text Available Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/.
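    To make the independence assumption of a PSSM concrete, the sketch below builds a log-odds matrix from aligned binding sites and scores a sequence as a plain sum over positions. It illustrates the baseline model the abstract contrasts with, not the authors' FMM software; the function names, the pseudocount, and the uniform background frequency are assumptions.

```python
import math


def build_pssm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from equal-length sites."""
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2(counts[base] / total / background) for base in alphabet}
        )
    return pssm


def score(pssm, seq):
    """Score a sequence as the sum of per-position log-odds.

    The sum is where independence enters: no term depends on more than
    one position, so correlated positions cannot be rewarded or penalized.
    """
    return sum(column[base] for column, base in zip(pssm, seq))
```

    A feature that spans several positions, as in the log-linear models described above, would add terms depending on two or more bases jointly, which this additive score cannot express.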

  8. The contact activation proteins: a structure/function overview

    NARCIS (Netherlands)

    Meijers, J. C.; McMullen, B. A.; Bouma, B. N.

    1992-01-01

    In recent years, extensive knowledge has been obtained on the structure/function relationships of blood coagulation proteins. In this overview, we present recent developments on the structure/function relationships of the contact activation proteins: factor XII, high molecular weight kininogen,

  9. De novo protein structure determination using sparse NMR data

    International Nuclear Information System (INIS)

    Bowers, Peter M.; Strauss, Charlie E.M.; Baker, David

    2000-01-01

    We describe a method for generating moderate to high-resolution protein structures using limited NMR data combined with the ab initio protein structure prediction method Rosetta. Peptide fragments are selected from proteins of known structure based on sequence similarity and consistency with chemical shift and NOE data. Models are built from these fragments by minimizing an energy function that favors hydrophobic burial, strand pairing, and satisfaction of NOE constraints. Models generated using this procedure with ∼1 NOE constraint per residue are in some cases closer to the corresponding X-ray structures than the published NMR solution structures. The method requires only the sparse constraints available during initial stages of NMR structure determination, and thus holds promise for increasing the speed with which protein solution structures can be determined

  10. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  11. Prediction of protein–protein interactions: unifying evolution and structure at protein interfaces

    International Nuclear Information System (INIS)

    Tuncbag, Nurcan; Gursoy, Attila; Keskin, Ozlem

    2011-01-01

    The vast majority of the chores in the living cell involve protein–protein interactions. Providing details of protein interactions at the residue level and incorporating them into protein interaction networks are crucial toward the elucidation of a dynamic picture of cells. Despite the rapid increase in the number of structurally known protein complexes, we are still far away from a complete network. Given experimental limitations, computational modeling of protein interactions is a prerequisite to proceed on the way to complete structural networks. In this work, we focus on the question 'how do proteins interact?' rather than 'which proteins interact?' and we review structure-based protein–protein interaction prediction approaches. As a sample approach for modeling protein interactions, we detail PRISM, which combines structural similarity and evolutionary conservation in protein interfaces to infer structures of complexes in the protein interaction network. This will ultimately help us to understand the role of protein interfaces in predicting bound conformations

  12. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

    Science.gov (United States)

    2013-01-01

    Background: Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. Results: In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features are used to develop SVM models: amino acid composition, dipeptide composition, pseudo amino acid composition, N-terminal-Center-C-terminal composition and protein physicochemical properties. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction
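
    As an illustration of the dipeptide composition feature named in this abstract, the sketch below turns a sequence into a 400-dimensional vector of dipeptide frequencies. The function name, the handling of non-standard residues and the example sequence are assumptions made for illustration; the abstract does not specify the authors' exact encoding.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each dipeptide among all overlapping pairs.

    Pairs containing non-standard residues are skipped (an assumption of
    this sketch, not a rule taken from the paper).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical 12-residue sequence, used only to exercise the function.
features = dipeptide_composition("MKVLAAGIVGLK")
```

    A vector of this form could then be passed to any standard classifier; the abstract reports that an SVM was used.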

  13. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.

    Science.gov (United States)

    Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun

    2016-11-01

    The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the "intermediate" decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys .

  14. Structural and functional analysis of VQ motif-containing proteins in Arabidopsis as interacting proteins of WRKY transcription factors.

    Science.gov (United States)

    Cheng, Yuan; Zhou, Yuan; Yang, Yan; Chi, Ying-Jun; Zhou, Jie; Chen, Jian-Ye; Wang, Fei; Fan, Baofang; Shi, Kai; Zhou, Yan-Hong; Yu, Jing-Quan; Chen, Zhixiang

    2012-06-01

    WRKY transcription factors are encoded by a large gene superfamily with a broad range of roles in plants. Recently, several groups have reported that proteins containing a short VQ (FxxxVQxLTG) motif interact with WRKY proteins. We have recently discovered that two VQ proteins from Arabidopsis (Arabidopsis thaliana), SIGMA FACTOR-INTERACTING PROTEIN1 and SIGMA FACTOR-INTERACTING PROTEIN2, act as coactivators of WRKY33 in plant defense by specifically recognizing the C-terminal WRKY domain and stimulating the DNA-binding activity of WRKY33. In this study, we have analyzed the entire family of 34 structurally divergent VQ proteins from Arabidopsis. Yeast (Saccharomyces cerevisiae) two-hybrid assays showed that Arabidopsis VQ proteins interacted specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY proteins. Using site-directed mutagenesis, we identified structural features of these two closely related groups of WRKY domains that are critical for interaction with VQ proteins. Quantitative reverse transcription polymerase chain reaction revealed that expression of a majority of Arabidopsis VQ genes was responsive to pathogen infection and salicylic acid treatment. Functional analysis using both knockout mutants and overexpression lines revealed strong phenotypes in growth, development, and susceptibility to pathogen infection. Altered phenotypes were substantially enhanced through cooverexpression of genes encoding interacting VQ and WRKY proteins. These findings indicate that VQ proteins play an important role in plant growth, development, and response to environmental conditions, most likely by acting as cofactors of group I and IIc WRKY transcription factors.

  15. Structure of synaptophysin: a hexameric MARVEL-domain channel protein.

    Science.gov (United States)

    Arthur, Christopher P; Stowell, Michael H B

    2007-06-01

    Synaptophysin I (SypI) is an archetypal member of the MARVEL-domain family of integral membrane proteins and one of the first synaptic vesicle proteins to be identified and cloned. Almost all MARVEL-domain proteins are involved in membrane apposition and vesicle-trafficking events, but their precise role in these processes is unclear. We have purified mammalian SypI and determined its three-dimensional (3D) structure by using electron microscopy and single-particle 3D reconstruction. The hexameric structure resembles an open basket with a large pore and tenuous interactions within the cytosolic domain. The structure suggests a model for Synaptophysin's role in fusion and recycling that is regulated by known interactions with the SNARE machinery. This 3D structure of a MARVEL-domain protein provides a structural foundation for understanding the role of these important proteins in a variety of biological processes.

  16. Sampling Realistic Protein Conformations Using Local Structural Bias

    DEFF Research Database (Denmark)

    Hamelryck, Thomas Wim; Kent, John T.; Krogh, A.

    2006-01-01

    The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which...... are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded...... in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications...

  17. Sieve element occlusion (SEO) genes encode structural phloem proteins involved in wound sealing of the phloem.

    Science.gov (United States)

    Ernst, Antonia M; Jekat, Stephan B; Zielonka, Sascia; Müller, Boje; Neumann, Ulla; Rüping, Boris; Twyman, Richard M; Krzyzanek, Vladislav; Prüfer, Dirk; Noll, Gundula A

    2012-07-10

    The sieve element occlusion (SEO) gene family originally was delimited to genes encoding structural components of forisomes, which are specialized crystalloid phloem proteins found solely in the Fabaceae. More recently, SEO genes discovered in various non-Fabaceae plants were proposed to encode the common phloem proteins (P-proteins) that plug sieve plates after wounding. We carried out a comprehensive characterization of two tobacco (Nicotiana tabacum) SEO genes (NtSEO). Reporter genes controlled by the NtSEO promoters were expressed specifically in immature sieve elements, and GFP-SEO fusion proteins formed parietal agglomerates in intact sieve elements as well as sieve plate plugs after wounding. NtSEO proteins with and without fluorescent protein tags formed agglomerates similar in structure to native P-protein bodies when transiently coexpressed in Nicotiana benthamiana, and the analysis of these protein complexes by electron microscopy revealed ultrastructural features resembling those of native P-proteins. NtSEO-RNA interference lines were essentially devoid of P-protein structures and lost photoassimilates more rapidly after injury than control plants, thus confirming the role of P-proteins in sieve tube sealing. We therefore provide direct evidence that SEO genes in tobacco encode P-protein subunits that affect translocation. We also found that peptides recently identified in fascicular phloem P-protein plugs from squash (Cucurbita maxima) represent cucurbit members of the SEO family. Our results therefore suggest a common evolutionary origin for P-proteins found in the sieve elements of all dicotyledonous plants and demonstrate the exceptional status of extrafascicular P-proteins in cucurbits.

  18. Rapid and reliable protein structure determination via chemical shift threading.

    Science.gov (United States)

    Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S

    2018-01-01

    Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .

  19. Evaluation of physical structural features on influencing enzymatic hydrolysis efficiency of micronized wood

    Science.gov (United States)

    Jinxue Jiang; Jinwu Wang; Xiao Zhang; Michael Wolcott

    2016-01-01

    Enzymatic hydrolysis of lignocellulosic biomass is highly dependent on the changes in structural features after pretreatment. Mechanical milling pretreatment is an effective approach to alter the physical structure of biomass and thus improve enzymatic hydrolysis. This study examined the influence of structural characteristics on the enzymatic hydrolysis of micronized...

  20. Is protein structure prediction still an enigma?

    African Journals Online (AJOL)

    STORAGESEVER

    2008-12-29

    Dec 29, 2008 ... Computer methods for protein analysis address this problem since they study the .... neighbor methods, molecular dynamic simulation, and approaches .... fuzzy clustering, neural net works, logistic regression, decision tree ...

  1. Dynamic features of apo and bound HIV-Nef protein reveal the anti-HIV dimerization inhibition mechanism.

    Science.gov (United States)

    Moonsamy, Suri; Bhakat, Soumendranath; Soliman, Mahmoud E S

    2015-01-01

    The first account of the dynamic features of Nef, or negative factor, a small myristoylated protein located in the cytoplasm and believed to increase the HIV-1 viral titer level, is reported herein. Due to its major role in HIV-1 pathogenicity, Nef protein is considered an emerging target in the anti-HIV drug design and discovery process. In this study, comparative long-range all-atom molecular dynamics simulations were employed for apo and bound protein to unveil the molecular mechanism of HIV-Nef dimerization and inhibition. Results clearly revealed that B9, a newly discovered Nef inhibitor, binds at the dimeric interface of Nef protein and causes significant separation between orthogonally opposed residues, namely Asp108, Leu112 and Gln104. Large differences in magnitude were observed in the radius of gyration (∼1.5 Å), per-residue fluctuation (∼2 Å) and C-alpha deviations (∼2 Å), which confirm a comparatively more flexible nature of the apo conformation due to rapid dimeric association. Compared to the bound conformer, a more globally correlated motion in the apo structure of HIV-Nef confirms the process of dimeric association. This clearly highlights the process of inhibition as a result of ligand binding. The difference in the principal component analysis (PCA) scatter plot and the per-residue mobility plot across the first two normal modes further supports the same findings. The in-depth dynamic analyses of Nef protein presented in this report should prove crucial for understanding its function and inhibition mechanisms. Information on the inhibitor binding mode would also assist in the design of potential inhibitors against this important HIV target.
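
    The radius of gyration quoted in this abstract is a standard compactness measure. A minimal sketch of how it could be computed for one frame of coordinates follows; the two coordinate sets are toy values, not Nef data, and the function is not taken from any simulation package.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their centre of mass.

    coords: list of (x, y, z) tuples in Å; masses default to 1.0 per atom.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    centre = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy coordinates (hypothetical): a compact square and an extended line.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]
rg_compact = radius_of_gyration(compact)
rg_extended = radius_of_gyration(extended)
```

    In a trajectory analysis the same function would be applied frame by frame, and the apo and bound averages compared.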

  2. Solution structure and dynamics of melanoma inhibitory activity protein

    International Nuclear Information System (INIS)

    Lougheed, Julie C.; Domaille, Peter J.; Handel, Tracy M.

    2002-01-01

    Melanoma inhibitory activity (MIA) is a small secreted protein that is implicated in cartilage cell maintenance and melanoma metastasis. It is representative of a recently discovered family of proteins that contain a Src Homologous 3 (SH3) subdomain. While SH3 domains are normally found in intracellular proteins and mediate protein-protein interactions via recognition of polyproline helices, MIA is a single-domain extracellular protein, and it probably binds to a different class of ligands. Here we report the assignments, solution structure, and dynamics of human MIA determined by heteronuclear NMR methods. The structures were calculated in a semi-automated manner without manual assignment of NOE crosspeaks, and have a backbone rmsd of 0.38 Å over the ordered regions of the protein. The structure consists of an SH3-like subdomain with N- and C-terminal extensions of approximately 20 amino acids each that together form a novel fold. The rmsd between the solution structure and our recently reported crystal structure is 0.86 Å over the ordered regions of the backbone, and the main differences are localized to the most dynamic regions of the protein. The similarity between the NMR and crystal structures supports the use of automated NOE assignments and ambiguous restraints to accelerate the calculation of NMR structures
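
    The backbone rmsd values quoted in this abstract compare matched atom positions between two structures. The sketch below computes the rmsd for two coordinate sets that are assumed to be already superimposed; a real comparison would first find the optimal superposition (for example with the Kabsch algorithm), which is omitted here. The coordinates are toy values, not MIA data.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the two structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy backbone fragment (hypothetical) and a copy shifted by 1 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
same = rmsd(reference, reference)
offset = rmsd(reference, shifted)
```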

  3. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    Science.gov (United States)

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  4. Association of protein structure, protein and carbohydrate subfractions with bioenergy profiles and biodegradation functions in modeled forage

    Science.gov (United States)

    Ji, Cuiying; Zhang, Xuewei; Yu, Peiqiang

    2016-03-01

    The objectives of this study were to detect unique aspects and association of forage protein inherent structure, biological compounds, protein and carbohydrate subfractions, bioenergy profiles, and biodegradation features. In this study, common available alfalfa hay from two different sourced-origins (FSO vs. CSO) was used as a modeled forage for inherent structure profile, bioenergy, biodegradation and their association between their structure and bio-functions. The molecular spectral profiles were determined using non-invasive molecular spectroscopy. The parameters included: protein structure amide I group, amide II group and their ratios; protein subfractions (PA1, PA2, PB1, PB2, PC); carbohydrate fractions (CA1, CA2, CA3, CA4, CB1, CB2, CC); biodegradable and undegradable fractions of protein (RDPA2, RDPB1, RDPB2, RDP; RUPA2 RUPB1, RUPB2, RUPC, RUP); biodegradable and undegradable fractions of carbohydrate (RDCA4, RDCB1, RDCB2, RDCB3, RDCHO; RUCA4, RUCB1; RUCB2; RUCB3 RUCC, RUCHO) and bioenergy profiles (tdNDF, tdFA, tdCP, tdNFC, TDN1 ×, DE3 ×, ME3 ×, NEL3 ×; NEm, NEg). The results show differences in protein and carbohydrate (CHO) subfractions in the moderately degradable true protein fraction (PB1: 502 vs. 420 g/kg CP, P = 0.09), slowly degraded true protein fraction (PB2: 45 vs. 96 g/kg CP, P = 0.02), moderately degradable CHO fraction (CB2: 283 vs. 223 g/kg CHO, P = 0.06) and slowly degraded CHO fraction (CB3: 369 vs. 408 g/kg CHO) between the two sourced origins. As to biodegradable (RD) fractions of protein and CHO in rumen, there were differences in RD of PB1 (417 vs. 349 g/kg CP, P = 0.09), RD of PB2 (29 vs. 62 g/kg CP, P = 0.02), RD of CB2 (251 vs. 198 g/kg DM, P = 0.06), RD of CB3 (236 vs. 261 g/kg CHO, P = 0.08). As to bioenergy profile, there were differences in total digestible nutrient (TDN: 551 vs. 537 g/kg DM, P = 0.06), and metabolic bioenergy (P = 0.095). As to protein molecular structure, there were differences in protein structure 1st

  5. Structural and sequence analysis of imelysin-like proteins implicated in bacterial iron uptake.

    Directory of Open Access Journals (Sweden)

    Qingping Xu

    Imelysin-like proteins define a superfamily of bacterial proteins that are likely involved in iron uptake. Members of this superfamily were previously thought to be peptidases and were included in the MEROPS family M75. We determined the first crystal structures of two remotely related, imelysin-like proteins. The Psychrobacter arcticus structure was determined at 2.15 Å resolution and contains the canonical imelysin fold, while higher resolution structures from the gut bacteria Bacteroides ovatus, in two crystal forms (at 1.25 Å and 1.44 Å resolution), have a circularly permuted topology. Both structures are highly similar to each other despite low sequence similarity and circular permutation. The all-helical structure can be divided into two similar four-helix bundle domains. The overall structure and the GxHxxE motif region differ from known HxxE metallopeptidases, suggesting that imelysin-like proteins are not peptidases. A putative functional site is located at the domain interface. We have now organized the known homologous proteins into a superfamily, which can be separated into four families. These families share a similar functional site, but each has family-specific structural and sequence features. These results indicate that imelysin-like proteins have evolved from a common ancestor, and likely have a conserved function.

  6. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    Science.gov (United States)

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. Function and structure of GFP-like proteins in the protein data bank.

    Science.gov (United States)

    Ong, Wayne J-H; Alvarez, Samuel; Leroux, Ivan E; Shahid, Ramza S; Samma, Alex A; Peshkepija, Paola; Morgan, Alicia L; Mulcahy, Shawn; Zimmer, Marc

    2011-04-01

    The RCSB protein databank contains 266 crystal structures of green fluorescent proteins (GFP) and GFP-like proteins. This is the first systematic analysis of all the GFP-like structures in the pdb. We have used the pdb to examine the function of fluorescent proteins (FP) in nature, aspects of excited state proton transfer (ESPT) in FPs, deformation from planarity of the chromophore and chromophore maturation. The conclusions reached in this review are that (1) The lid residues are highly conserved, particularly those on the "top" of the β-barrel. They are important to the function of GFP-like proteins, perhaps in protecting the chromophore or in β-barrel formation. (2) The primary/ancestral function of GFP-like proteins may well be to aid in light induced electron transfer. (3) The structural prerequisites for light activated proton pumps exist in many structures and it's possible that like bioluminescence, proton pumps are secondary functions of GFP-like proteins. (4) In most GFP-like proteins the protein matrix exerts a significant strain on planar chromophores forcing most GFP-like proteins to adopt non-planar chromophores. These chromophoric deviations from planarity play an important role in determining the fluorescence quantum yield. (5) The chemospatial characteristics of the chromophore cavity determine the isomerization state of the chromophore. The cavities of highlighter proteins that can undergo cis/trans isomerization have chemospatial properties that are common to both cis and trans GFP-like proteins.

  8. Visualisation of variable binding pockets on protein surfaces by probabilistic analysis of related structure sets

    Directory of Open Access Journals (Sweden)

    Ashford Paul

    2012-03-01

    Background: Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. Results: We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the

  9. Visualisation of variable binding pockets on protein surfaces by probabilistic analysis of related structure sets.

    Science.gov (United States)

    Ashford, Paul; Moss, David S; Alex, Alexander; Yeap, Siew K; Povia, Alice; Nobeli, Irene; Williams, Mark A

    2012-03-14

    Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. 
We demonstrate the benefits of using multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active
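
    The per-residue probability output described in this record can be illustrated with a minimal sketch. It is not Provar's implementation: it assumes that an external pocket predictor has already been run on every structure, that residues share a common numbering across the set, and it takes the probability to be the plain fraction of structures in which a residue lines a pocket. The function name and the residue labels are hypothetical.

```python
from collections import Counter


def pocket_lining_probabilities(pocket_residues_per_structure):
    """Fraction of structures in which each residue borders a predicted pocket.

    Each element of the input is the collection of residue labels that an
    external pocket predictor flagged as pocket-lining in one structure.
    """
    n_structures = len(pocket_residues_per_structure)
    if n_structures == 0:
        raise ValueError("need at least one structure")
    counts = Counter()
    for residues in pocket_residues_per_structure:
        counts.update(set(residues))  # count each residue once per structure
    return {res: c / n_structures for res, c in counts.items()}


# Hypothetical ensemble of four conformers of the same protein.
ensemble = [
    {"A45", "A46", "A90"},
    {"A45", "A90"},
    {"A45", "A91"},
    {"A46", "A90"},
]
probs = pocket_lining_probabilities(ensemble)
for residue in sorted(probs):
    print(residue, probs[residue])
```

    Values of this kind could then be written to a per-residue field of a structure file for colouring in molecular graphics software, which is the kind of visualisation the abstract mentions.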

  10. Tuning structure of oppositely charged nanoparticle and protein complexes

    Energy Technology Data Exchange (ETDEWEB)

    Kumar, Sugam, E-mail: sugam@barc.gov.in; Aswal, V. K., E-mail: sugam@barc.gov.in [Solid State Physics Division, Bhabha Atomic Research Centre, Mumbai-400085 (India); Callow, P. [Institut Laue Langevin, DS/LSS, 6 rue Jules Horowitz, 38042 Grenoble Cedex 9 (France)

    2014-04-24

    Small-angle neutron scattering (SANS) has been used to probe the structures of anionic silica nanoparticles (LS30) and cationic lysozyme protein (M.W. 14.7 kD, I.P. ∼ 11.4) by tuning their interaction through pH variation. The protein adsorption on nanoparticles is found to increase with pH and is determined by the electrostatic attraction between the two components as well as repulsion between protein molecules. We show that the strong electrostatic attraction between nanoparticles and protein molecules leads to protein-mediated aggregation of nanoparticles which are characterized by fractal structures. At pH 5, the protein adsorption gives rise to nanoparticle aggregation having surface fractal morphology with close packing of nanoparticles. The surface fractals transform to open structures of mass fractal morphology at higher pH (7 and 9) on approaching isoelectric point (I.P.)

  11. Studying Membrane Protein Structure and Function Using Nanodiscs

    DEFF Research Database (Denmark)

    Huda, Pie

    The structure and dynamics of membrane proteins can provide valuable information about general functions, diseases and effects of various drugs. Studying membrane proteins is a challenge as an amphiphilic environment is necessary to stabilise the protein in a functionally and structurally relevant...... form. This is most typically achieved through the use of detergent based reconstitution systems. However, time and again such systems fail to provide a suitable environment causing aggregation and inactivation. Nanodiscs are self-assembled lipoproteins containing two membrane scaffold proteins...... and a lipid bilayer in defined nanometer size, which can act as a stabiliser for membrane proteins. This enables both functional and structural investigation of membrane proteins in a detergent free environment which is closer to the native situation. Understanding the self-assembly of nanodiscs is important...

  12. The research of structural features of astralens - nanodimensional carbon particles of fulleroid type

    International Nuclear Information System (INIS)

    Ponomarev, A.N.; Nikitin, V.A.; Rybalko, V.V.

    2006-01-01

    The article focuses on research into the structural features of astralens - nanodimensional carbon particles of fulleroid type. Astralens are promising nanomodifiers of the properties of materials of different types. The potential of astralens as modifiers depends on their characteristic structural features and, in the first place, on the size distribution of the nanoparticles. The typical dimensions of astralens are determined to be within the range of 15-75 nm [ru

  13. Exploring protein dynamics space: the dynasome as the missing link between protein structure and function.

    Directory of Open Access Journals (Sweden)

    Ulf Hensen

    Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics.
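
    The idea of comparing proteins as vectors in a descriptor space can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the three descriptor columns and their values are invented, and z-scoring each descriptor before taking Euclidean distances is an assumption made here so that no single axis dominates.

```python
import numpy as np


def standardize(descriptors):
    """Z-score each descriptor column (an assumed normalisation step)."""
    X = np.asarray(descriptors, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant descriptors unscaled
    return (X - X.mean(axis=0)) / std


def pairwise_distances(X):
    """Euclidean distance between every pair of fingerprint vectors."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical fingerprints: one row per protein, one column per descriptor.
fingerprints = np.array([
    [0.10, 2.0, 5.0],
    [0.12, 2.1, 5.5],
    [0.90, 8.0, 1.0],
])
D = pairwise_distances(standardize(fingerprints))
print(np.round(D, 3))
```

    In this toy input the first two proteins have nearly identical descriptors, so their distance is small compared with their distance to the third.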

  14. Host Proteins Determine MRSA Biofilm Structure and Integrity

    DEFF Research Database (Denmark)

    Dreier, Cindy; Nielsen, Astrid; Jørgensen, Nis Pedersen

    Human extracellular matrix (hECM) proteins aid the initial attachment and initiation of an infection, by specific binding to bacterial cell surface proteins. However, the importance of hECM proteins in structure, integrity and antibiotic resilience of a biofilm is unknown. This study aims...... to determine how specific hECM proteins affect S. aureus USA300 JE2 biofilms. Biofilms were grown in the presence of synovial fluid from rheumatoid arthritis patients to mimic in vivo conditions, where bacteria incorporate hECM proteins into the biofilm matrix. Difference in biofilm structure, with and without...... addition of hECM to growth media, was visualized by confocal laser scanning microscopy. Two enzymatic degradation experiments were used to study biofilm matrix composition and importance of hECM proteins: enzymatic removal of specific hECM proteins from growth media, before biofilm formation, and enzymatic...

  15. Coordination Analysis Using Global Structural Constraints and Alignment-based Local Features

    Science.gov (United States)

    Hara, Kazuo; Shimbo, Masashi; Matsumoto, Yuji

    We propose a hybrid approach to coordinate structure analysis that combines a simple grammar to ensure consistent global structure of coordinations in a sentence, and features based on sequence alignment to capture local symmetry of conjuncts. The weight of the alignment-based features, which in turn determines the score of coordinate structures, is optimized by perceptron training on a given corpus. A bottom-up chart parsing algorithm efficiently finds the best scoring structure, taking both nested and non-overlapping flat coordinations into account. We demonstrate that our approach outperforms existing parsers in coordination scope detection on the Genia corpus.

  16. Integral membrane protein structure determination using pseudocontact shifts

    Energy Technology Data Exchange (ETDEWEB)

    Crick, Duncan J.; Wang, Jue X. [University of Cambridge, Department of Biochemistry (United Kingdom); Graham, Bim; Swarbrick, James D. [Monash University, Monash Institute of Pharmaceutical Sciences (Australia); Mott, Helen R.; Nietlispach, Daniel, E-mail: dn206@cam.ac.uk [University of Cambridge, Department of Biochemistry (United Kingdom)

    2015-04-15

    Obtaining enough experimental restraints can be a limiting factor in the NMR structure determination of larger proteins. This is particularly the case for large assemblies such as membrane proteins that have been solubilized in a membrane-mimicking environment. Whilst in such cases extensive deuteration strategies are regularly utilised with the aim to improve the spectral quality, these schemes often limit the number of NOEs obtainable, making complementary strategies highly beneficial for successful structure elucidation. Recently, lanthanide-induced pseudocontact shifts (PCSs) have been established as a structural tool for globular proteins. Here, we demonstrate that a PCS-based approach can be successfully applied for the structure determination of integral membrane proteins. Using the 7TM α-helical microbial receptor pSRII, we show that PCS-derived restraints from lanthanide binding tags attached to four different positions of the protein facilitate the backbone structure determination when combined with a limited set of NOEs. In contrast, the same set of NOEs fails to determine the correct 3D fold. The latter situation is frequently encountered in polytopical α-helical membrane proteins and a PCS approach is thus suitable even for this particularly challenging class of membrane proteins. The ease of measuring PCSs makes this an attractive route for structure determination of large membrane proteins in general.

  17. Using linear algebra for protein structural comparison and classification.

    Science.gov (United States)

    Gomide, Janaína; Melo-Minardi, Raquel; Dos Santos, Marcos Augusto; Neshich, Goran; Meira, Wagner; Lopes, Júlio César; Santoro, Marcelo

    2009-07-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.
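
    The "proteins as documents, contacts as terms" idea in this record can be sketched with a truncated SVD. The code below is an illustrative assumption rather than the published tool: the contact-by-protein matrix is invented, the rank `k` is chosen by hand, and cosine similarity is used as the comparison measure because it is the customary choice in Latent Semantic Indexing.

```python
import numpy as np


def lsi_signatures(term_doc, k):
    """Rank-k LSI signature for each document (column) of a term-document matrix."""
    A = np.asarray(term_doc, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Document coordinates in the latent space: rows of V_k scaled by the
    # singular values.
    return (Vt[:k, :] * s[:k, None]).T  # shape: (n_documents, k)


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical matrix: rows are contact types (terms), columns are proteins
# (documents). Proteins 0 and 1 share one contact pattern, proteins 2 and 3
# share another.
contacts = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
signatures = lsi_signatures(contacts, k=2)
same_fold = cosine_similarity(signatures[0], signatures[1])
different_fold = cosine_similarity(signatures[0], signatures[2])
print(round(same_fold, 3), round(different_fold, 3))
```

    Retrieval in such a scheme would rank database proteins by similarity of their signature to that of the query; the precision figure quoted in the abstract refers to the authors' own system, not to this sketch.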

  18. Using linear algebra for protein structural comparison and classification

    Directory of Open Access Journals (Sweden)

    Janaína Gomide

    2009-01-01

    In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.

  19. High C-Reactive Protein Predicts Delirium Incidence, Duration, and Feature Severity After Major Noncardiac Surgery.

    Science.gov (United States)

    Vasunilashorn, Sarinnapha M; Dillon, Simon T; Inouye, Sharon K; Ngo, Long H; Fong, Tamara G; Jones, Richard N; Travison, Thomas G; Schmitt, Eva M; Alsop, David C; Freedman, Steven D; Arnold, Steven E; Metzger, Eran D; Libermann, Towia A; Marcantonio, Edward R

    2017-08-01

    To examine associations between the inflammatory marker C-reactive protein (CRP) measured preoperatively and on postoperative day 2 (POD2) and delirium incidence, duration, and feature severity. Prospective cohort study. Two academic medical centers. Adults aged 70 and older undergoing major noncardiac surgery (N = 560). Plasma CRP was measured using enzyme-linked immunosorbent assay. Delirium was assessed from Confusion Assessment Method (CAM) interviews and chart review. Delirium duration was measured according to number of hospital days with delirium. Delirium feature severity was defined as the sum of CAM-Severity (CAM-S) scores on all postoperative hospital days. Generalized linear models were used to examine independent associations between CRP (preoperatively and POD2 separately) and delirium incidence, duration, and feature severity; prolonged hospital length of stay (LOS, >5 days); and discharge disposition. Postoperative delirium occurred in 24% of participants, 12% had 2 or more delirium days, and the mean ± standard deviation sum CAM-S was 9.3 ± 11.4. After adjusting for age, sex, surgery type, anesthesia route, medical comorbidities, and postoperative infectious complications, participants with preoperative CRP of 3 mg/L or greater had a risk of delirium that was 1.5 times as great (95% confidence interval (CI) = 1.1-2.1) as that of those with CRP less than 3 mg/L, 0.4 more delirium days (P delirium (3.6 CAM-S points higher, P delirium (95% CI = 1.0-2.4) as those in the lowest quartile (≤127.53 mg/L), had 0.2 more delirium days (P delirium (4.5 CAM-S points higher, P delirium incidence, duration, and feature severity. CRP may be useful to identify individuals who are at risk of developing delirium. © 2017, Copyright the Authors Journal compilation © 2017, The American Geriatrics Society.

  20. Structure of the Aeropyrum pernix L7Ae multifunctional protein and insight into its extreme thermostability

    International Nuclear Information System (INIS)

    Bhuiya, Mohammad Wadud; Suryadi, Jimmy; Zhou, Zholi; Brown, Bernard Andrew II

    2013-01-01

    The crystal structure of A. pernix L7Ae is reported, providing insight into the extreme thermostability of this protein. Archaeal ribosomal protein L7Ae is a multifunctional RNA-binding protein that directs post-transcriptional modification of archaeal RNAs. The L7Ae protein from Aeropyrum pernix (Ap L7Ae), a member of the Crenarchaea, was found to have an extremely high melting temperature (>383 K). The crystal structure of Ap L7Ae has been determined to a resolution of 1.56 Å. The structure of Ap L7Ae was compared with the structures of two homologs: hyperthermophilic Methanocaldococcus jannaschii L7Ae and the mesophilic counterpart mammalian 15.5 kD protein. The primary stabilizing feature in the Ap L7Ae protein appears to be the large number of ion pairs and extensive ion-pair network that connects secondary-structural elements. To our knowledge, Ap L7Ae is among the most thermostable single-domain monomeric proteins presently observed

  1. "SP-G", a putative new surfactant protein--tissue localization and 3D structure.

    Directory of Open Access Journals (Sweden)

    Felix Rausch

    Surfactant proteins (SP) are well known from human lung. These proteins assist the formation of a monolayer of surface-active phospholipids at the liquid-air interface of the alveolar lining, play a major role in lowering the surface tension of interfaces, and have functions in innate and adaptive immune defense. During recent years it became obvious that SPs are also part of other tissues and fluids such as tear fluid, gingiva, saliva, the nasolacrimal system, and kidney. Recently, a putative new surfactant protein (SFTA2 or SP-G) was identified, which has no sequence or structural identity to the already known surfactant proteins. In this work, computational chemistry and molecular-biological methods were combined to localize and characterize SP-G. With the help of a protein structure model, specific antibodies were obtained which allowed the detection of SP-G not only on mRNA but also on protein level. The localization of this protein in different human tissues, sequence based prediction tools for posttranslational modifications and molecular dynamic simulations reveal that SP-G has physicochemical properties similar to the already known surfactant proteins B and C. This includes also the possibility of interactions with lipid systems and with that, a potential surface-regulatory feature of SP-G. In conclusion, the results indicate SP-G as a new surfactant protein which represents an until now unknown surfactant protein class.

  2. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    2009-07-01

    Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. 
So far, the pipeline has been used to study viral and bacterial proteomes. The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform

  3. Nucleos: a web server for the identification of nucleotide-binding sites in protein structures.

    Science.gov (United States)

    Parca, Luca; Ferré, Fabrizio; Ausiello, Gabriele; Helmer-Citterich, Manuela

    2013-07-01

    Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos is composed of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of the cases, the modules composing correct predictions belong to proteins with no homology relationship between each other, meaning that the identification of brand-new nucleotide-binding sites is possible using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.

  4. Survey of immunological features of the alpha-like proteins of Streptococcus agalactiae.

    Science.gov (United States)

    Maeland, Johan A; Afset, Jan E; Lyng, Randi V; Radtke, Andreas

    2015-02-01

    Nearly all Streptococcus agalactiae (group B streptococcus [GBS]) strains express a protein which belongs to the so-called alpha-like proteins (Alps), of which Cα, Alp1, Alp2, Alp3, Rib, and Alp4 are known to occur in GBS. The Alps are chimeras which form mosaic structures on the GBS surface. Both N- and C-terminal stretches of the Alps possess immunogenic sites of dissimilar immunological specificity. In this review, we have compiled data dealing with the specificity of the N- and C-terminal immunogenic sites of the Alps. The majority of N-terminal sites show protein specificity while the C-terminal sites show broader cross-reactivity. Molecular serotyping has revealed that antibody-based serotyping has often resulted in erroneous Alp identification, due to persistence of cross-reacting antibodies in antisera for serotyping. Retrospectively, this could be expected on the basis of sequence analysis results. Some of the historical R proteins are in fact Alps. The data included in the review may provide a basis for decisions regarding techniques for the preparation of specific antisera for serotyping of GBS, for use in other approaches in GBS research, and for decision making in the context of GBS vaccine developments. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  5. Bayesian comparison of protein structures using partial Procrustes distance.

    Science.gov (United States)

    Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi

    2017-09-26

    An important topic in bioinformatics is protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated into the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.
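
    The partial Procrustes distance used for the local substructures can be illustrated as follows. This sketch covers only the distance between two already-matched point sets, not the Bayesian alignment model or the MCMC estimation. It assumes the convention that translation and rotation are removed, that no scaling or size normalisation is applied, and that reflections are excluded; the coordinates are invented.

```python
import numpy as np


def partial_procrustes_distance(X, Y):
    """Distance between matched point sets after optimal translation and rotation.

    X and Y are (n_points, 3) arrays with rows in corresponding order.
    No scaling is applied, which is what makes the distance "partial".
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)  # remove translation
    Yc = Y - Y.mean(axis=0)
    # Rotation R minimising ||Xc - Yc @ R|| via the SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    d = np.sign(np.linalg.det(U @ Vt))  # forbid reflections
    D = np.diag([1.0] * (X.shape[1] - 1) + [d])
    R = U @ D @ Vt
    return float(np.linalg.norm(Xc - Yc @ R))


# Hypothetical substructure of four atoms, then a rotated and shifted copy.
theta = np.pi / 6
Rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
X = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
Y = X @ Rz.T + np.array([3.0, -2.0, 1.0])
d_same = partial_procrustes_distance(X, Y)

Y_moved = Y.copy()
Y_moved[0] += [0.5, 0.0, 0.0]  # displace one atom so the shapes differ
d_diff = partial_procrustes_distance(X, Y_moved)
print(d_same, d_diff)
```

    A rigidly moved copy gives a distance of essentially zero, whereas displacing a single atom gives a clearly positive value.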

  6. Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139

    Directory of Open Access Journals (Sweden)

    Md. Saiful Islam

    2015-06-01

    In developing countries the threat of cholera is a significant health concern whenever water purification and sewage disposal systems are inadequate. Vibrio cholerae is one of the bacteria responsible for cholera. The complete genome sequence of V. cholerae reveals the presence of various genes and hypothetical proteins whose functions are not yet understood. Hence analyzing and annotating the structure and function of hypothetical proteins is important for understanding V. cholerae. V. cholerae O139 is the most common and pathogenic bacterial strain among various V. cholerae strains. In this study the sequences of six hypothetical proteins of V. cholerae O139 have been annotated from NCBI. Various computational tools and databases have been used to determine domain family, protein-protein interaction, solubility of protein, ligand binding sites etc. The three-dimensional structures of two proteins were modeled and their ligand binding sites were identified. We have found domains and families of only one protein. The analysis revealed that these proteins might have antibiotic resistance activity, DNA breaking-rejoining activity, integrase enzyme activity, restriction endonuclease, etc. Structural prediction of these proteins and detection of binding sites from this study would indicate a potential target aiding docking studies for therapeutic designing against cholera.

  7. Structural study of surfactant-dependent interaction with protein

    Energy Technology Data Exchange (ETDEWEB)

    Mehan, Sumit; Aswal, Vinod K., E-mail: vkaswal@barc.gov.in [Solid State Physics Division, Bhabha Atomic Research Centre, Mumbai 400 085 (India); Kohlbrecher, Joachim [Laboratory for Neutron Scattering, Paul Scherrer Institut, CH-5232 PSI Villigen (Switzerland)

    2015-06-24

    Small-angle neutron scattering (SANS) has been used to study the complex structure of anionic BSA protein with three different (cationic DTAB, anionic SDS and non-ionic C12E10) surfactants. These systems form very different surfactant-dependent complexes. We show that the structure of protein-surfactant complex is initiated by the site-specific electrostatic interaction between the components, followed by the hydrophobic interaction at high surfactant concentrations. It is also found that hydrophobic interaction is preferred over the electrostatic interaction in deciding the resultant structure of protein-surfactant complexes.

  8. The influence of target structure on topographical features produced by ion beam sputtering

    International Nuclear Information System (INIS)

    Whitton, J.L.; Grant, W.A.

    1981-01-01

    Ion beam erosion of solid surfaces often results in the development of distinctive topographical features. The relationship between the type of features formed by ion erosion and target structure has been investigated. Single crystals of copper and nickel and the amorphous alloy Metglas have been bombarded to high doses (approx. ≥10^19 ions cm^-2) with 40 keV Ar+ and P+. Topography changes were monitored using SEM and structural changes by TEM. Targets that retain their long range crystallinity show sharply defined, regular features that are related to the target structure. Targets that are highly disordered, either intrinsically or as a result of the ion bombardment, produce diffuse, smaller features. Those differences are observed at all stages in topographical evolution. (orig.)

  9. Protein features as determinants of wild-type glycoside hydrolase thermostability

    DEFF Research Database (Denmark)

    Geertz-Hansen, Henrik Marcus; Kiemer, Lars; Nielsen, Morten

    2017-01-01

    -silico methods guiding the discovery process would be of high value. To develop such an in-silico method and provide the data foundation of it, we determined the melting temperatures of 602 fungal glycoside hydrolases from the families GH5, 6, 7, 10, 11, 43 and AA9 (formerly GH61). We then used sequence...... and homology modeled structure information of these enzymes to develop the ThermoP melting temperature prediction method. Furthermore, in the context of thermostability, we determined the relative importance of 160 molecular features, such as amino acid frequencies and spatial interactions, and exemplified...

  10. The Widespread Prevalence and Functional Significance of Silk-Like Structural Proteins in Metazoan Biological Materials.

    Directory of Open Access Journals (Sweden)

    Carmel McDougall

    In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms; however, the exact nature and extent of this similarity, and its functional and evolutionary relevance, is unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. 
The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the

  11. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-07-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features and, furthermore, effective and accurate methods are still in pressing need. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), while in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in the two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine

  12. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    Science.gov (United States)

    2011-01-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for understanding the essence of protein interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features and, furthermore, effective and accurate methods are still in pressing need. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), while in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in the two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite
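
    For readers unfamiliar with the evaluation measures quoted in this abstract, the following sketch shows how precision, recall and the F1 score are computed from binary hot-spot labels. The labels are invented for illustration and are unrelated to the study's data; AUC is omitted because it requires ranked scores rather than hard predictions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = hot spot, 0 = non-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for eight interface residues.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

    Here three of the four predicted hot spots are correct and three of the four true hot spots are recovered, so all three measures equal 0.75.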

  13. Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

    Directory of Open Access Journals (Sweden)

    Bradley Michael E

    2006-02-01

    Full Text Available Abstract Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include (1) multiple sequence alignments, (2) mapping of alignment sites to crystal structure sites, (3) phylogenetic trees, (4) inferred ancestral sequences at internal tree nodes, and (5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural

  14. Characterization of the autophagy marker protein Atg8 reveals atypical features of autophagy in Plasmodium falciparum.

    Directory of Open Access Journals (Sweden)

    Rahul Navale

    Full Text Available Conventional autophagy is a lysosome-dependent degradation process that has crucial homeostatic and regulatory functions in eukaryotic organisms. As malaria parasites must dispose of a number of self and host cellular contents, we investigated whether autophagy in malaria parasites is similar to conventional autophagy. Genome-wide analysis revealed a partial autophagy repertoire in Plasmodium, as homologs for only 15 of the 33 yeast autophagy proteins could be identified, including the autophagy marker Atg8. To gain insights into autophagy in malaria parasites, we investigated Plasmodium falciparum Atg8 (PfAtg8) employing techniques and conditions that are routinely used to study autophagy. Atg8 was similarly expressed and showed punctate localization throughout the parasite in both asexual and sexual stages; it was exclusively found in the pellet fraction as an integral membrane protein, which is in contrast to the yeast or mammalian Atg8 that is distributed among cytosolic and membrane fractions, and suggests constitutive autophagy. Starvation, the best known autophagy inducer, decreased PfAtg8 level by almost 3-fold compared to the normally growing parasites. Neither the Atg8-associated puncta nor the Atg8 expression level was significantly altered by treatment of parasites with routinely used autophagy inhibitors (cysteine (E64) and aspartic (pepstatin) protease inhibitors, the kinase inhibitor 3-methyladenine, and the lysosomotropic agent chloroquine), indicating an atypical feature of autophagy. Furthermore, prolonged inhibition of the major food vacuole protease activity by E64 and pepstatin did not cause accumulation of the Atg8-associated puncta in the food vacuole, suggesting that autophagy is primarily not meant for degradative function in malaria parasites. Atg8 showed partial colocalization with the apicoplast; doxycycline treatment, which disrupts apicoplast, did not affect Atg8 localization, suggesting a role, but not exclusive, in

  15. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    Science.gov (United States)

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  16. Topological properties of complex networks in protein structures

    Science.gov (United States)

    Kim, Kyungsik; Jung, Jae-Won; Min, Seungsik

    2014-03-01

    We study topological properties of networks in structural classification of proteins. We model the native-state protein structure as a network made of its constituent amino-acids and their interactions. We treat four structural classes of proteins composed predominantly of α helices and β sheets and consider several proteins from each of these classes whose sizes range from amino acids of the Protein Data Bank. Particularly, we simulate and analyze the network metrics such as the mean degree, the probability distribution of degree, the clustering coefficient, the characteristic path length, the local efficiency, and the cost. This work was supported by the KMAR and DP under Grant WISE project (153-3100-3133-302-350).
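Three of the network metrics named in this abstract (mean degree, clustering coefficient, characteristic path length) can be illustrated on a toy residue-contact graph. This is a sketch under stated assumptions, not the authors' procedure: the edge list is invented, and how contacts are defined (for example a distance cutoff between residues) is not given in the abstract.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_degree(adj):
    """Average number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count links that exist between pairs of neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical contacts between four residues (indices are illustrative only).
toy_edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
toy_adj = build_adjacency(toy_edges)

print("mean degree:", mean_degree(toy_adj))
print("clustering coefficient:", clustering_coefficient(toy_adj))
print("characteristic path length:", characteristic_path_length(toy_adj))
```

For real structures, the edge list would come from whatever contact definition the study uses; only the metric definitions above are standard.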

  17. Deriving a Mutation Index of Carcinogenicity Using Protein Structure and Protein Interfaces

    Science.gov (United States)

    Hakas, Jarle; Pearl, Frances; Zvelebil, Marketa

    2014-01-01

    With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/. PMID:24454733

  18. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein In vitro Digestibility and Solubility.

    Science.gov (United States)

    Bai, Mingmei; Qin, Guixin; Sun, Zewei; Long, Guohui

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller's dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relatively quantitative amounts of α-helices, random coils, and α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (pdigestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  19. Protein Tyrosine Nitration: Biochemical Mechanisms and Structural Basis of its Functional Effects

    Science.gov (United States)

    Radi, Rafael

    2012-01-01

    CONSPECTUS The nitration of protein tyrosine residues to 3-nitrotyrosine represents an oxidative posttranslational modification that unveils the disruption of nitric oxide (•NO) signaling and metabolism towards pro-oxidant processes. Indeed, excess levels of reactive oxygen species in the presence of •NO or •NO-derived metabolites lead to the formation of nitrating species such as peroxynitrite. Thus, protein 3-nitrotyrosine has been established as a biomarker of cell, tissue and systemic “nitroxidative stress”. Moreover, tyrosine nitration modifies key properties of the amino acid (i.e. phenol group pKa, redox potential, hydrophobicity and volume). Thus, the incorporation of a nitro group (−NO2) to protein tyrosines can lead to profound structural and functional changes, some of which contribute to altered cell and tissue homeostasis. In this Account, I describe our current efforts to define 1) biologically-relevant mechanisms of protein tyrosine nitration and 2) how this modification can cause changes in protein structure and function at the molecular level. First, the relevance of protein tyrosine nitration via free radical-mediated reactions (in both peroxynitrite-dependent or independent pathways) involving the intermediacy of tyrosyl radical (Tyr•) will be underscored. This feature of the nitration process becomes critical as Tyr• can take variable fates, including the formation of 3-nitrotyrosine. Fast kinetic techniques, electron paramagnetic resonance (EPR) studies, bioanalytical methods and kinetic simulations have together helped to characterize and fingerprint the reactions of tyrosine with peroxynitrite and one-electron oxidants and its further evolution to 3-nitrotyrosine. Recent findings show that nitration of tyrosines in proteins associated with biomembranes is linked to the lipid peroxidation process via a connecting reaction that involves the one-electron oxidation of tyrosine by lipid peroxyl radicals (LOO•). Second

  20. Geomfinder: a multi-feature identifier of similar three-dimensional protein patterns: a ligand-independent approach.

    Science.gov (United States)

    Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel

    2016-01-01

    Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This makes it necessary to develop new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites at these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility

  1. Effects of NMR spectral resolution on protein structure calculation.

    Directory of Open Access Journals (Sweden)

    Suhas Tikole

    Full Text Available Adequate digital resolution and signal sensitivity are two critical factors for protein structure determinations by solution NMR spectroscopy. The prime objective for obtaining high digital resolution is to resolve peak overlap, especially in NOESY spectra with thousands of signals where the signal analysis needs to be performed on a large scale. Achieving maximum digital resolution is usually limited by the practically available measurement time. We developed a method utilizing non-uniform sampling for balancing digital resolution and signal sensitivity, and performed a large-scale analysis of the effect of the digital resolution on the accuracy of the resulting protein structures. Structure calculations were performed as a function of digital resolution for about 400 proteins with molecular sizes ranging between 5 and 33 kDa. The structural accuracy was assessed by atomic coordinate RMSD values from the reference structures of the proteins. In addition, we also monitored the number of assigned NOESY cross peaks, the average signal sensitivity, and the chemical shift spectral overlap. We show that high resolution is equally important for proteins of every molecular size. The chemical shift spectral overlap depends strongly on the corresponding spectral digital resolution. Thus, knowing the extent of overlap can be a predictor of the resulting structural accuracy. Our results show that for every molecular size a minimal digital resolution, corresponding to the natural linewidth, needs to be achieved for obtaining the highest accuracy possible for the given protein size using state-of-the-art automated NOESY assignment and structure calculation methods.

  2. Structural and Function Prediction of Musa acuminata subsp. Malaccensis Protein

    Directory of Open Access Journals (Sweden)

    Anum Munir

    2016-03-01

    Full Text Available Hypothetical proteins (HPs) are proteins whose presence has been anticipated, yet whose in vivo function has not been established. Illustrating the structural and functional privileged insights of these HPs might likewise prompt a superior comprehension of the protein-protein associations or networks in diverse types of life. Bananas (Musa acuminata spp.), including sweet and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the all-around considered Poales, which incorporate oats. Bananas are crucial for nourishment security in numerous tropical and subtropical nations and the most prominent organic product in industrialized nations. In the present study, the hypothetical protein of M. acuminata (Banana) was chosen for analysis and modeling by distinctive bioinformatics apparatuses and databases. As indicated by primary and secondary structure analysis, XP_009393594.1 is a stable hydrophobic protein containing a noteworthy extent of α-helices. Homology modeling was done utilizing the SWISS-MODEL server, where the templates' identity with the XP_009393594.1 protein was low, which demonstrated the novelty of our protein. Ab initio strategy was conducted to produce its 3D structure. A few evaluations of quality assessment and validation parameters determined the generated protein model as stable with genuinely great quality. Functional analysis was completed by ProtFun 2.2 and KEGG (KAAS), which recommended that the hypothetical protein is a transcription factor with cytoplasmic domain as zinc finger. The protein was observed to be vital for translation process, involved in metabolism, signaling and cellular processes, genetic information processing and Zinc ion binding. It is suggested that further test approval would help to anticipate the structures and functions of other uncharacterized proteins of different plants and living beings.

  3. Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure.

    Science.gov (United States)

    Foroughi Pour, Ali; Dalton, Lori A

    2018-03-21

    Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data these algorithms outputted many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks.

  4. Structural studies of human glioma pathogenesis-related protein 1

    Energy Technology Data Exchange (ETDEWEB)

    Asojo, Oluwatoyin A., E-mail: oasojo@unmc.edu [College of Medicine, Nebraska Medical Center, Omaha, NE 68198-6495 (United States); Koski, Raymond A.; Bonafé, Nathalie [L2 Diagnostics LLC, 300 George Street, New Haven, CT 06511 (United States); College of Medicine, Nebraska Medical Center, Omaha, NE 68198-6495 (United States)

    2011-10-01

    Structural analysis of a truncated soluble domain of human glioma pathogenesis-related protein 1, a membrane protein implicated in the proliferation of aggressive brain cancer, is presented. Human glioma pathogenesis-related protein 1 (GLIPR1) is a membrane protein that is highly upregulated in brain cancers but is barely detectable in normal brain tissue. GLIPR1 is composed of a signal peptide that directs its secretion, a conserved cysteine-rich CAP (cysteine-rich secretory proteins, antigen 5 and pathogenesis-related 1 proteins) domain and a transmembrane domain. GLIPR1 is currently being investigated as a candidate for prostate cancer gene therapy and for glioblastoma targeted therapy. Crystal structures of a truncated soluble domain of the human GLIPR1 protein (sGLIPR1) solved by molecular replacement using a truncated polyalanine search model of the CAP domain of stecrisp, a snake-venom cysteine-rich secretory protein (CRISP), are presented. The correct molecular-replacement solution could only be obtained by removing all loops from the search model. The native structure was refined to 1.85 Å resolution and that of a Zn²⁺ complex was refined to 2.2 Å resolution. The latter structure revealed that the putative binding cavity coordinates Zn²⁺ similarly to snake-venom CRISPs, which are involved in Zn²⁺-dependent mechanisms of inflammatory modulation. Both sGLIPR1 structures have extensive flexible loop/turn regions and unique charge distributions that were not observed in any of the previously reported CAP protein structures. A model is also proposed for the structure of full-length membrane-bound GLIPR1.

  5. Structure and function of nanoparticle-protein conjugates

    International Nuclear Information System (INIS)

    Aubin-Tam, M-E; Hamad-Schifferli, K

    2008-01-01

    Conjugation of proteins to nanoparticles has numerous applications in sensing, imaging, delivery, catalysis, therapy and control of protein structure and activity. Therefore, characterizing the nanoparticle-protein interface is of great importance. A variety of covalent and non-covalent linking chemistries have been reported for nanoparticle attachment. Site-specific labeling is desirable in order to control the protein orientation on the nanoparticle, which is crucial in many applications such as fluorescence resonance energy transfer. We evaluate methods for successful site-specific attachment. Typically, a specific protein residue is linked directly to the nanoparticle core or to the ligand. As conjugation often affects the protein structure and function, techniques to probe structure and activity are assessed. We also examine how molecular dynamics simulations of conjugates would complement those experimental techniques in order to provide atomistic details on the effect of nanoparticle attachment. Characterization studies of nanoparticle-protein complexes show that the structure and function are influenced by the chemistry of the nanoparticle ligand, the nanoparticle size, the nanoparticle material, the stoichiometry of the conjugates, the labeling site on the protein and the nature of the linkage (covalent versus non-covalent).

  6. Pushing the frontiers of atomic models for protein tertiary structure ...

    Indian Academy of Sciences (India)

    as an NP complete or NP hard problem.4,5 This notwithstanding, the dire need for tertiary structures of proteins in drug discovery and other areas6–8 has propelled the development of a multitude of computational recipes. In this article, we focus on ab initio/de novo strategies, Bhageerath in particular, for protein tertiary ...

  7. Computing a new family of shape descriptors for protein structures

    DEFF Research Database (Denmark)

    Røgen, Peter; Sinclair, Robert

    2003-01-01

    The large-scale 3D structure of a protein can be represented by the polygonal curve through the carbon α atoms of the protein backbone. We introduce an algorithm for computing the average number of times that a given configuration of crossings on such polygonal curves is seen, the average being...

  8. Simulation of Protein Structure, Dynamics and Function in Organic Media

    National Research Council Canada - National Science Library

    Daggett, Valerie

    1998-01-01

    The overall goal of our ONR-sponsored research is to pursue realistic molecular modeling studies pertinent to the related properties of protein stability, dynamics, structure, function, and folding in aqueous solution...

  9. Protein structure estimation from NMR data by matrix completion.

    Science.gov (United States)

    Li, Zhicheng; Li, Yang; Lei, Qiang; Zhao, Qing

    2017-09-01

    Knowledge of protein structures is very important to understand their corresponding physical and chemical properties. Nuclear Magnetic Resonance (NMR) spectroscopy is one of the main methods to measure protein structure. In this paper, we propose a two-stage approach to calculate the structure of a protein from a highly incomplete distance matrix, where most data are obtained from NMR. We first randomly "guess" a small part of unobservable distances by utilizing the triangle inequality, which is crucial for the second stage. Then we use matrix completion to calculate the protein structure from the obtained incomplete distance matrix. We apply the accelerated proximal gradient algorithm to solve the corresponding optimization problem. Furthermore, the recovery error of our method is analyzed, and its efficiency is demonstrated by several practical examples.
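The first stage described in this abstract relies on the triangle inequality to constrain unobserved distances. A minimal sketch of that constraint follows: for an unknown distance d(i, j), every third point k with known d(i, k) and d(k, j) gives |d(i, k) - d(k, j)| ≤ d(i, j) ≤ d(i, k) + d(k, j). The data layout, function names, toy distances, and the uniform sampling used for the "guess" are all assumptions for illustration; the abstract does not say how the authors draw their guesses.

```python
import math
import random


def triangle_bounds(known, i, j):
    """Return (lower, upper) bounds on d(i, j) implied by the triangle inequality.

    `known` maps frozenset({a, b}) to the measured distance between a and b.
    """
    nodes = {n for pair in known for n in pair}
    lower, upper = 0.0, math.inf
    for k in nodes - {i, j}:
        d_ik = known.get(frozenset((i, k)))
        d_kj = known.get(frozenset((k, j)))
        if d_ik is None or d_kj is None:
            continue  # k is not connected to both i and j by known distances
        lower = max(lower, abs(d_ik - d_kj))
        upper = min(upper, d_ik + d_kj)
    return lower, upper


def guess_distance(known, i, j, rng):
    """Draw a guess for d(i, j) uniformly within its bounds (illustrative choice only)."""
    lower, upper = triangle_bounds(known, i, j)
    if math.isinf(upper):
        raise ValueError("no common neighbour with known distances; cannot bound d(i, j)")
    return rng.uniform(lower, upper)


# Hypothetical known distances (in angstroms) among four atoms.
known_distances = {
    frozenset((0, 1)): 3.8,
    frozenset((1, 2)): 3.8,
    frozenset((0, 3)): 5.0,
    frozenset((3, 2)): 6.0,
}

low, high = triangle_bounds(known_distances, 0, 2)
print("bounds on d(0, 2):", low, high)
print("one random guess:", guess_distance(known_distances, 0, 2, random.Random(0)))
```

A guess drawn this way is always consistent with the measured distances it was bounded by, which is the property the completion stage depends on.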

  10. Modeling membrane protein structure through site-directed ESR spectroscopy

    NARCIS (Netherlands)

    Kavalenka, A.A.

    2009-01-01

    Site-directed spin labeling (SDSL) electron spin resonance (ESR) spectroscopy is a
    relatively new biophysical tool for obtaining structural information about proteins. This
    thesis presents a novel approach, based on powerful spectral analysis techniques (multicomponent
    spectral

  11. Fast iodide-SAD phasing for high-throughput membrane protein structure determination.

    Science.gov (United States)

    Melnikov, Igor; Polovinkin, Vitaly; Kovalev, Kirill; Gushchin, Ivan; Shevtsov, Mikhail; Shevchenko, Vitaly; Mishin, Alexey; Alekseev, Alexey; Rodriguez-Valera, Francisco; Borshchevskiy, Valentin; Cherezov, Vadim; Leonard, Gordon A; Gordeliy, Valentin; Popov, Alexander

    2017-05-01

    We describe a fast, easy, and potentially universal method for the de novo solution of the crystal structures of membrane proteins via iodide-single-wavelength anomalous diffraction (I-SAD). The potential universality of the method is based on a common feature of membrane proteins: the availability at the hydrophobic-hydrophilic interface of positively charged amino acid residues with which iodide strongly interacts. We demonstrate the solution using I-SAD of four crystal structures representing different classes of membrane proteins, including a human G protein-coupled receptor (GPCR), and we show that I-SAD can be applied using data collection strategies based on either standard or serial x-ray crystallography techniques.

  12. Computational design of proteins with novel structure and functions

    International Nuclear Information System (INIS)

    Yang Wei; Lai Lu-Hua

    2016-01-01

    Computational design of proteins is a relatively new field, where scientists search the enormous sequence space for sequences that can fold into desired structure and perform desired functions. With the computational approach, proteins can be designed, for example, as regulators of biological processes, novel enzymes, or as biotherapeutics. These approaches not only provide valuable information for understanding of sequence–structure–function relations in proteins, but also hold promise for applications to protein engineering and biomedical research. In this review, we briefly introduce the rationale for computational protein design, then summarize the recent progress in this field, including de novo protein design, enzyme design, and design of protein–protein interactions. Challenges and future prospects of this field are also discussed. (topical review)

  13. Potato leafroll virus structural proteins manipulate overlapping, yet distinct protein interaction networks during infection.

    Science.gov (United States)

    DeBlasio, Stacy L; Johnson, Richard; Sweeney, Michelle M; Karasev, Alexander; Gray, Stewart M; MacCoss, Michael J; Cilia, Michelle

    2015-06-01

    Potato leafroll virus (PLRV) produces a readthrough protein (RTP) via translational readthrough of the coat protein amber stop codon. The RTP functions as a structural component of the virion and as a nonincorporated protein in concert with numerous insect and plant proteins to regulate virus movement/transmission and tissue tropism. Affinity purification coupled to quantitative MS was used to generate protein interaction networks for a PLRV mutant that is unable to produce the read through domain (RTD) and compared to the known wild-type PLRV protein interaction network. By quantifying differences in the protein interaction networks, we identified four distinct classes of PLRV-plant interactions: those plant and nonstructural viral proteins interacting with assembled coat protein (category I); plant proteins in complex with both coat protein and RTD (category II); plant proteins in complex with the RTD (category III); and plant proteins that had higher affinity for virions lacking the RTD (category IV). Proteins identified as interacting with the RTD are potential candidates for regulating viral processes that are mediated by the RTP such as phloem retention and systemic movement and can potentially be useful targets for the development of strategies to prevent infection and/or viral transmission of Luteoviridae species that infect important crop species. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Binding free energy analysis of protein-protein docking model structures by evERdock.

    Science.gov (United States)

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-14

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal mode analysis for entropy calculation.
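The fraction of native contacts mentioned in this abstract can be sketched as the share of interface contacts in the native complex that are also present in a decoy. The snippet below is an illustration under assumptions not stated in the abstract: one representative point per residue, a 5 Å contact cutoff, and invented toy coordinates.

```python
import math


def interface_contacts(coords_a, coords_b, cutoff=5.0):
    """Return the set of (i, j) residue index pairs closer than `cutoff` across the interface.

    `coords_a` and `coords_b` are lists of (x, y, z) tuples, one per residue.
    The 5 angstrom default is an assumed, illustrative cutoff.
    """
    return {
        (i, j)
        for i, a in enumerate(coords_a)
        for j, b in enumerate(coords_b)
        if math.dist(a, b) < cutoff
    }


def fraction_native_contacts(native_contacts, decoy_contacts):
    """Fraction of native interface contacts that the decoy reproduces."""
    if not native_contacts:
        return 0.0
    return len(native_contacts & decoy_contacts) / len(native_contacts)


# Toy two-residue chains; coordinates are hypothetical.
chain_a = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
native_b = [(0.0, 3.0, 0.0), (6.0, 3.0, 0.0)]
decoy_b = [(0.0, 3.0, 0.0), (6.0, 9.0, 0.0)]  # second residue displaced away from the interface

native = interface_contacts(chain_a, native_b)
decoy = interface_contacts(chain_a, decoy_b)
print("fraction of native contacts:", fraction_native_contacts(native, decoy))
```

A decoy can score well on interface RMSD while losing specific contacts, which is why a contact-level measure like this is reported alongside RMSD.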

  15. Constraining cyclic peptides to mimic protein structure motifs

    DEFF Research Database (Denmark)

    Hill, Timothy A.; Shepherd, Nicholas E.; Diness, Frederik

    2014-01-01

    peptides can have protein-like biological activities and potencies, enabling their uses as biological probes and leads to therapeutics, diagnostics and vaccines. This Review highlights examples of cyclic peptides that mimic three-dimensional structures of strand, turn or helical segments of peptides...... and proteins, and identifies some additional restraints incorporated into natural product cyclic peptides and synthetic macrocyclic peptidomimetics that refine peptide structure and confer biological properties....

  16. Overcoming bottlenecks in the membrane protein structural biology pipeline.

    Science.gov (United States)

    Hardy, David; Bill, Roslyn M; Jawhari, Anass; Rothnie, Alice J

    2016-06-15

    Membrane proteins account for a third of the eukaryotic proteome, but are greatly under-represented in the Protein Data Bank. Unfortunately, recent technological advances in X-ray crystallography and EM cannot account for the poor solubility and stability of membrane protein samples. A limitation of conventional detergent-based methods is that detergent molecules destabilize membrane proteins, leading to their aggregation. The use of orthologues, mutants and fusion tags has helped improve protein stability, but at the expense of not working with the sequence of interest. Novel detergents such as glucose neopentyl glycol (GNG), maltose neopentyl glycol (MNG) and calixarene-based detergents can improve protein stability without compromising their solubilizing properties. Styrene maleic acid lipid particles (SMALPs) focus on retaining the native lipid bilayer of a membrane protein during purification and biophysical analysis. Overcoming bottlenecks in the membrane protein structural biology pipeline, primarily by maintaining protein stability, will facilitate the elucidation of many more membrane protein structures in the near future. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.

  17. Illuminating structural proteins in viral "dark matter" with metaproteomics.

    Science.gov (United States)

    Brum, Jennifer R; Ignacio-Espinoza, J Cesar; Kim, Eun-Hae; Trubl, Gareth; Jones, Robert M; Roux, Simon; VerBerkmoes, Nathan C; Rich, Virginia I; Sullivan, Matthew B

    2016-03-01

    Viruses are ecologically important, yet environmental virology is limited by dominance of unannotated genomic sequences representing taxonomic and functional "viral dark matter." Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world's oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Together, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.

  18. Functional diversification of structurally alike NLR proteins in plants.

    Science.gov (United States)

    Chakraborty, Joydeep; Jain, Akansha; Mukherjee, Dibya; Ghosh, Suchismita; Das, Sampa

    2018-04-01

    In the course of evolution, many pathogens alter their effector molecules to modulate the host plants' metabolism and the immune responses triggered upon proper recognition by the intracellular nucleotide-binding oligomerization domain containing leucine-rich repeat (NLR) proteins. Likewise, host plants have evolved diversified NLR proteins as a survival strategy against pathogen invasion. NLR proteins detect pathogen-derived effector proteins, leading to the activation of defense responses associated with programmed cell death (PCD). In this interactive process, genome structure and plasticity play a pivotal role in the development of innate immunity. Despite being quite conserved, with similar biological functions in all eukaryotes, the intracellular NLR immune receptor proteins are structurally distinct. Recent studies have made progress in identifying transcriptional regulatory complexes activated by NLR proteins. In this review, we attempt to decipher intracellular NLR protein-mediated surveillance across evolutionarily diverse taxa, highlighting recent updates on NLR protein compartmentalization and molecular interactions before and after activation, along with insights into the finer role of these receptor proteins in combating invading pathogens upon their recognition. The latest information on NLR sensors, helpers and NLR proteins with integrated domains in the context of plant-pathogen interactions is also discussed. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Combining neural networks for protein secondary structure prediction

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1995-01-01

    In this paper structured neural networks are applied to the problem of predicting the secondary structure of proteins. A hierarchical approach is used where specialized neural networks are designed for each structural class and then combined using another neural network. The submodels are designed...... by using a priori knowledge of the mapping between protein building blocks and the secondary structure and by using weight sharing. Since none of the individual networks have more than 600 adjustable weights over-fitting is avoided. When ensembles of specialized experts are combined the performance...

  20. A generative, probabilistic model of local protein structure

    DEFF Research Database (Denmark)

    Boomsma, Wouter; Mardia, Kanti V.; Taylor, Charles C.

    2008-01-01

    Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative...... conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state...

  1. Predicting protein complexes using a supervised learning method combined with local structural information.

    Science.gov (United States)

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
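
    To make the "dense subgraph" assumption mentioned in this abstract concrete, here is a minimal sketch of subgraph density in an undirected PPI network. It is not the authors' score function; the function name and the toy network are hypothetical.

```python
from itertools import combinations


def subgraph_density(edges, nodes):
    """Density of the subgraph induced by `nodes` in an undirected graph.

    density = 2 * m / (n * (n - 1)), where n is the number of nodes and
    m is the number of edges among them.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    m = sum(1 for pair in combinations(nodes, 2) if frozenset(pair) in edge_set)
    return 2 * m / (n * (n - 1))


# Hypothetical toy PPI network; protein names are placeholders.
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
print(subgraph_density(ppi_edges, ["P1", "P2", "P3"]))  # fully connected triangle
print(subgraph_density(ppi_edges, ["P3", "P4", "P5"]))  # sparse chain
```

    A method that relies on density alone would favor the first candidate; the abstract's point is that many true complexes resemble the second.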

  2. Photonic crystals based on opals and inverse opals: synthesis and structural features

    International Nuclear Information System (INIS)

    Klimonsky, S O; Abramova, Vera V; Sinitskii, Alexander S; Tretyakov, Yuri D

    2011-01-01

    Methods of synthesis of photonic crystals based on opals and inverse opals are considered. Their structural features are discussed. Data on different types of structural defects and their influence on the optical properties of opaline materials are systematized. The possibilities of investigation of structural defects by optical spectroscopy, electron microscopy, microradian X-ray diffraction, laser diffraction and using an analysis of Kossel ring patterns are described. The bibliography includes 253 references.

  3. Structural and compositional features of high-rise buildings: experimental design in Yekaterinburg

    Science.gov (United States)

    Yankovskaya, Yulia; Lobanov, Yuriy; Temnov, Vladimir

    2018-03-01

    The study looks at the specifics of high-rise development in Yekaterinburg. High-rise buildings are considered in the context of their historical development, structural features, compositional and imaginative design techniques. Experience of Yekaterinburg architects in experimental design is considered and analyzed. Main issues and prospects of high-rise development within the Yekaterinburg structure are studied. The most interesting and significant conceptual approaches to the structural and compositional arrangement of high-rise buildings are discussed.

  4. A Feature-Based Structural Measure: An Image Similarity Measure for Face Recognition

    Directory of Open Access Journals (Sweden)

    Noor Abdalrazak Shnain

    2017-08-01

    Facial recognition is one of the most challenging and interesting problems within the field of computer vision and pattern recognition. During the last few years, it has gained special attention due to its importance in relation to current issues such as security, surveillance systems and forensics analysis. Despite this high level of attention to facial recognition, the success is still limited by certain conditions; there is no method which gives reliable results in all situations. In this paper, we propose an efficient similarity index that resolves the shortcomings of the existing measures of feature and structural similarity. This measure, called the Feature-Based Structural Measure (FSM), combines the best features of the well-known SSIM (structural similarity index measure) and FSIM (feature similarity index measure) approaches, striking a balance between performance for similar and dissimilar images of human faces. In addition to the statistical structural properties provided by SSIM, edge detection is incorporated in FSM as a distinctive structural feature. Its performance is tested for a wide range of PSNR (peak signal-to-noise ratio), using ORL (Olivetti Research Laboratory, now AT&T Laboratory Cambridge) and FEI (Faculty of Industrial Engineering, São Bernardo do Campo, São Paulo, Brazil) databases. The proposed measure is tested under conditions of Gaussian noise; simulation results show that the proposed FSM outperforms the well-known SSIM and FSIM approaches in its efficiency of similarity detection and recognition of human faces.
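
    The PSNR used to grade noise levels in this abstract follows the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below illustrates only that definition, not the proposed FSM, whose formula the abstract does not give. The function name, the flat-list image representation and the 8-bit maximum of 255 are assumptions.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized grayscale images.

    Images are given as flat sequences of pixel intensities.
    PSNR = 10 * log10(max_value**2 / MSE).
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images flattened to lists of intensities.
clean = [100, 100, 100, 100]
noisy = [110, 90, 110, 90]
print(round(psnr(clean, noisy), 2))
```

    Lower PSNR means heavier noise, so testing over a wide PSNR range amounts to testing the similarity measure from lightly to heavily corrupted images.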

  5. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.

    Science.gov (United States)

    Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M

    2014-01-30

    RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.

  6. Common Features in Electronic Structure of the Oxypnictide Superconductors from Photoemission Spectroscopy

    International Nuclear Information System (INIS)

    Xiao-Wen, Jia; Hai-Yun, Liu; Wen-Tao, Zhang; Lin, Zhao; Jian-Qiao, Meng; Guo-Dong, Liu; Xiao-Li, Dong; Zhi-An, Ren; Wei, Yi; Guang-Can, Che; Zhong-Xian, Zhao; Gang, Wu; Rong-Hua, Liu; Xian-Hui, Chen; Gen-Fu, Chen; Nan-Lin, Wang; Yong, Zhu; Xiao-Yang, Wang; Gui-Ling, Wang; Yong, Zhou

    2008-01-01

    High resolution photoemission measurements are carried out on the non-superconducting LaFeAsO parent compound and various superconducting RFeAs(O_{1-x}F_x) (R = La, Ce and Pr) compounds. It is found that the parent LaFeAsO compound shows a metallic character. By extensive measurements, several common features are identified in the electronic structure of these Fe-based compounds: (1) a 0.2 eV feature in the valence band, (2) a universal 13-16 meV feature, (3) spectral weight suppression near E_F with decreasing temperature. These universal features can provide important information about band structure, superconducting gap and pseudogap in these Fe-based materials.

  7. Sequential Release of Proteins from Structured Multishell Microcapsules.

    Science.gov (United States)

    Shimanovich, Ulyana; Michaels, Thomas C T; De Genst, Erwin; Matak-Vinkovic, Dijana; Dobson, Christopher M; Knowles, Tuomas P J

    2017-10-09

    In nature, a wide range of functional materials is based on proteins. Increasing attention is also turning to the use of proteins as artificial biomaterials in the form of films, gels, particles, and fibrils that offer great potential for applications in areas ranging from molecular medicine to materials science. To date, however, most such applications have been limited to single component materials despite the fact that their natural analogues are composed of multiple types of proteins with a variety of functionalities that are coassembled in a highly organized manner on the micrometer scale, a process that is currently challenging to achieve in the laboratory. Here, we demonstrate the fabrication of multicomponent protein microcapsules where the different components are positioned in a controlled manner. We use molecular self-assembly to generate multicomponent structures on the nanometer scale and droplet microfluidics to bring together the different components on the micrometer scale. Using this approach, we synthesize a wide range of multiprotein microcapsules containing three well-characterized proteins: glucagon, insulin, and lysozyme. The localization of each protein component in multishell microcapsules has been detected by labeling protein molecules with different fluorophores, and the final three-dimensional microcapsule structure has been resolved by using confocal microscopy together with image analysis techniques. In addition, we show that these structures can be used to tailor the release of such functional proteins in a sequential manner. Moreover, our observations demonstrate that the protein release mechanism from multishell capsules is driven by the kinetic control of mass transport of the cargo and by the dissolution of the shells. The ability to generate artificial materials that incorporate a variety of different proteins with distinct functionalities increases the breadth of the potential applications of artificial protein-based materials

  8. Structuring oil by protein building blocks

    NARCIS (Netherlands)

    Vries, de Auke

    2017-01-01

    Over the recent years, structuring of oil into ‘organogels’ or ‘oleogels’ has gained much attention amongst colloid, material, and food scientists. Potentially, these oleogels could be used as an alternative for saturated and trans fats in food products. To develop oleogels as a

  9. Phosphorylation variation during the cell cycle scales with structural propensities of proteins.

    Directory of Open Access Journals (Sweden)

    Stefka Tyanova

    Phosphorylation at specific residues can activate a protein, lead to its localization to particular compartments, be a trigger for protein degradation and fulfill many other biological functions. Protein phosphorylation is increasingly being studied at a large scale and in a quantitative manner that includes a temporal dimension. By contrast, structural properties of identified phosphorylation sites have so far been investigated in a static, non-quantitative way. Here we combine for the first time dynamic properties of the phosphoproteome with protein structural features. At six time points of the cell division cycle we investigate how the variation of the amount of phosphorylation correlates with the protein structure in the vicinity of the modified site. We find two distinct phosphorylation site groups: intrinsically disordered regions tend to contain sites with dynamically varying levels, whereas regions with predominantly regular secondary structures retain more constant phosphorylation levels. The two groups show preferences for different amino acids in their kinase recognition motifs - proline and other disorder-associated residues are enriched in the former group and charged residues in the latter. Furthermore, these preferences scale with the degree of disorderedness, from regular to irregular and to disordered structures. Our results suggest that the structural organization of the region in which a phosphorylation site resides may serve as an additional control mechanism. They also imply that phosphorylation sites are associated with different time scales that serve different functional needs.

  10. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis.

    Science.gov (United States)

    Sun, Wenqing; Zheng, Bin; Qian, Wei

    2017-10-01

    This study aimed to analyze the ability of extracting automatically generated features using deep structured algorithms in lung nodule CT image diagnosis, and compare its performance with traditional computer aided diagnosis (CADx) systems using hand-crafted features. All of the 1018 cases were acquired from Lung Image Database Consortium (LIDC) public lung cancer database. The nodules were segmented according to four radiologists' markings, and 13,668 samples were generated by rotating every slice of nodule images. Three multichannel ROI based deep structured algorithms were designed and implemented in this study: convolutional neural network (CNN), deep belief network (DBN), and stacked denoising autoencoder (SDAE). For the comparison purpose, we also implemented a CADx system using hand-crafted features including density features, texture features and morphological features. The performance of every scheme was evaluated by using a 10-fold cross-validation method and an assessment index of the area under the receiver operating characteristic curve (AUC). The observed highest area under the curve (AUC) was 0.899±0.018 achieved by CNN, which was significantly higher than traditional CADx with the AUC=0.848±0.026. The results from DBN was also slightly higher than CADx, while SDAE was slightly lower. By visualizing the automatic generated features, we found some meaningful detectors like curvy stroke detectors from deep structured schemes. The study results showed the deep structured algorithms with automatically generated features can achieve desirable performance in lung nodule diagnosis. With well-tuned parameters and large enough dataset, the deep learning algorithms can have better performance than current popular CADx. We believe the deep learning algorithms with similar data preprocessing procedure can be used in other medical image analysis areas as well. Copyright © 2017. Published by Elsevier Ltd.
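
    The evaluation protocol named in this abstract (10-fold cross-validation scored by AUC) can be sketched with the standard library alone. This is a generic illustration, not the study's pipeline; the function names are hypothetical, and contiguous unshuffled folds are a simplifying assumption.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    AUC equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical labels (1 = malignant) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
print([len(test) for _, test in k_fold_indices(23, 10)])
```

    In practice a model would be trained on each `train` split and `roc_auc` computed on the matching `test` split, with the mean and spread over folds reported in the form the abstract uses.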

  11. Mass Spectrometry Coupled Experiments and Protein Structure Modeling Methods

    Directory of Open Access Journals (Sweden)

    Lee Sael

    2013-10-01

    With the accumulation of next generation sequencing data, there is increasing interest in the study of intra-species difference in molecular biology, especially in relation to disease analysis. Furthermore, the dynamics of the protein is being identified as a critical factor in its function. Although accuracy of protein structure prediction methods is high, provided there are structural templates, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address the sensitivity and the dynamics by computational structure predictions alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of the protein structures. For this purpose, this article focuses on reviewing two aspects: the types of mass spectrometry coupled experiments and structural data that are obtainable through those experiments; and the structure prediction methods that can utilize these data as constraints. Also, short review of current efforts in integrating experimental data in the structural modeling is provided.

  12. Chaperonin Structure - The Large Multi-Subunit Protein Complex

    Directory of Open Access Journals (Sweden)

    Irena Roterman

    2009-03-01

    The multi sub-unit protein structure representing the chaperonins group is analyzed with respect to its hydrophobicity distribution. The proteins of this group assist protein folding supported by ATP. The specific axial symmetry GroEL structure (two rings of seven units stacked back to back - 524 aa each) and the GroES (single ring of seven units - 97 aa each) polypeptide chains are analyzed using the hydrophobicity distribution expressed as excess/deficiency all over the molecule to search for structure-to-function relationships. The empirically observed distribution of hydrophobic residues is confronted with the theoretical one representing the idealized hydrophobic core with hydrophilic residues exposure on the surface. The observed discrepancy between these two distributions seems to be aim-oriented, determining the structure-to-function relation. The hydrophobic force field structure generated by the chaperonin capsule is presented. Its possible influence on substrate folding is suggested.

  13. In Silico Analysis of the Structural and Biochemical Features of the NMD Factor UPF1 in Ustilago maydis.

    Directory of Open Access Journals (Sweden)

    Nancy Martínez-Montiel

    The molecular mechanisms regulating the accuracy of gene expression are still not fully understood. Among these mechanisms, Nonsense-mediated Decay (NMD) is a quality control process that detects post-transcriptionally abnormal transcripts and leads them to degradation. The UPF1 protein lies at the heart of NMD as shown by several structural and functional features reported for this factor mainly for Homo sapiens and Saccharomyces cerevisiae. This process is highly conserved in eukaryotes but functional diversity can be observed in various species. Ustilago maydis is a basidiomycete and the best-known smut, which has become a model to study molecular and cellular eukaryotic mechanisms. In this study, we performed in silico analysis to investigate the structural and biochemical properties of the putative UPF1 homolog in Ustilago maydis. The putative homolog for UPF1 was recognized in the annotated genome for the basidiomycete, exhibiting 66% identity with its human counterpart at the protein level. The known structural and functional domains characteristic of UPF1 homologs were also found. Based on the crystal structures available for UPF1, we constructed different three-dimensional models for umUPF1 in order to analyze the secondary and tertiary structural features of this factor. Using these models, we studied the spatial arrangement of umUPF1 and its capability to interact with UPF2. Moreover, we identified the critical amino acids that mediate the interaction of umUPF1 with UPF2, ATP, RNA and with UPF1 itself. Mutating these amino acids in silico showed an important effect over the native structure. Finally, we performed molecular dynamic simulations for UPF1 proteins from H. sapiens and U. maydis and the results obtained show a similar behavior and physicochemical properties for the protein in both organisms. Overall, our results indicate that the putative UPF1 identified in U. maydis shows a very similar sequence, structural organization

  14. In Silico Analysis of the Structural and Biochemical Features of the NMD Factor UPF1 in Ustilago maydis.

    Science.gov (United States)

    Martínez-Montiel, Nancy; Morales-Lara, Laura; Hernández-Pérez, Julio M; Martínez-Contreras, Rebeca D

    2016-01-01

    The molecular mechanisms regulating the accuracy of gene expression are still not fully understood. Among these mechanisms, Nonsense-mediated Decay (NMD) is a quality control process that detects post-transcriptionally abnormal transcripts and leads them to degradation. The UPF1 protein lies at the heart of NMD as shown by several structural and functional features reported for this factor mainly for Homo sapiens and Saccharomyces cerevisiae. This process is highly conserved in eukaryotes but functional diversity can be observed in various species. Ustilago maydis is a basidiomycete and the best-known smut, which has become a model to study molecular and cellular eukaryotic mechanisms. In this study, we performed in silico analysis to investigate the structural and biochemical properties of the putative UPF1 homolog in Ustilago maydis. The putative homolog for UPF1 was recognized in the annotated genome for the basidiomycete, exhibiting 66% identity with its human counterpart at the protein level. The known structural and functional domains characteristic of UPF1 homologs were also found. Based on the crystal structures available for UPF1, we constructed different three-dimensional models for umUPF1 in order to analyze the secondary and tertiary structural features of this factor. Using these models, we studied the spatial arrangement of umUPF1 and its capability to interact with UPF2. Moreover, we identified the critical amino acids that mediate the interaction of umUPF1 with UPF2, ATP, RNA and with UPF1 itself. Mutating these amino acids in silico showed an important effect over the native structure. Finally, we performed molecular dynamic simulations for UPF1 proteins from H. sapiens and U. maydis and the results obtained show a similar behavior and physicochemical properties for the protein in both organisms. Overall, our results indicate that the putative UPF1 identified in U. maydis shows a very similar sequence, structural organization, mechanical stability

  15. NMR structural studies of peptides and proteins in membranes

    Energy Technology Data Exchange (ETDEWEB)

    Opella, S J [Pennsylvania Univ., Philadelphia, PA (United States). Dept. of Chemistry

    1994-12-31

    The use of NMR methodology in structural studies is described as applicable to larger proteins, considering that the majority of membrane proteins is constructed from a limited repertoire of structural and dynamic elements. The membrane associated domains of these proteins are made up of long hydrophobic membrane spanning helices, shorter amphipathic bridging helices in the plane of the bilayer, connecting loops with varying degrees of mobility, and mobile N- and C- terminal sections. NMR studies have been successful in identifying all of these elements and their orientations relative to each other and the membrane bilayer. 19 refs., 9 figs.

  16. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those aimed at implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. Mining protein loops using a structural alphabet and statistical exceptionality

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2010-02-01

    Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of
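
    The word-counting step described in this abstract can be illustrated with a short sketch. It assumes that consecutive structural letters overlap by three residues, so that a seven-residue fragment corresponds to four consecutive letters. The letter strings below are invented, and the function names are hypothetical rather than taken from HMM-SA.

```python
from collections import Counter


def count_structural_words(loops, word_length=4):
    """Count every overlapping word of `word_length` letters in each loop string."""
    counts = Counter()
    for loop in loops:
        for i in range(len(loop) - word_length + 1):
            counts[loop[i:i + word_length]] += 1
    return counts


def recurrent_words(counts, min_occurrences):
    """Keep words observed more than `min_occurrences` times."""
    return {w: c for w, c in counts.items() if c > min_occurrences}


# Hypothetical loop strings over a made-up letter alphabet.
loops = ["ABCDABCD", "ABCDE", "XABCD"]
counts = count_structural_words(loops)
frequent = recurrent_words(counts, min_occurrences=2)
print(frequent)
```

    With the threshold set to 30 and real letter strings, the same filter would correspond to the "highly recurrent" words of the abstract. Because words are counted per window rather than per loop, loops of any length contribute.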

  18. Mining protein loops using a structural alphabet and statistical exceptionality.

    Science.gov (United States)

    Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude

    2010-02-04

    Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is not an easy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words. These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of
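
    The word-extraction step described above can be sketched as a sliding window over structural-letter strings. This is only an illustration, not the authors' code: it assumes that consecutive four-residue letters overlap by three residues, so that a seven-residue fragment corresponds to a word of four letters, and the toy loop strings and function names are hypothetical.

```python
from collections import Counter

# Assumption: four overlapping four-residue letters span seven residues.
WORD_LEN = 4


def count_structural_words(loops, word_len=WORD_LEN):
    """Count every overlapping word of `word_len` letters across all loops."""
    counts = Counter()
    for letters in loops:
        for start in range(len(letters) - word_len + 1):
            counts[letters[start:start + word_len]] += 1
    return counts


def recurrent_words(counts, min_count=30):
    """Keep only the words observed more than `min_count` times."""
    return {word: n for word, n in counts.items() if n > min_count}


# Hypothetical toy data: each string is one loop encoded as structural letters.
toy_loops = ["ABCDAB", "ABCD", "BCDA"]
toy_counts = count_structural_words(toy_loops)
print(recurrent_words(toy_counts, min_count=1))
```

    Because loops shorter than one word contribute no windows, loops of any length can be pooled in the same count, which is the property the abstract relies on.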

  19. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

    Directory of Open Access Journals (Sweden)

    Martin Juliette

    2011-06-01

    Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
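
    To make the idea of over-representation concrete, the sketch below scores one word in one superfamily with a plain hypergeometric upper tail. This is a stand-in chosen for simplicity, not the pattern statistics used in the paper, which the abstract does not spell out; the function name and the way the counts are mapped onto the test are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, n_draws, n_success, n_total):
    """P(X >= k) for n_draws items drawn without replacement from n_total,
    of which n_success are 'successes'."""
    upper = min(n_draws, n_success)
    favourable = sum(
        comb(n_success, x) * comb(n_total - n_success, n_draws - x)
        for x in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_draws)


# Assumed mapping (hypothetical numbers):
#   n_total   = all word occurrences in all loops
#   n_success = occurrences of the word of interest in all loops
#   n_draws   = word occurrences in one superfamily
#   k         = occurrences of the word of interest in that superfamily
p_value = hypergeom_upper_tail(k=8, n_draws=50, n_success=20, n_total=1000)
print(p_value)
```

    A small tail probability flags the word as more frequent in the superfamily than its overall frequency would suggest; a real analysis would also need a correction for testing many words and superfamilies.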

  20. Structural protein relationships among eastern equine encephalitis viruses.

    Science.gov (United States)

    Strizki, J M; Repik, P M

    1994-11-01

    We have re-evaluated the relationships among the polypeptides of eastern equine encephalitis (EEE) viruses using SDS-PAGE and peptide mapping of individual virion proteins. Four to five distinct polypeptide bands were detected upon SDS-PAGE analysis of viruses: the E1, E2 and C proteins normally associated with alphavirus virions, as well as an additional more rapidly-migrating E2-associated protein and a high M(r) (HMW) protein. In contrast with previous findings by others, the electrophoretic profiles of the virion proteins of EEE viruses displayed a marked correlation with serotype. The protein profiles of the 33 North American (NA)-serotype viruses examined were remarkably homogeneous, with variation detected only in the E1 protein of two isolates. In contrast, considerable heterogeneity was observed in the migration profiles of both the E1 and E2 glycoproteins of the 13 South American (SA)-type viruses examined. Peptide mapping of individual virion proteins using limited proteolysis with Staphylococcus aureus V8 protease confirmed that, in addition to the homogeneity evident among NA-type viruses and relative heterogeneity among SA-type viruses, the E1 and E2 proteins of NA- and SA-serotype viruses exhibited serotype-specific structural variation. The C protein was highly conserved among isolates of both virus serotypes. Endoglycosidase analyses of intact virions did not reveal substantial glycosylation differences between the glycoproteins of NA- and SA-serotype viruses. Both the HMW protein and the E2 protein (doublet) of EEE virus appeared to contain, at least in part, high-mannose type N-linked oligosaccharides. No evidence of O-linked glycans was found on either the E1 or the E2 glycoprotein. Despite the observed structural differences between proteins of NA- and SA-type viruses, Western blot analyses utilizing polyclonal antibodies indicated that immunoreactive epitopes appeared to be conserved.

  1. Patchwork structure-function analysis of the Sendai virus matrix protein.

    Science.gov (United States)

    Mottet-Osman, Geneviève; Miazza, Vincent; Vidalain, Pierre-Olivier; Roux, Laurent

    2014-09-01

    Paramyxoviruses contain a bi-lipidic envelope decorated by two transmembrane glycoproteins and carpeted on the inner surface with a layer of matrix proteins (M), thought to bridge the glycoproteins with the viral nucleocapsids. To characterize M structure-function features, a set of M domains were mutated or deleted. The genes encoding these modified M were incorporated into recombinant Sendai viruses and expressed as supplemental proteins. Using an integrated suppression complementation system (ISCS), the functions of these M mutants were analyzed in the context of the infection. Cellular membrane association, localization at the cell periphery, nucleocapsid binding, cellular protein interactions and promotion of viral particle formation were characterized in relation to the mutations. In the end, lack of nucleocapsid binding goes together with lack of cell surface localization, and both features definitely correlate with loss of M global function estimated by viral particle production. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. The Relationship Between Low-Frequency Motions and Community Structure of Residue Network in Protein Molecules.

    Science.gov (United States)

    Sun, Weitao

    2018-01-01

    The global shape of a protein molecule is believed to be dominant in determining low-frequency deformational motions. However, how structure dynamics relies on residue interactions remains largely unknown. The global residue community structure and the local residue interactions are two important coexisting factors imposing significant effects on low-frequency normal modes. In this work, an algorithm for community structure partition is proposed by integrating Miyazawa-Jernigan empirical potential energy as edge weight. A sensitivity parameter is defined to measure the effect of local residue interaction on low-frequency movement. We show that community structure is a more fundamental feature of residue contact networks. Moreover, we surprisingly find that low-frequency normal mode eigenvectors are sensitive to some local critical residue interaction pairs (CRIPs). A fair amount of CRIPs act as bridges and hold distributed structure components into a unified tertiary structure by bonding nearby communities. Community structure analysis and CRIP detection of 116 catalytic proteins reveal that breaking up of a CRIP can cause low-frequency allosteric movement of a residue at the far side of protein structure. The results imply that community structure and CRIP may be the structural basis for low-frequency motions.

  3. A computer graphics program system for protein structure representation.

    Science.gov (United States)

    Ross, A M; Golub, E E

    1988-01-01

    We have developed a computer graphics program system for the schematic representation of several protein secondary structure analysis algorithms. The programs calculate the probability of occurrence of alpha-helix, beta-sheet and beta-turns by the method of Chou and Fasman and assign unique predicted structure to each residue using a novel conflict resolution algorithm based on maximum likelihood. A detailed structure map containing secondary structure, hydrophobicity, sequence identity, sequence numbering and the location of putative N-linked glycosylation sites is then produced. In addition, helical wheel diagrams and hydrophobic moment calculations can be performed to further analyze the properties of selected regions of the sequence. As they require only structure specification as input, the graphics programs can easily be adapted for use with other secondary structure prediction schemes. The use of these programs to analyze protein structure-function relationships is described and evaluated. PMID:2832829
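
    One of the map features listed above, the location of putative N-linked glycosylation sites, is easy to illustrate. The abstract does not state which rule the program applies; the sketch below assumes the usual Asn-X-Ser/Thr sequon with X not proline, and the function name and test sequence are hypothetical.

```python
import re

# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X is not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical one-letter amino acid sequence.
print(find_sequons("MNGSANPTNNTSK"))
```

    Positions are returned in sequence numbering so they can be placed directly on a residue-by-residue structure map.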

  4. Crystal structure of Homo sapiens protein LOC79017

    Energy Technology Data Exchange (ETDEWEB)

    Bae, Euiyoung; Bingman, Craig A.; Aceti, David J.; Phillips, Jr., George N. (UW)

    2010-02-08

    LOC79017 (MW 21.0 kDa, residues 1-188) was annotated as a hypothetical protein encoded by Homo sapiens chromosome 7 open reading frame 24. It was selected as a target by the Center for Eukaryotic Structural Genomics (CESG) because it did not share more than 30% sequence identity with any protein for which the three-dimensional structure is known. The biological function of the protein has not been established yet. Parts of LOC79017 were identified as members of uncharacterized Pfam families (residues 1-95 as PB006073 and residues 104-180 as PB031696). BLAST searches revealed homologues of LOC79017 in many eukaryotes, but none of them have been functionally characterized. Here, we report the crystal structure of H. sapiens protein LOC79017 (UniGene code Hs.530024, UniProt code O75223, CESG target number go.35223).

  5. Deprotonated imidodiphosphate in AMPPNP-containing protein structures

    International Nuclear Information System (INIS)

    Dauter, Miroslawa; Dauter, Zbigniew

    2011-01-01

    In certain AMPPNP-containing protein structures, the nitrogen bridging the two terminal phosphate groups can be deprotonated. Many different proteins utilize the chemical energy provided by the cofactor adenosine triphosphate (ATP) for their proper function. A number of structures in the Protein Data Bank (PDB) contain adenosine 5′-(β,γ-imido)triphosphate (AMPPNP), a nonhydrolysable analog of ATP in which the bridging O atom between the two terminal phosphate groups is substituted by the imido function. Under mild conditions imides do not have acidic properties and thus the imide nitrogen should be protonated. However, an analysis of protein structures containing AMPPNP reveals that the imide group is deprotonated in certain complexes if the negative charges of the phosphate moieties in AMPPNP are in part neutralized by coordinating divalent metals or a guanidinium group of an arginine

  6. EVA: continuous automatic evaluation of protein structure prediction servers.

    Science.gov (United States)

    Eyrich, V A; Martí-Renom, M A; Przybylski, D; Madhusudhan, M S; Fiser, A; Pazos, F; Valencia, A; Sali, A; Rost, B

    2001-12-01

    Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. http://cubic.bioc.columbia.edu/eva. eva@cubic.bioc.columbia.edu

  7. 3dRPC: a web server for 3D RNA-protein structure prediction.

    Science.gov (United States)

    Huang, Yangyu; Li, Haotian; Xiao, Yi

    2018-04-01

    RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. yxiao@hust.edu.cn.

  8. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    Directory of Open Access Journals (Sweden)

    Chi-Hua Tung

    2016-01-01

    Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly state prediction system, called QuaBingo, which combines blocks with functional domain composition, is constructed from three layers of classifiers that can categorize quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first-layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second-layer SVM was utilized to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, the Matthews Correlation Coefficient (MCC) of QuaBingo is 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins.
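
    For readers unfamiliar with the metric quoted above, the following sketch computes the Matthews Correlation Coefficient from a binary confusion matrix. The abstract does not say how MCC was aggregated over the three quaternary classes; treating each class one-versus-rest is an assumption here, and the counts in the example are made up.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical one-versus-rest counts for a single class, e.g. "monomer".
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike plain accuracy it stays informative when classes are unbalanced.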

  9. De novo protein structure generation from incomplete chemical shift assignments

    Energy Technology Data Exchange (ETDEWEB)

    Shen Yang [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States); Vernon, Robert; Baker, David [University of Washington, Department of Biochemistry and Howard Hughes Medical Institute (United States); Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)], E-mail: bax@nih.gov

    2009-02-15

    NMR chemical shifts provide important local structural information for proteins. Consistent structure generation from NMR chemical shift data has recently become feasible for proteins with sizes of up to 130 residues, and such structures are of a quality comparable to those obtained with the standard NMR protocol. This study investigates the influence of the completeness of chemical shift assignments on structures generated from chemical shifts. The Chemical-Shift-Rosetta (CS-Rosetta) protocol was used for de novo protein structure generation with various degrees of completeness of the chemical shift assignment, simulated by omission of entries in the experimental chemical shift data previously used for the initial demonstration of the CS-Rosetta approach. In addition, a new CS-Rosetta protocol is described that improves robustness of the method for proteins with missing or erroneous NMR chemical shift input data. This strategy, which uses traditional Rosetta for pre-filtering of the fragment selection process, is demonstrated for two paramagnetic proteins and also for two proteins with solid-state NMR chemical shift assignments.

  10. Blind Test of Physics-Based Prediction of Protein Structures

    Science.gov (United States)

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130
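
    The replica-exchange component mentioned above rests on a Metropolis test for swapping configurations between two temperatures. The sketch below shows that standard criterion only; it is not the ZAM implementation, and the function names, energy units (kcal/mol) and example values are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging replicas i and j:
    min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1 / (k_B * T)."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the exchange is accepted for this random draw."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


# Hypothetical potential energies for neighbouring replicas at 300 K and 320 K.
print(swap_probability(-120.0, 300.0, -100.0, 320.0))
```

    A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse is accepted only with the Boltzmann-weighted probability, which is what lets conformations trapped at low temperature escape through the hotter replicas.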

  11. Relationship between Molecular Structure Characteristics of Feed Proteins and Protein Digestibility and Solubility

    Directory of Open Access Journals (Sweden)

    Mingmei Bai

    2016-08-01

    The nutritional value of feed proteins and their utilization by livestock are related not only to the chemical composition but also to the structure of feed proteins, but few studies thus far have investigated the relationship between the structure of feed proteins and their solubility as well as digestibility in monogastric animals. To address this question we analyzed soybean meal, fish meal, corn distiller’s dried grains with solubles, corn gluten meal, and feather meal by Fourier transform infrared (FTIR) spectroscopy to determine the protein molecular spectral band characteristics for amides I and II as well as α-helices and β-sheets and their ratios. Protein solubility and in vitro digestibility were measured with the Kjeldahl method using 0.2% KOH solution and the pepsin-pancreatin two-step enzymatic method, respectively. We found that all measured spectral band intensities (height and area) of feed proteins were correlated with their in vitro digestibility and solubility (p≤0.003); moreover, the relative amounts of α-helices, random coils, and the α-helix to β-sheet ratio in protein secondary structures were positively correlated with protein in vitro digestibility and solubility (p≤0.004). On the other hand, the percentage of β-sheet structures was negatively correlated with protein in vitro digestibility (p<0.001) and solubility (p = 0.002). These results demonstrate that the molecular structure characteristics of feed proteins are closely related to their in vitro digestibility at 28 h and solubility. Furthermore, the α-helix-to-β-sheet ratio can be used to predict the nutritional value of feed proteins.

  12. Structural and bioinformatic characterization of an Acinetobacter baumannii type II carrier protein

    International Nuclear Information System (INIS)

    Allen, C. Leigh; Gulick, Andrew M.

    2014-01-01

    The high-resolution crystal structure of a free-standing carrier protein from Acinetobacter baumannii that belongs to a larger NRPS-containing operon, encoded by the ABBFA-003406–ABBFA-003399 genes of A. baumannii strain AB307-0294, that has been implicated in A. baumannii motility, quorum sensing and biofilm formation, is presented. Microorganisms produce a variety of natural products via secondary metabolic biosynthetic pathways. Two of these types of synthetic systems, the nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), use large modular enzymes containing multiple catalytic domains in a single protein. These multidomain enzymes use an integrated carrier protein domain to transport the growing, covalently bound natural product to the neighboring catalytic domains for each step in the synthesis. Interestingly, some PKS and NRPS clusters contain free-standing domains that interact intermolecularly with other proteins. Being expressed outside the architecture of a multi-domain protein, these so-called type II proteins present challenges to understand the precise role they play. Additional structures of individual and multi-domain components of the NRPS enzymes will therefore provide a better understanding of the features that govern the domain interactions in these interesting enzyme systems. The high-resolution crystal structure of a free-standing carrier protein from Acinetobacter baumannii that belongs to a larger NRPS-containing operon, encoded by the ABBFA-003406–ABBFA-003399 genes of A. baumannii strain AB307-0294, that has been implicated in A. baumannii motility, quorum sensing and biofilm formation, is presented here. Comparison with the closest structural homologs of other carrier proteins identifies the requirements for a conserved glycine residue and additional important sequence and structural requirements within the regions that interact with partner proteins

  13. Structural and bioinformatic characterization of an Acinetobacter baumannii type II carrier protein

    Energy Technology Data Exchange (ETDEWEB)

    Allen, C. Leigh; Gulick, Andrew M., E-mail: gulick@hwi.buffalo.edu [University at Buffalo, Buffalo, NY 14203 (United States)

    2014-06-01

    The high-resolution crystal structure of a free-standing carrier protein from Acinetobacter baumannii that belongs to a larger NRPS-containing operon, encoded by the ABBFA-003406–ABBFA-003399 genes of A. baumannii strain AB307-0294, that has been implicated in A. baumannii motility, quorum sensing and biofilm formation, is presented. Microorganisms produce a variety of natural products via secondary metabolic biosynthetic pathways. Two of these types of synthetic systems, the nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), use large modular enzymes containing multiple catalytic domains in a single protein. These multidomain enzymes use an integrated carrier protein domain to transport the growing, covalently bound natural product to the neighboring catalytic domains for each step in the synthesis. Interestingly, some PKS and NRPS clusters contain free-standing domains that interact intermolecularly with other proteins. Being expressed outside the architecture of a multi-domain protein, these so-called type II proteins present challenges to understand the precise role they play. Additional structures of individual and multi-domain components of the NRPS enzymes will therefore provide a better understanding of the features that govern the domain interactions in these interesting enzyme systems. The high-resolution crystal structure of a free-standing carrier protein from Acinetobacter baumannii that belongs to a larger NRPS-containing operon, encoded by the ABBFA-003406–ABBFA-003399 genes of A. baumannii strain AB307-0294, that has been implicated in A. baumannii motility, quorum sensing and biofilm formation, is presented here. Comparison with the closest structural homologs of other carrier proteins identifies the requirements for a conserved glycine residue and additional important sequence and structural requirements within the regions that interact with partner proteins.

  14. Characterization of structural proteins of hirame rhabdovirus, HRV

    Science.gov (United States)

    Nishizawa, Toyohiko; Yoshimizu, Mamoru; Winton, James; Ahne, Winfried; Kimura, Takahisa

    1991-01-01

    Structural proteins of hirame rhabdovirus (HRV) were analyzed by SDS-polyacrylamide gel electrophoresis, western blotting, 2-dimensional gel electrophoresis, and Triton X-100 treatment. Purified HRV virions were composed of: polymerase (L), glycoprotein (G), nucleoprotein (N), and 2 matrix proteins (M1 and M2). Based upon their relative mobilities, the estimated molecular weights of the proteins were: L, 156 kDa; G, 68 kDa; N, 46.4 kDa; M1, 26.4 kDa; and M2, 19.9 kDa. The electrophoretic pattern formed by the structural proteins of HRV was clearly different from that formed by pike fry rhabdovirus, spring viremia of carp virus, eel virus of America, and eel virus European X which belong to the Vesiculovirus genus; however, it resembled the pattern formed by structural proteins of viral hemorrhagic septicemia virus (VHSV) and infectious hematopoietic necrosis virus (IHNV) which are members of the Lyssavirus genus. Among HRV, IHNV, and VHSV, differences were observed in the relative mobilities of the G, N, M1, and M2 proteins. Western blot analysis revealed that the G, N, and M2 proteins of HRV shared antigenic determinants with IHNV and VHSV, but not with any of the 4 fish vesiculoviruses tested. Cross-reactions between the M1 proteins of HRV, IHNV, or VHSV were not detected in this assay. Two-dimensional gel electrophoresis was used to show that HRV differed from IHNV or VHSV in the isoelectric point (pI) of the M1 and M2 proteins. In this system, 2 forms of the M1 protein of HRV and IHNV were observed. These subspecies of M1 had the same relative mobility but different pI values. Treatment of purified virions with 2% Triton X-100 in Tris buffer containing NaCl removed the G, M1, and M2 proteins of IHNV, but HRV virions were more stable under these conditions.

  15. Conservation and divergence of C-terminal domain structure in the retinoblastoma protein family

    Energy Technology Data Exchange (ETDEWEB)

    Liban, Tyler J.; Medina, Edgar M.; Tripathi, Sarvind; Sengupta, Satyaki; Henry, R. William; Buchler, Nicolas E.; Rubin, Seth M. (UCSC); (Duke); (MSU)

    2017-04-24

    The retinoblastoma protein (Rb) and the homologous pocket proteins p107 and p130 negatively regulate cell proliferation by binding and inhibiting members of the E2F transcription factor family. The structural features that distinguish Rb from other pocket proteins have been unclear but are critical for understanding their functional diversity and determining why Rb has unique tumor suppressor activities. We describe here important differences in how the Rb and p107 C-terminal domains (CTDs) associate with the coiled-coil and marked-box domains (CMs) of E2Fs. We find that although CTD–CM binding is conserved across protein families, Rb and p107 CTDs show clear preferences for different E2Fs. A crystal structure of the p107 CTD bound to E2F5 and its dimer partner DP1 reveals the molecular basis for pocket protein–E2F binding specificity and how cyclin-dependent kinases differentially regulate pocket proteins through CTD phosphorylation. Our structural and biochemical data together with phylogenetic analyses of Rb and E2F proteins support the conclusion that Rb evolved specific structural motifs that confer its unique capacity to bind with high affinity those E2Fs that are the most potent activators of the cell cycle.

  16. Cold-set globular protein gels: Interactions, structure and rheology as a function of protein concentration.

    NARCIS (Netherlands)

    Alting, A.C.; Hamer, R.J.; Kruif, de C.G.

    2003-01-01

    We identified the contribution of covalent and noncovalent interactions to the scaling behavior of the structural and rheological properties in a cold gelling protein system. The system we studied consisted of two types of whey protein aggregates, equal in size but different in the amount of

  17. Features Of Household Lexics, Their Characteristics And Structural Analysis In The Modern English Language

    Directory of Open Access Journals (Sweden)

    Aygun Yusifova

    2014-04-01

    The present paper aims to analyze the most inherent features and characteristics of household lexis in English. Special emphasis has been placed on the names of objects used in everyday life, kitchen utensils, animals and birds. Lexical units concerning ceremonies, habits and traditions are also within the scope of the paper. Moreover, the study deals with the structural features of the units under consideration. It is believed that the thematic-semantic characterization of everyday lexis can have both pedagogical and linguistic implications, especially when dealing with comparative structures.

  18. From the Protein's Perspective: The Benefits and Challenges of Protein Structure-Based Pharmacophore Modeling

    NARCIS (Netherlands)

    Sanders, M.P.A.; McGuire, R; Roumen, L.; de Esch, I.J.P.; de Vlieg, J; Klomp, J.P.G; de Graaf, C.

    2011-01-01

    A pharmacophore describes the arrangement of molecular features a ligand must contain to efficaciously bind a receptor. Pharmacophore models are developed to improve molecular understanding of ligand-protein interactions, and can be used as a tool to identify novel compounds that fulfil the

  19. Identification of structural domains in proteins by a graph heuristic

    NARCIS (Netherlands)

    Wernisch, Lorenz; Hunting, M.M.G.; Wodak, Shoshana J.

    1999-01-01

    A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain

  20. Connecting Protein Structure to Intermolecular Interactions: A Computer Modeling Laboratory

    Science.gov (United States)

    Abualia, Mohammed; Schroeder, Lianne; Garcia, Megan; Daubenmire, Patrick L.; Wink, Donald J.; Clark, Ginevra A.

    2016-01-01

    An understanding of protein folding relies on a solid foundation of a number of critical chemical concepts, such as molecular structure, intra-/intermolecular interactions, and relating structure to function. Recent reports show that students struggle on all levels to achieve these understandings and use them in meaningful ways. Further, several…

  1. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Background: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly use protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results: We present Formatt, a multiple structure alignment program based on the purely geometric Matt multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set, which captures more remote homology, Formatt and Matt outperform other programs; depending on the choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions: Considering sequence information as well as purely geometric information seems to improve the quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures suggest different alignments remains a difficult open question.

  2. Tchebichef image moment approach to the prediction of protein secondary structures based on circular dichroism.

    Science.gov (United States)

    Li, Sha Sha; Li, Bao Qiong; Liu, Jin Jin; Lu, Shao Hua; Zhai, Hong Lin

    2018-04-20

    Circular dichroism (CD) spectroscopy is a widely used technique for the evaluation of protein secondary structures and has a significant impact on the understanding of molecular biology. However, the quantitative analysis of protein secondary structures based on CD spectra remains difficult because of the serious overlap of the spectra corresponding to different structural motifs. Here, the Tchebichef image moment (TM) approach is introduced for the first time; it can effectively extract the chemical features in CD spectra for the quantitative analysis of protein secondary structures. The proposed approach was applied to analyze a reference set, and the results obtained were evaluated by strict statistical parameters such as the correlation coefficient, the cross-validation correlation coefficient and the root mean squared error. Compared with several specialized prediction methods, the TM approach provided satisfactory results, especially for turns and unordered structures. Our study indicates that the TM approach can be regarded as a feasible tool for the analysis of the secondary structures of proteins based on CD spectra. A TMs package is provided and can be used directly for secondary structure prediction. © 2018 Wiley Periodicals, Inc.
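    As a rough illustration of two of the evaluation statistics named in this abstract, the sketch below computes a Pearson correlation coefficient and a root mean squared error between reference and predicted secondary-structure fractions. It is not taken from the TMs package; the function names and the sample fractions are hypothetical, and the cross-validation variant is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


# Hypothetical helix fractions for four proteins (illustrative values only).
reference = [0.10, 0.35, 0.50, 0.72]
predicted = [0.12, 0.30, 0.55, 0.70]
print(pearson_r(reference, predicted), rmse(reference, predicted))
```

    A correlation close to 1 together with a small error would indicate good agreement between predicted and reference fractions.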

  3. Evolutionary history, structural features and biochemical diversity of the NlpC/P60 superfamily of enzymes.

    Science.gov (United States)

    Anantharaman, Vivek; Aravind, L

    2003-01-01

    Peptidoglycan is hydrolyzed by a diverse set of enzymes during bacterial growth, development and cell division. The NlpC/P60 proteins define a family of cell-wall peptidases that are widely represented in various bacterial lineages. Currently characterized members are known to hydrolyze D-gamma-glutamyl-meso-diaminopimelate or N-acetylmuramate-L-alanine linkages. Detailed analysis of the NlpC/P60 peptidases showed that these proteins define a large superfamily encompassing several diverse groups of proteins. In addition to the well characterized P60-like proteins, this superfamily includes the AcmB/LytN and YaeF/YiiX families of bacterial proteins, the amidase domain of bacterial and kinetoplastid glutathionylspermidine synthases (GSPSs), and several proteins from eukaryotes, phages, poxviruses, positive-strand RNA viruses, and certain archaea. The eukaryotic members include lecithin retinol acyltransferase (LRAT), nematode developmental regulator Egl-26, and candidate tumor suppressor H-rev107. These eukaryotic proteins, along with the bacterial YaeF/poxviral G6R family, show a circular permutation of the catalytic domain. We identified three conserved residues, namely a cysteine, a histidine and a polar residue, that are involved in the catalytic activities of this superfamily. Evolutionary analysis of this superfamily shows that it comprises four major families, with diverse domain architectures in each of them. Several related, but distinct, catalytic activities, such as murein degradation, acyl transfer and amide hydrolysis, have emerged in the NlpC/P60 superfamily. The three conserved catalytic residues of this superfamily are shown to be equivalent to the catalytic triad of the papain-like thiol peptidases. The predicted structural features indicate that the NlpC/P60 enzymes contain a fold similar to the papain-like peptidases, transglutaminases and arylamine acetyltransferases.

  4. The Protein Model Portal--a comprehensive resource for protein structure and model information.

    Science.gov (United States)

    Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

    2013-01-01

    The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org.

  5. The Protein Model Portal—a comprehensive resource for protein structure and model information

    Science.gov (United States)

    Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

    2013-01-01

    The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org. PMID: 23624946

  6. Protein Structural Change Data - PSCDB | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  7. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  8. Structure and Dynamic Properties of Membrane Proteins using NMR

    DEFF Research Database (Denmark)

    Rösner, Heike; Kragelund, Birthe

    2012-01-01

    conformational changes. Their structural and functional decoding is challenging and has imposed demanding experimental development. Solution nuclear magnetic resonance (NMR) spectroscopy is one of the techniques providing the capacity to make a significant difference in the deciphering of the membrane protein...... structure-function paradigm. The method has evolved dramatically during the last decade resulting in a plethora of new experiments leading to a significant increase in the scientific repertoire for studying membrane proteins. Besides solving the three-dimensional structures using state-of-the-art approaches......-populated states, this review seeks to introduce the vast possibilities solution NMR can offer to the study of membrane protein structure-function analyses with special focus on applicability. © 2012 American Physiological Society. Compr Physiol 2:1491-1539, 2012....

  9. Perspective: Structural fluctuation of protein and Anfinsen's thermodynamic hypothesis

    Science.gov (United States)

    Hirata, Fumio; Sugita, Masatake; Yoshida, Masasuke; Akasaka, Kazuyuki

    2018-01-01

    The thermodynamic hypothesis, casually referred to as "Anfinsen's dogma," is described theoretically in terms of the structural fluctuation of a protein, that is, the first moment (average structure) and the second moment (variance and covariance) of the structural distribution. The new theoretical concept views the unfolding and refolding processes of a protein as a shift of the structural distribution induced by a thermodynamic perturbation, with the variance-covariance matrix varying. Based on this concept, a method to characterize the mechanism of folding (or unfolding) is proposed. The transition state, if any, between two stable states is interpreted as a gap in the distribution, which is created by an extensive reorganization of hydrogen bonds among backbone atoms of the protein and with water molecules in the course of conformational change. Further perspectives on applying the theory to computer-aided drug design and to materials science are briefly discussed.

  10. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    of the protein's structure, an energy potential and some optimization algorithm that finds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving optimization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested......Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional structure from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation...... our BCO method to generate good solutions to the protein structure prediction problem. The results show that BCO generally finds better solutions than simulated annealing which so far has been the metaheuristic of choice for this problem....

  11. Structural and biochemical characterization of the cell fate determining nucleotidyltransferase fold protein MAB21L1.

    Science.gov (United States)

    de Oliveira Mann, Carina C; Kiefersauer, Reiner; Witte, Gregor; Hopfner, Karl-Peter

    2016-06-08

    The exceptionally conserved metazoan MAB21 proteins are implicated in cell fate decisions and share considerable sequence homology with the cyclic GMP-AMP synthase. cGAS is the major innate immune sensor for cytosolic DNA and produces the second messenger 2'-5', 3'-5' cyclic GMP-AMP. Little is known about the structure and biochemical function of other proteins of the cGAS-MAB21 subfamily, such as MAB21L1, MAB21L2 and MAB21L3. We have determined the crystal structure of human full-length MAB21L1. Our analysis reveals high structural conservation between MAB21L1 and cGAS but also uncovers important differences. Although monomeric in solution, MAB21L1 forms a highly symmetric double-pentameric oligomer in the crystal, raising the possibility that oligomerization could be a feature of MAB21L1. In the crystal, MAB21L1 is in an inactive conformation requiring a conformational change - similar to cGAS - to develop any nucleotidyltransferase activity. Co-crystallization with NTP identified a putative ligand binding site of MAB21 proteins that corresponds to the DNA binding site of cGAS. Finally, we offer a structure-based explanation for the effects of MAB21L2 mutations in patients with eye malformations. The underlying residues participate in fold-stabilizing interaction networks and mutations destabilize the protein. In summary, we provide a first structural framework for MAB21 proteins.

  12. Amyloid fibril formation from sequences of a natural beta-structured fibrous protein, the adenovirus fiber.

    Science.gov (United States)

    Papanikolopoulou, Katerina; Schoehn, Guy; Forge, Vincent; Forsyth, V Trevor; Riekel, Christian; Hernandez, Jean-François; Ruigrok, Rob W H; Mitraki, Anna

    2005-01-28

    Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.

  13. Crystal structure of secretory protein Hcp3 from Pseudomonas aeruginosa.

    Science.gov (United States)

    Osipiuk, Jerzy; Xu, Xiaohui; Cui, Hong; Savchenko, Alexei; Edwards, Aled; Joachimiak, Andrzej

    2011-03-01

    The Type VI secretion pathway transports proteins across the cell envelope of Gram-negative bacteria. Pseudomonas aeruginosa, an opportunistic Gram-negative bacterial pathogen infecting humans, uses the type VI secretion pathway to export specific effector proteins crucial for its pathogenesis. The HSI-I virulence locus encodes several proteins that have been proposed to participate in protein transport, including the Hcp1 protein, which forms hexameric rings that assemble into nanotubes in vitro. Two Hcp1 paralogues have been identified in the P. aeruginosa genome, Hcp2 and Hcp3. Here, we present the structure of the Hcp3 protein from P. aeruginosa. The overall structure of the monomer resembles Hcp1 despite the lack of amino-acid sequence similarity between the two proteins. The monomers assemble into hexamers similar to Hcp1. However, instead of forming nanotubes in head-to-tail mode like Hcp1, Hcp3 stacks its rings in head-to-head mode forming double-ring structures.

  14. Structural Elements Regulating AAA+ Protein Quality Control Machines.

    Science.gov (United States)

    Chang, Chiung-Wen; Lee, Sukyeong; Tsai, Francis T F

    2017-01-01

    Members of the ATPases Associated with various cellular Activities (AAA+) superfamily participate in essential and diverse cellular pathways in all kingdoms of life by harnessing the energy of ATP binding and hydrolysis to drive their biological functions. Although most AAA+ proteins share a ring-shaped architecture, AAA+ proteins have evolved distinct structural elements that are fine-tuned to their specific functions. A central question in the field is how ATP binding and hydrolysis are coupled to substrate translocation through the central channel of ring-forming AAA+ proteins. In this mini-review, we will discuss structural elements present in AAA+ proteins involved in protein quality control, drawing similarities to their known role in substrate interaction by AAA+ proteins involved in DNA translocation. Elements to be discussed include the pore loop-1, the Inter-Subunit Signaling (ISS) motif, and the Pre-Sensor I insert (PS-I) motif. Lastly, we will summarize our current understanding of the inter-relationship of those structural elements and propose a model of how ATP binding and hydrolysis might be coupled to polypeptide translocation in protein quality control machines.

  15. Models of protein-ligand crystal structures: trust, but verify.

    Science.gov (United States)

    Deller, Marc C; Rupp, Bernhard

    2015-09-01

    X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.

  16. Rotational order–disorder structure of fluorescent protein FP480

    International Nuclear Information System (INIS)

    Pletnev, Sergei; Morozova, Kateryna S.; Verkhusha, Vladislav V.; Dauter, Zbigniew

    2009-01-01

    An analysis of the rotational order–disorder structure of fluorescent protein FP480 is presented. In the last decade, advances in instrumentation and software development have made crystallography a powerful tool in structural biology. Using this method, structural information can now be acquired from pathological crystals that would have been abandoned in earlier times. In this paper, the order–disorder (OD) structure of fluorescent protein FP480 is discussed. The structure is composed of tetramers with 222 symmetry incorporated into the lattice in two different ways, namely rotated 90° with respect to each other around the crystal c axis, with tetramer axes coincident with crystallographic twofold axes. The random distribution of alternatively oriented tetramers in the crystal creates a rotational OD structure with statistically averaged I422 symmetry, although the presence of very weak and diffuse additional reflections suggests that the randomness is only approximate.

  17. Tertiary alphabet for the observable protein structural universe.

    Science.gov (United States)

    Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg

    2016-11-22

    Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.

  18. Fragger: a protein fragment picker for structural queries.

    Science.gov (United States)

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
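    The kind of query described in this abstract (a distance threshold, ranking by distance, and a regular expression over one-letter amino acid codes) can be sketched as follows. This is a minimal illustration and not Fragger's implementation or interface: the names `rmsd` and `search_fragments`, the tuple layout of a fragment, and the sample data are all assumptions, and the RMSD here is computed on coordinates as given, without the optimal superposition a real tool would typically perform.

```python
import math
import re


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the points are already in a common frame; no superposition.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("fragments must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def search_fragments(query, fragments, threshold, seq_pattern=None):
    """Return (distance, name) hits within threshold, closest first.

    fragments: iterable of (name, one_letter_sequence, coords) tuples
    (a hypothetical layout chosen for this sketch).
    seq_pattern: optional regular expression the whole sequence must match.
    """
    pattern = re.compile(seq_pattern) if seq_pattern else None
    hits = []
    for name, sequence, coords in fragments:
        if len(coords) != len(query):
            continue  # only compare fragments of the query's length
        if pattern and not pattern.fullmatch(sequence):
            continue  # sequence constraint not satisfied
        distance = rmsd(query, coords)
        if distance <= threshold:
            hits.append((distance, name))
    hits.sort()
    return hits


# Toy two-atom fragments (illustrative data only).
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
database = [
    ("frag_a", "GA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("frag_b", "GV", [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]),
    ("frag_c", "PA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
]
print(search_fragments(query, database, threshold=2.0, seq_pattern="G[AV]"))
```

    Filtering on the sequence pattern before computing the distance keeps the more expensive geometric comparison off fragments that could never match.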

  19. DNA nanotubes for NMR structure determination of membrane proteins.

    Science.gov (United States)

    Bellot, Gaëtan; McClintock, Mark A; Chou, James J; Shih, William M

    2013-04-01

    Finding a way to determine the structures of integral membrane proteins using solution nuclear magnetic resonance (NMR) spectroscopy has proved to be challenging. A residual-dipolar-coupling-based refinement approach can be used to resolve the structure of membrane proteins up to 40 kDa in size, but this requires a detergent-resistant weak-alignment medium, and it has thus far been difficult to obtain such a medium suitable for weak alignment of membrane proteins. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles, each self-assembled from an M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, toward collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes using counter ions and small DNA-binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive to membrane protein alignment, including high-yield production, thermal stability, buffer compatibility and structural programmability. Production of sufficient nanotubes for four or five NMR experiments can be completed in 1 week by a single individual.

  20. The structure of pyogenecin immunity protein, a novel bacteriocin-like immunity protein from Streptococcus pyogenes.

    Energy Technology Data Exchange (ETDEWEB)

    Chang, C.; Coggill, P.; Bateman, A.; Finn, R.; Cymborowski, M.; Otwinowski, Z.; Minor, W.; Volkart, L.; Joachimiak, A.; Wellcome Trust Sanger Inst.; Univ. of Virginia; UT Southwestern Medical Center

    2009-12-17

    Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable. We have solved the crystal structure of the gene-product of locus Spy-2152 from S. pyogenes, (PDB: 2fu2), and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides. Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.

  1. The future of primordial features with large-scale structure surveys

    International Nuclear Information System (INIS)

    Chen, Xingang; Namjoo, Mohammad Hossein; Dvorkin, Cora; Huang, Zhiqi; Verde, Licia

    2016-01-01

    Primordial features are one of the most important extensions of the Standard Model of cosmology, providing a wealth of information on the primordial Universe, ranging from discrimination between inflation and alternative scenarios, new particle detection, to fine structures in the inflationary potential. We study the prospects of future large-scale structure (LSS) surveys for detecting and constraining these features. We classify primordial feature models into several classes, and for each class we present a simple template of the power spectrum that encodes the essential physics. We study how well the most ambitious LSS surveys proposed to date, including both spectroscopic and photometric surveys, will be able to improve the constraints with respect to the current Planck data. We find that these LSS surveys will significantly improve the experimental sensitivity to feature signals that are oscillatory in scale, due to the 3D information. For a broad range of models, these surveys will be able to reduce the errors of the amplitudes of the features by a factor of 5 or more, including several interesting candidates identified in the recent Planck data. Therefore, LSS surveys offer an impressive opportunity for primordial feature discovery in the next decade or two. We also compare the advantages of both types of surveys.

  2. The future of primordial features with large-scale structure surveys

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Xingang; Namjoo, Mohammad Hossein [Institute for Theory and Computation, Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Dvorkin, Cora [Department of Physics, Harvard University, Cambridge, MA 02138 (United States); Huang, Zhiqi [School of Physics and Astronomy, Sun Yat-Sen University, 135 Xingang Xi Road, Guangzhou, 510275 (China); Verde, Licia, E-mail: xingang.chen@cfa.harvard.edu, E-mail: dvorkin@physics.harvard.edu, E-mail: huangzhq25@sysu.edu.cn, E-mail: mohammad.namjoo@cfa.harvard.edu, E-mail: liciaverde@icc.ub.edu [ICREA and ICC-UB, University of Barcelona (IEEC-UB), Marti i Franques, 1, Barcelona 08028 (Spain)

    2016-11-01

    Primordial features are one of the most important extensions of the Standard Model of cosmology, providing a wealth of information on the primordial Universe, ranging from discrimination between inflation and alternative scenarios, new particle detection, to fine structures in the inflationary potential. We study the prospects of future large-scale structure (LSS) surveys for detecting and constraining these features. We classify primordial feature models into several classes, and for each class we present a simple power-spectrum template that encodes the essential physics. We study how well the most ambitious LSS surveys proposed to date, including both spectroscopic and photometric surveys, will be able to improve the constraints with respect to the current Planck data. We find that these LSS surveys will significantly improve the experimental sensitivity to feature signals that are oscillatory in scales, due to the 3D information. For a broad range of models, these surveys will be able to reduce the errors of the amplitudes of the features by a factor of 5 or more, including several interesting candidates identified in the recent Planck data. Therefore, LSS surveys offer an impressive opportunity for primordial feature discovery in the next decade or two. We also compare the advantages of both types of surveys.
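
    As a rough illustration of what a power-spectrum template with an oscillatory feature can look like, the sketch below multiplies a featureless power law by a small sinusoidal modulation. The functional form, the function names and every default value (`a_s`, `n_s`, `k_pivot`, `amplitude`, `frequency`, `phase`) are illustrative assumptions, not the templates or numbers used in the record above.

```python
import math


def featureless_power(k, a_s=2.1e-9, n_s=0.965, k_pivot=0.05):
    """Power-law primordial spectrum; all default values are illustrative."""
    return a_s * (k / k_pivot) ** (n_s - 1.0)


def oscillatory_power(k, amplitude=0.03, frequency=100.0, phase=0.0,
                      log_spaced=False):
    """Featureless spectrum times (1 + amplitude * sin(...)).

    log_spaced=False oscillates linearly in k; log_spaced=True oscillates
    in ln(k). Both forms are generic assumptions made for this sketch.
    """
    x = math.log(k) if log_spaced else k
    modulation = 1.0 + amplitude * math.sin(frequency * x + phase)
    return featureless_power(k) * modulation


# Relative deviation from the featureless spectrum on a few scales.
for k in (0.01, 0.05, 0.1):
    ratio = oscillatory_power(k) / featureless_power(k)
    print(f"k = {k:5.2f}  P_feature / P_0 = {ratio:.4f}")
```

    In this toy form the feature amplitude is the single number a survey would constrain, which is the quantity whose error the abstract says could shrink by a factor of 5 or more.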

  3. Universal features in the genome-level evolution of protein domains.

    Science.gov (United States)

    Cosentino Lagomarsino, Marco; Sellerio, Alessandro L; Heijning, Philip D; Bassetti, Bruno

    2009-01-01

    Protein domains can be used to study proteome evolution at a coarse scale. In particular, they are found on genomes with notable statistical distributions. It is known that the distribution of domains with a given topology follows a power law. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, depending on the total number of domains only, and not on genome-specific features. We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws, a priori of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.
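
    The duplication/innovation idea can be sketched as a two-parameter Chinese restaurant process: each new domain either founds a new topology class (innovation) or joins an existing class with probability proportional to its current size (duplication). The parameter names `alpha` and `theta`, their default values and the exact probabilities below follow the textbook two-parameter process and are assumptions made for illustration; the record above does not spell out its formulas.

```python
import random


def simulate_crp(n_domains, alpha=0.3, theta=1.0, seed=0):
    """Grow a genome one domain at a time; return domains per topology class.

    Assumes 0 <= alpha < 1 and theta > 0 (textbook two-parameter CRP).
    """
    rng = random.Random(seed)
    sizes = []  # sizes[i] = number of domains carrying topology i
    n = 0       # total number of domains placed so far
    for _ in range(n_domains):
        k = len(sizes)
        # Innovation: probability of opening a new topology class.
        p_new = (theta + alpha * k) / (n + theta)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            # Duplication: choose class i with weight (sizes[i] - alpha).
            weights = [s - alpha for s in sizes]
            i = rng.choices(range(k), weights=weights)[0]
            sizes[i] += 1
        n += 1
    return sizes


sizes = simulate_crp(2000)
print("distinct topologies:", len(sizes))
print("largest classes:", sorted(sizes, reverse=True)[:5])
```

    Running the simulation for several values of `n_domains` shows how the number of distinct classes grows with genome size alone, which is the kind of collective trend the abstract describes.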

  4. Packaging and structural phenotype of brome mosaic virus capsid protein with altered N-terminal β-hexamer structure

    International Nuclear Information System (INIS)

    Wispelaere, Melissanne de; Chaturvedi, Sonali; Wilkens, Stephan; Rao, A.L.N.

    2011-01-01

    The region comprising the first 45 amino acids of brome mosaic virus (BMV) capsid protein (CP) contains RNA binding and structural domains that are implicated in the assembly of infectious virions. One such important structural domain, encompassing amino acids ²⁸QPVIV³² and highly conserved between BMV and cowpea chlorotic mottle virus (CCMV), exhibits a β-hexamer structure. In this study we report that alteration of the β-hexamer structure by mutating ²⁸QPVIV³² to ²⁸AAAAA³² had no effect on symptom phenotype, local and systemic movement in Chenopodium quinoa, or the RNA profile of in vivo assembled virions. However, sensitivity to RNase and assembly phenotypes distinguished virions assembled with CP subunits having β-hexamer from those of wild type. A comparison of 3-D models obtained by cryo-electron microscopy revealed overall similar structural features for wild type and mutant virions, with small but significant differences near the 3-fold axes of symmetry.

  5. Multiple structure-intrinsic disorder interactions regulate and coordinate Hox protein function

    Science.gov (United States)

    Bondos, Sarah

    During animal development, Hox transcription factors determine the fate of developing tissues to generate diverse organs and appendages. Hox proteins are famous for their bizarre mutant phenotypes, such as replacing antennae with legs. Clearly, the functions of individual Hox proteins must be distinct and reliable in vivo, or the organism risks malformation or death. However, within the Hox protein family, the DNA-binding homeodomains are highly conserved and the amino acids that contact DNA are nearly invariant. These observations raise the question: How do different Hox proteins correctly identify their distinct target genes using a common DNA binding domain? One possible means to modulate DNA binding is through the influence of the non-homeodomain protein regions, which differ significantly among Hox proteins. However, genetic approaches never detected intra-protein interactions, and early biochemical attempts were hindered because the special features of "intrinsically disordered" sequences were not appreciated. We propose the first-ever structural model of a Hox protein to explain how specific contacts between distant, intrinsically disordered regions of the protein and the homeodomain regulate DNA binding and coordinate this activity with other Hox molecular functions.

  6. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Background: The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results: Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. Conclusions: The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  7. Constraint Logic Programming approach to protein structure prediction.

    Science.gov (United States)

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.
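
    To make the lattice setting concrete, the sketch below encodes the two basic constraints that a fold on the face-centered cubic lattice has to satisfy (consecutive residues sit on neighbouring lattice sites, and no site is used twice) and counts non-bonded contacts. It is plain Python rather than Constraint Logic Programming, and the function names and the contact count are assumptions chosen for illustration, not the authors' encoding or energy function.

```python
from itertools import product

# The 12 nearest-neighbour steps on the FCC lattice:
# exactly two coordinates are +1 or -1 and the third is 0.
FCC_STEPS = [v for v in product((-1, 0, 1), repeat=3)
             if sum(abs(c) for c in v) == 2]


def is_valid_fold(coords):
    """True if coords is a self-avoiding chain of FCC nearest neighbours."""
    if len(set(coords)) != len(coords):
        return False  # two residues share a lattice site
    for a, b in zip(coords, coords[1:]):
        step = tuple(q - p for p, q in zip(a, b))
        if step not in FCC_STEPS:
            return False  # consecutive residues are not lattice neighbours
    return True


def count_contacts(coords):
    """Count residue pairs that are lattice neighbours but not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        for dx, dy, dz in FCC_STEPS:
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None and j > i + 1:
                contacts += 1
    return contacts


fold = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
print(is_valid_fold(fold), count_contacts(fold))
```

    A constraint solver would treat the coordinates as variables and prune assignments that violate `is_valid_fold`-style constraints during search, instead of checking complete folds after the fact as this sketch does.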

  8. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input feature sets, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four resulting SVM models were evaluated with three benchmarking tests, namely (i) self-consistency, (ii) seven-fold cross-validation and (iii) an independent case test. The maximum prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and the predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
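
    One common way to combine a sequence profile with secondary structure into per-residue classifier input is a sliding window, sketched below. The window size, the three-state alphabet `HEC`, the zero padding at the chain ends and the function name are all assumptions made for illustration; the abstract does not describe how SVM-PB-Pred actually encodes its features.

```python
SS_STATES = "HEC"  # helix, strand, coil: an assumed three-state alphabet


def window_features(pssm, ss, half_window=2):
    """Return one feature vector per residue.

    pssm: list of per-residue score rows (for example 20 values each)
    ss:   secondary-structure string of the same length, letters in SS_STATES
    Each vector concatenates PSSM row + one-hot SS for every position in
    the window; positions beyond the chain ends are filled with zeros.
    """
    n = len(pssm)
    if len(ss) != n:
        raise ValueError("pssm and ss must have the same length")
    n_cols = len(pssm[0]) if n else 0
    pad = [0.0] * (n_cols + len(SS_STATES))

    per_residue = []
    for row, state in zip(pssm, ss):
        one_hot = [1.0 if state == s else 0.0 for s in SS_STATES]
        per_residue.append(list(row) + one_hot)

    features = []
    for i in range(n):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(per_residue[j] if 0 <= j < n else pad)
        features.append(vec)
    return features


toy_pssm = [[float(i)] * 20 for i in range(4)]  # made-up scores
vectors = window_features(toy_pssm, "HECC", half_window=1)
print(len(vectors), len(vectors[0]))
```

    Vectors of this shape could then be passed to any multi-class classifier, with the Protein Block letter of the central residue as the label.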

  9. A Systematic Review of the Effects of Plant Compared with Animal Protein Sources on Features of Metabolic Syndrome.

    Science.gov (United States)

    Chalvon-Demersay, Tristan; Azzout-Marniche, Dalila; Arfsten, Judith; Egli, Léonie; Gaudichon, Claire; Karagounis, Leonidas G; Tomé, Daniel

    2017-03-01

    Dietary protein may play an important role in the prevention of metabolic dysfunctions. However, the way in which the protein source affects these dysfunctions has not been clearly established. The aim of the current systematic review was to compare the impact of plant- and animal-sourced dietary proteins on several features of metabolic syndrome in humans. The PubMed database was searched for both chronic and acute interventional studies, as well as observational studies, in healthy humans or those with metabolic dysfunctions, in which the impact of animal and plant protein intake was compared while using the following variables: cholesterolemia and triglyceridemia, blood pressure, glucose homeostasis, and body composition. Based on data extraction, we observed that soy protein consumption (with isoflavones), but not soy protein alone (without isoflavones) or other plant proteins (pea and lupine proteins, wheat gluten), leads to a 3% greater decrease in both total and LDL cholesterol compared with animal-sourced protein ingestion, especially in individuals with high fasting cholesterol concentrations. This observation was made when animal proteins were provided as a whole diet rather than given supplementally. Some observational studies reported an inverse association between plant protein intake and systolic and diastolic blood pressure, but this was not confirmed by intervention studies. Moreover, plant protein (wheat gluten, soy protein) intake as part of a mixed meal resulted in a lower postprandial insulin response than did whey. This systematic review provides some evidence that the intake of soy protein associated with isoflavones may prevent the onset of risk factors associated with cardiovascular disease, i.e., hypercholesterolemia and hypertension, in humans. However, we were not able to draw any further conclusions from the present work on the positive effects of plant proteins relating to glucose homeostasis and body composition. © 2017 American

  10. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    Science.gov (United States)

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and Zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis of all flavivirus non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.
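
    Conservation in a multiple sequence alignment is often summarised per column, for instance with Shannon entropy, where 0 bits means a strictly conserved position. The sketch below shows that calculation on a made-up toy alignment; it is a generic illustration, not the scoring scheme used in the study, and it treats the gap character as just another symbol (an assumption).

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy in bits for every column of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        total = len(column)
        counts = Counter(column)
        entropy = sum(-(c / total) * math.log2(c / total)
                      for c in counts.values())
        entropies.append(entropy)
    return entropies


toy_alignment = ["ACD", "ACE", "AGF"]  # made-up sequences
for position, h in enumerate(column_entropy(toy_alignment), start=1):
    print(f"column {position}: {h:.3f} bits")
```

    Coevolution analysis goes one step further and compares pairs of columns, for example through mutual information, to find residue sets that vary together.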

  11. Structure and dynamics of the human pleckstrin DEP domain: distinct molecular features of a novel DEP domain subfamily.

    Science.gov (United States)

    Civera, Concepcion; Simon, Bernd; Stier, Gunter; Sattler, Michael; Macias, Maria J

    2005-02-01

    Pleckstrin1 is a major substrate for protein kinase C in platelets and leukocytes, and comprises a central DEP (disheveled, Egl-10, pleckstrin) domain, which is flanked by two PH (pleckstrin homology) domains. DEP domains display a unique alpha/beta fold and have been implicated in membrane binding utilizing different mechanisms. Using multiple sequence alignments and phylogenetic tree reconstructions, we find that 6 subfamilies of the DEP domain exist, of which pleckstrin represents a novel and distinct subfamily. To clarify structural determinants of the DEP fold and to gain further insight into the role of the DEP domain, we determined the three-dimensional structure of the pleckstrin DEP domain using heteronuclear NMR spectroscopy. Pleckstrin DEP shares main structural features with the DEP domains of disheveled and Epac, which belong to different DEP subfamilies. However, the pleckstrin DEP fold is distinct from these structures and contains an additional, short helix alpha4 inserted in the beta4-beta5 loop that exhibits increased backbone mobility as judged by NMR relaxation measurements. Based on sequence conservation, the helix alpha4 may also be present in the DEP domains of regulator of G-protein signaling (RGS) proteins, which are members of the same DEP subfamily. In pleckstrin, the DEP domain is surrounded by two PH domains. Structural analysis and charge complementarity suggest that the DEP domain may interact with the N-terminal PH domain in pleckstrin. Phosphorylation of the PH-DEP linker, which is required for pleckstrin function, could regulate such an intramolecular interaction. This suggests a role of the pleckstrin DEP domain in intramolecular domain interactions, which is distinct from the functions of other DEP domain subfamilies found so far.

  12. Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.

    Science.gov (United States)

    Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2015-01-01

    Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residu