WorldWideScience

Sample records for supercomputer protein prediction

  1. [Experience in simulating the structural and dynamic features of small proteins using table supercomputers].

    Science.gov (United States)

    Kondrat'ev, M S; Kabanov, A V; Komarov, V M; Khechinashvili, N N; Samchenko, A A

    2011-01-01

    Results are presented of theoretical studies of the structural and dynamic features of peptides and small proteins. The studies were carried out by quantum chemical and molecular dynamics methods on high-performance graphics workstations ("table supercomputers"), using distributed calculations with CUDA technology.

  2. Operational numerical weather prediction on a GPU-accelerated cluster supercomputer

    Science.gov (United States)

    Lapillonne, Xavier; Fuhrer, Oliver; Spörri, Pascal; Osuna, Carlos; Walser, André; Arteaga, Andrea; Gysi, Tobias; Rüdisühli, Stefan; Osterried, Katherine; Schulthess, Thomas

    2016-04-01

    The local area weather prediction model COSMO is used at MeteoSwiss to provide high resolution numerical weather predictions over the Alpine region. In order to benefit from the latest developments in computer technology, the model was optimized and adapted to run on Graphical Processing Units (GPUs). Thanks to these model adaptations and the acquisition of a dedicated hybrid supercomputer, a new set of operational applications has been introduced at MeteoSwiss: COSMO-1 (1 km deterministic), COSMO-E (2 km ensemble) and KENDA (data assimilation). These new applications correspond to a 40-fold increase in computational load compared to the previous operational setup. We present an overview of the porting approach of the COSMO model to GPUs together with a detailed description of, and performance results on, the new hybrid Cray CS-Storm computer, Piz Kesch.

  3. Mixed precision numerical weather prediction on hybrid GPU-CPU supercomputers

    Science.gov (United States)

    Lapillonne, Xavier; Osuna, Carlos; Spoerri, Pascal; Osterried, Katherine; Charpilloz, Christophe; Fuhrer, Oliver

    2017-04-01

    A new version of the climate and weather model COSMO has been developed that runs faster on traditional high performance computing systems with CPUs as well as on heterogeneous architectures using graphics processing units (GPUs). In addition, the model was adapted to run in "single precision" mode. After discussing the key changes introduced in this new model version and the tools used in the porting approach, we present 3 applications, namely the MeteoSwiss operational weather prediction system, COSMO-LEPS and the CALMO project, which already take advantage of the performance improvement, up to a factor of 4, by running on GPU systems and using the single precision mode. We discuss how the code changes open new perspectives for scientific research and can enable researchers to get access to a new class of supercomputers.

  4. Supercomputational science

    CERN Document Server

    Wilson, S

    1990-01-01

    In contemporary research, the supercomputer now ranks, along with radio telescopes, particle accelerators and the other apparatus of "big science", as an expensive resource, which is nevertheless essential for state-of-the-art research. Supercomputers are usually provided as shared central facilities. However, unlike telescopes and accelerators, they find a wide range of applications which extends across a broad spectrum of research activity. The difference in performance between a "good" and a "bad" computer program on a traditional serial computer may be a factor of two or three, but on a contemporary supercomputer it can easily be a factor of one hundred or even more! Furthermore, this factor is likely to increase with future generations of machines. In keeping with the large capital and recurrent costs of these machines, it is appropriate to devote effort to training and familiarization so that supercomputers are employed to best effect. This volume records the lectures delivered at a Summer School ...

  5. Grassroots Supercomputing

    CERN Multimedia

    Buchanan, Mark

    2005-01-01

    What started out as a way for SETI to plow through its piles of radio-signal data from deep space has turned into a powerful research tool as computer users across the globe donate their screen-saver time to projects as diverse as climate-change prediction, gravitational-wave searches, and protein folding (4 pages)

  6. Protein docking prediction using predicted protein-protein interface

    Directory of Open Access Journals (Sweden)

    Li Bin

    2012-01-01

    Background: Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. Results: We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method that improves, rather than degrades, docking results when using a binding site prediction that may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, followed by a second round of docking with updated docking interface information to further improve the docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves docking prediction accuracy compared with docking without binding site prediction or with binding site prediction used as post-filtering. Conclusion: We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy than alternative methods in the series of benchmark experiments, including docking using actual docking interface site predictions as well as unbound docking cases.

  7. KAUST Supercomputing Laboratory

    KAUST Repository

    Bailey, April Renee

    2011-11-15

    KAUST has partnered with IBM to establish a Supercomputing Research Center. KAUST is hosting the Shaheen supercomputer, named after the Arabian falcon famed for its swiftness of flight. This 16-rack IBM Blue Gene/P system is equipped with 4 gigabytes of memory per node and is capable of 222 teraflops, making the KAUST campus the site of one of the world’s fastest supercomputers in an academic environment. KAUST is targeting petaflop capability within 3 years.

  8. Protein Sorting Prediction

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2017-01-01

    Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths...

  9. Emerging supercomputer architectures

    Energy Technology Data Exchange (ETDEWEB)

    Messina, P.C.

    1987-01-01

    This paper will examine the current and near future trends for commercially available high-performance computers with architectures that differ from the mainstream "supercomputer" systems in use for the last few years. These emerging supercomputer architectures are just beginning to have an impact on the field of high performance computing. 7 refs., 1 tab.

  10. NSF Commits to Supercomputers.

    Science.gov (United States)

    Waldrop, M. Mitchell

    1985-01-01

    The National Science Foundation (NSF) has allocated at least $200 million over the next five years to support four new supercomputer centers. Issues and trends related to this NSF initiative are examined. (JN)

  11. Energy sciences supercomputing 1990

    Energy Technology Data Exchange (ETDEWEB)

    Mirin, A.A.; Kaiper, G.V. (eds.)

    1990-01-01

    This report contains papers on the following topics: meeting the computational challenge; lattice gauge theory: probing the standard model; supercomputing for the superconducting super collider; an overview of ongoing studies in climate model diagnosis and intercomparison; MHD simulation of the fueling of a tokamak fusion reactor through the injection of compact toroids; gyrokinetic particle simulation of tokamak plasmas; analyzing chaos: a visual essay in nonlinear dynamics; supercomputing and research in theoretical chemistry; Monte Carlo simulations of light nuclei; parallel processing; and scientists of the future: learning by doing.

  12. Protein Chemical Shift Prediction

    CERN Document Server

    Larsen, Anders S

    2014-01-01

    Protein chemical shifts hold a large amount of information about the 3-dimensional structure of the protein. A number of chemical shift predictors based on the relationship between structures resolved with X-ray crystallography and the corresponding experimental chemical shifts have been developed. These empirical predictors are very accurate on X-ray structures but tend to be insensitive to small structural changes. To overcome this limitation it has been suggested to make chemical shift predictors based on quantum mechanical (QM) calculations. In this thesis the development of the QM derived chemical shift predictor Procs14 is presented. Procs14 is based on 2.35 million density functional theory (DFT) calculations on tripeptides and contains corrections for hydrogen bonding, ring current and the effect of the previous and following residue. Procs14 is capable of performing predictions for the 13CA, 13CB, 13CO, 15NH, 1HN and 1HA backbone atoms. In order to benchmark Procs14, a number of QM NMR calculatio...

  13. Supercomputers to transform Science

    CERN Multimedia

    2006-01-01

    "New insights into the structure of space and time, climate modeling, and the design of novel drugs, are but a few of the many research areas that will be transformed by the installation of three supercomputers at the University of Bristol." (1/2 page)

  14. Petaflop supercomputers of China

    Institute of Scientific and Technical Information of China (English)

    Guoliang CHEN

    2010-01-01

    After ten years of development, high performance computing (HPC) in China has made remarkable progress. In November, 2010, the NUDT Tianhe-1A and the Dawning Nebulae respectively claimed the 1st and 3rd places in the Top500 Supercomputers List; this recognizes internationally the level that China has achieved in high performance computer manufacturing.

  15. Protein domain prediction

    NARCIS (Netherlands)

    Ingolfsson, Helgi; Yona, Golan

    2008-01-01

    Domains are considered to be the building blocks of protein structures. A protein can contain a single domain or multiple domains, each one typically associated with a specific function. The combination of domains determines the function of the protein, its subcellular localization and the interacti

  16. Introduction to Reconfigurable Supercomputing

    CERN Document Server

    Lanzagorta, Marco; Rosenberg, Robert

    2010-01-01

    This book covers technologies, applications, tools, languages, procedures, advantages, and disadvantages of reconfigurable supercomputing using Field Programmable Gate Arrays (FPGAs). The target audience is the community of users of High Performance Computers (HPC) who may benefit from porting their applications into a reconfigurable environment. As such, this book is intended to guide the HPC user through the many algorithmic considerations, hardware alternatives, usability issues, programming languages, and design tools that need to be understood before embarking on the creation of reconfigur

  17. Update on protein structure prediction

    DEFF Research Database (Denmark)

    Hubbard, T; Tramontano, A; Barton, G

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a numbe...

  18. Protein Structure Prediction by Protein Threading

    Science.gov (United States)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  19. Toolbox for Protein Structure Prediction.

    Science.gov (United States)

    Roche, Daniel Barry; McGuffin, Liam James

    2016-01-01

    Protein tertiary structure prediction algorithms aim to predict, from amino acid sequence, the tertiary structure of a protein. In silico protein structure prediction methods have become extremely important, as in vitro-based structural elucidation is unable to keep pace with the current growth of sequence databases due to high-throughput next-generation sequencing, which has exacerbated the gaps in our knowledge between sequences and structures. Here we briefly discuss protein tertiary structure prediction, the biennial competition for the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and its role in shaping the field. We also discuss, in detail, our cutting-edge web-server method IntFOLD2-TS for tertiary structure prediction. Furthermore, we provide a step-by-step guide on using the IntFOLD2-TS web server, along with some real world examples, where the IntFOLD server can and has been used to improve protein tertiary structure prediction and aid in functional elucidation.

  20. Predicting protein structure classes from function predictions

    DEFF Research Database (Denmark)

    Sommer, I.; Rahnenfuhrer, J.; de Lichtenberg, Ulrik;

    2004-01-01

    We introduce a new approach to using the information contained in sequence-to-function prediction data in order to recognize protein template classes, a critical step in predicting protein structure. The data on which our method is based comprise probabilities of functional categories; for given query sequences these probabilities are obtained by a neural net that has previously been trained on a variety of functionally important features. On a training set of sequences we assess the relevance of individual functional categories for identifying a given structural family. Using a combination of the most relevant categories, the likelihood of a query sequence to belong to a specific family can be estimated. Results: The performance of the method is evaluated using cross-validation. For a fixed structural family and for every sequence, a score is calculated that measures the evidence for family...

  1. Enabling department-scale supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Greenberg, D.S.; Hart, W.E.; Phillips, C.A.

    1997-11-01

    The Department of Energy (DOE) national laboratories have one of the longest and most consistent histories of supercomputer use. The authors summarize the architecture of DOE's new supercomputers that are being built for the Accelerated Strategic Computing Initiative (ASCI). The authors then argue that in the near future scaled-down versions of these supercomputers with petaflop-per-weekend capabilities could become widely available to hundreds of research and engineering departments. The availability of such computational resources will allow simulation of physical phenomena to become a full-fledged third branch of scientific exploration, along with theory and experimentation. They describe the ASCI and other supercomputer applications at Sandia National Laboratories, and discuss which lessons learned from Sandia's long history of supercomputing can be applied in this new setting.

  2. Ultrascalable petaflop parallel supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Blumrich, Matthias A. (Ridgefield, CT); Chen, Dong (Croton On Hudson, NY); Chiu, George (Cross River, NY); Cipolla, Thomas M. (Katonah, NY); Coteus, Paul W. (Yorktown Heights, NY); Gara, Alan G. (Mount Kisco, NY); Giampapa, Mark E. (Irvington, NY); Hall, Shawn (Pleasantville, NY); Haring, Rudolf A. (Cortlandt Manor, NY); Heidelberger, Philip (Cortlandt Manor, NY); Kopcsay, Gerard V. (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Salapura, Valentina (Chappaqua, NY); Sugavanam, Krishnan (Mahopac, NY); Takken, Todd (Brewster, NY)

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  3. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    The problem of predicting the three-dimensional structure of a protein given its amino acid sequence is one of the most important open problems in bioinformatics. One of the carbon atoms in amino acids is the Cα-atom and the overall structure of a protein is often represented by a so-called Cα-trace. Here we present three different approaches for reconstruction of Cα-traces from predictable measures. In our first approach [63, 62], the Cα-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE... ... is competitive in quality and speed with other state-of-the-art decoy generation algorithms. Our third Cα-trace reconstruction approach is based on bee-colony optimization [24]. We demonstrate why this algorithm has some important properties that make it suitable for protein structure prediction. Our approach...

  4. Structure Prediction of Membrane Proteins

    Institute of Scientific and Technical Information of China (English)

    Chunlong Zhou; Yao Zheng; Yan Zhou

    2004-01-01

    There is a large gap between the number of membrane protein (MP) sequences and that of their decoded 3D structures, especially high-resolution structures, due to difficulties in crystal preparation of MPs. However, detailed knowledge of the 3D structure is required for the fundamental understanding of the function of an MP and the interactions between the protein and its inhibitors or activators. In this paper, some computational approaches that have been used to predict MP structures are discussed and compared.

  5. Prediction of Protein-Protein Interactions Using Protein Signature Profiling

    Institute of Scientific and Technical Information of China (English)

    Mahmood A. Mahdavi; Yen-Han Lin

    2007-01-01

    Protein domains are conserved and functionally independent structures that play an important role in interactions among related proteins. Domain-domain interactions have been recently used to predict protein-protein interactions (PPI). In general, the interaction probability of a pair of domains is scored using a trained scoring function. Satisfying a threshold, the protein pairs carrying those domains are regarded as "interacting". In this study, the signature contents of proteins were utilized to predict PPI pairs in Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. Similarity between protein signature patterns was scored and PPI predictions were drawn based on the binary similarity scoring function. Results show that the true positive rate of prediction by the proposed approach is approximately 32% higher than that using the maximum likelihood estimation method when compared with a test set, resulting in 22% increase in the area under the receiver operating characteristic (ROC) curve. When proteins containing one or two signatures were removed, the sensitivity of the predicted PPI pairs increased significantly. The predicted PPI pairs are on average 11 times more likely to interact than the random selection at a confidence level of 0.95, and on average 4 times better than those predicted by either phylogenetic profiling or gene expression profiling.
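
    As a rough illustration of scoring similarity between protein signature patterns, the sketch below treats each protein as a set of signature identifiers and scores pairs with the Jaccard index. The abstract does not state the actual binary similarity scoring function, so the Jaccard choice, the 0.5 threshold, and all protein and signature names are assumptions made for illustration only.

```python
from itertools import combinations


def signature_similarity(a, b):
    """Jaccard index of two signature sets (an assumed stand-in score)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_pairs(profiles, threshold=0.5):
    """Return (protein, protein, score) tuples meeting the threshold, best first."""
    scored = []
    for p, q in combinations(sorted(profiles), 2):
        score = signature_similarity(profiles[p], profiles[q])
        if score >= threshold:
            scored.append((p, q, score))
    return sorted(scored, key=lambda item: -item[2])


# Hypothetical signature profiles; the identifiers are invented.
profiles = {
    "P1": {"SIG_A", "SIG_B", "SIG_C"},
    "P2": {"SIG_A", "SIG_B", "SIG_C", "SIG_D"},
    "P3": {"SIG_X"},
    "P4": {"SIG_B", "SIG_C"},
}

predicted = predict_pairs(profiles)
for p, q, score in predicted:
    print(p, q, round(score, 3))
```

    Pairs that share most of their signatures rank highest, while a protein with a single unrelated signature (P3 here) contributes no predictions.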

  6. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-01-01

    This report discusses the following topics on supercomputer debugging: distributed debugging; user interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  7. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-12-31

    This report discusses the following topics on supercomputer debugging: distributed debugging; user interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  8. Dust modelling and forecasting in the Barcelona Supercomputing Center: Activities and developments

    Energy Technology Data Exchange (ETDEWEB)

    Perez, C; Baldasano, J M; Jimenez-Guerrero, P; Jorba, O; Haustein, K; Basart, S [Earth Sciences Department. Barcelona Supercomputing Center. Barcelona (Spain); Cuevas, E [Izaña Atmospheric Research Center. Agencia Estatal de Meteorologia, Tenerife (Spain); Nickovic, S [Atmospheric Research and Environment Branch, World Meteorological Organization, Geneva (Switzerland)], E-mail: carlos.perez@bsc.es

    2009-03-01

    The Barcelona Supercomputing Center (BSC) is the National Supercomputer Facility in Spain, hosting MareNostrum, one of the most powerful Supercomputers in Europe. The Earth Sciences Department of BSC operates daily regional dust and air quality forecasts and conducts intensive modelling research for short-term operational prediction. This contribution summarizes the latest developments and current activities in the field of sand and dust storm modelling and forecasting.

  9. Computational Dimensionalities of Global Supercomputing

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2013-12-01

    This Invited Paper pertains to the subject of my Plenary Keynote Speech at the 17th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2013), held in Orlando, Florida on July 9-12, 2013. The title of my Plenary Keynote Speech was "Dimensionalities of Computation: from Global Supercomputing to Data, Text and Web Mining", but this Invited Paper focuses only on the "Computational Dimensionalities of Global Supercomputing" and is based upon a summary of the contents of several individual articles that were previously written with myself as lead author and published in [75], [76], [77], [78], [79], [80] and [11]. The topics of the Plenary Speech included Overview of Current Research in Global Supercomputing [75], Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing [76], Data Mining Supercomputing with SAS™ JMP® Genomics ([77], [79], [80]), and Visualization by Supercomputing Data Mining [81].
    [11.] Committee on the Future of Supercomputing, National Research Council (2003), The Future of Supercomputing: An Interim Report, ISBN-13: 978-0-309-09016-2, http://www.nap.edu/catalog/10784.html
    [75.] Segall, Richard S.; Zhang, Qingyu and Cook, Jeffrey S. (2013), "Overview of Current Research in Global Supercomputing", Proceedings of Forty-Fourth Meeting of Southwest Decision Sciences Institute (SWDSI), Albuquerque, NM, March 12-16, 2013.
    [76.] Segall, Richard S. and Zhang, Qingyu (2010), "Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing", Proceedings of 5th INFORMS Workshop on Data Mining and Health Informatics, Austin, TX, November 6, 2010.
    [77.] Segall, Richard S., Zhang, Qingyu and Pierce, Ryan M. (2010), "Data Mining Supercomputing with SAS™ JMP® Genomics: Research-in-Progress", Proceedings of 2010 Conference on Applied Research in Information Technology, sponsored by

  10. Microprocessors: from desktops to supercomputers.

    Science.gov (United States)

    Baskett, F; Hennessy, J L

    1993-08-13

    Continuing improvements in integrated circuit technology and computer architecture have driven microprocessors to performance levels that rival those of supercomputers, at a fraction of the price. The use of sophisticated memory hierarchies enables microprocessor-based machines to have very large memories built from commodity dynamic random access memory while retaining the high bandwidth and low access time needed in a high-performance machine. Parallel processors composed of these high-performance microprocessors are becoming the supercomputing technology of choice for scientific and engineering applications. The challenges for these new supercomputers have been in developing multiprocessor architectures that are easy to program and that deliver high performance without extraordinary programming efforts by users. Recent progress in multiprocessor architecture has led to ways to meet these challenges.

  11. World's fastest supercomputer opens up to users

    Science.gov (United States)

    Xin, Ling

    2016-08-01

    China's latest supercomputer - Sunway TaihuLight - has claimed the crown as the world's fastest computer according to the latest TOP500 list, released at the International Supercomputer Conference in Frankfurt in late June.

  12. Improved Access to Supercomputers Boosts Chemical Applications.

    Science.gov (United States)

    Borman, Stu

    1989-01-01

    Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling is reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)

  13. Neural Networks for protein Structure Prediction

    DEFF Research Database (Denmark)

    Bohr, Henrik

    1998-01-01

    This is a review about neural network applications in bioinformatics. Especially the applications to protein structure prediction, e.g. prediction of secondary structures, prediction of surface structure, fold class recognition and prediction of the 3-dimensional structure of protein backbones...

  15. An iterative approach of protein function prediction

    Directory of Open Access Journals (Sweden)

    Chi Xiaoxiao

    2011-11-01

    Background: Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset are based on the assumption that the available functions of the proteins (a.k.a. annotated proteins) will determine the functions of the proteins whose functions are not yet known (a.k.a. un-annotated proteins). Therefore, protein function prediction is a mono-directed and one-off procedure, i.e. from annotated proteins to un-annotated proteins. However, the interactions between proteins are mutual rather than static and mono-directed, although the functions of some proteins are unknown at present. That means that when we use the similarity-based approach to predict functions of un-annotated proteins, the un-annotated proteins, once their functions are predicted, will affect the similarities between proteins, which in turn will affect the prediction results. In other words, function prediction is a dynamic and mutual procedure. This dynamic feature of protein interactions, however, was not considered in the existing prediction algorithms. Results: In this paper, we propose a new prediction approach that predicts protein functions iteratively. This iterative approach incorporates the dynamic and mutual features of PPI interactions, as well as the local and global semantic influence of protein functions, into the prediction. To guarantee predicting functions iteratively, we propose a new protein similarity from protein functions. We adapt new evaluation metrics to evaluate the prediction quality of our algorithm and other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. Conclusions: The iterative approach is more likely to reflect the real biological nature between proteins when predicting functions. A proper definition of protein similarity from protein functions is the key to predicting
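
    The contrast between one-off and iterative prediction can be sketched with a very simple neighbour-voting scheme. This is not the authors' similarity measure, which the abstract does not define; the voting rule, the function name `iterative_function_prediction`, and the toy network are all assumptions. The point it illustrates is only that proteins predicted in one round act as annotated proteins in the next.

```python
from collections import Counter


def iterative_function_prediction(edges, known, max_rounds=10):
    """Propagate function labels over a PPI graph, round by round.

    edges: iterable of (protein, protein) interaction pairs.
    known: dict mapping annotated proteins to a set of function labels.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    labels = {p: set(f) for p, f in known.items()}
    for _ in range(max_rounds):
        updates = {}
        for protein in sorted(neighbours):
            if protein in labels:
                continue
            # Count the functions carried by already-labelled neighbours.
            votes = Counter()
            for n in neighbours[protein]:
                votes.update(labels.get(n, ()))
            if votes:
                best = max(votes.values())
                updates[protein] = {f for f, c in votes.items() if c == best}
        if not updates:
            break
        # Predictions from this round count as annotations in the next one.
        labels.update(updates)
    return labels


# Hypothetical network: F touches only B, which is itself un-annotated.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "C"), ("B", "F")]
known = {"A": {"kinase"}, "D": {"transport"}}
result = iterative_function_prediction(edges, known)
print(result)
```

    In this toy network a one-off pass would leave F without a prediction, because its only neighbour starts out un-annotated; the second round assigns it a function once B has been labelled.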

  16. Desktop supercomputers. Advance medical imaging.

    Science.gov (United States)

    Frisiello, R S

    1991-02-01

    Medical imaging tools that radiologists as well as a wide range of clinicians and healthcare professionals have come to depend upon are emerging into the next phase of functionality. The strides being made in supercomputing technologies--including reduction of size and price--are pushing medical imaging to a new level of accuracy and functionality.

  17. An assessment of worldwide supercomputer usage

    Energy Technology Data Exchange (ETDEWEB)

    Wasserman, H.J.; Simmons, M.L.; Hayes, A.H.

    1995-01-01

    This report provides a comparative study of advanced supercomputing usage in Japan and the United States as of Spring 1994. It is based on the findings of a group of US scientists whose careers have centered on programming, evaluating, and designing high-performance supercomputers for over ten years. The report is a follow-on to an assessment of supercomputing technology in Europe and Japan that was published in 1993. Whereas the previous study focused on supercomputer manufacturing capabilities, the primary focus of the current work was to compare where and how supercomputers are used. Research for this report was conducted through both literature studies and field research in Japan.

  18. Algorithm for Predicting Protein Secondary Structure

    CERN Document Server

    Senapati, K K; Bhaumik, D

    2010-01-01

    Predicting protein structure from amino acid sequence is one of the most important unsolved problems of molecular biology and biophysics. Not only would a successful prediction algorithm be a tremendous advance in the understanding of the biochemical mechanisms of proteins, but such an algorithm could also conceivably be used to design proteins to carry out specific functions. Prediction of the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three dimensional structure as well as its function. In this research, we use different Hidden Markov models for protein secondary structure prediction. In this paper we have proposed an algorithm for predicting protein secondary structure. We have used a Hidden Markov model with a sliding window for secondary structure prediction. The secondary structure has three regular forms; for each secondary structural element we are using one Hidden Markov Model.
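
    The sliding-window idea in this abstract, with one Hidden Markov Model per structural class, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and not taken from the paper: the three-class reduced residue alphabet, the two-state toy models built by make_model, all probabilities, and the window length of 7. Each window is scored under the helix (H), sheet (E) and coil (C) models with the forward algorithm, and the central residue receives the label of the best-scoring model.

```python
import math

# Toy reduced alphabet (an assumption for brevity): 0 = helix-favouring,
# 1 = sheet-favouring, 2 = everything else (treated as coil-favouring).
HELIX_FORMERS = set("AELMQKRH")
SHEET_FORMERS = set("VIYFWTC")


def residue_class(aa):
    if aa in HELIX_FORMERS:
        return 0
    if aa in SHEET_FORMERS:
        return 1
    return 2


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)          # rescale at every step to avoid underflow
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def make_model(favoured):
    """Hypothetical two-state model: a 'core' state that strongly emits the
    favoured residue class and an 'edge' state that is close to uniform."""
    core = [0.15, 0.15, 0.15]
    core[favoured] = 0.70
    edge = [0.30, 0.30, 0.30]
    edge[favoured] = 0.40
    return {"start": [0.5, 0.5], "trans": [[0.8, 0.2], [0.3, 0.7]], "emit": [core, edge]}


# One HMM per secondary structure class (toy parameters, not trained).
MODELS = {"H": make_model(0), "E": make_model(1), "C": make_model(2)}


def predict(sequence, window=7):
    """Label each residue with the class whose HMM best explains its window."""
    half = window // 2
    obs = [residue_class(aa) for aa in sequence]
    labels = []
    for i in range(len(obs)):
        win = obs[max(0, i - half): i + half + 1]   # window is clipped at the ends
        scores = {label: forward_log_likelihood(win, **m) for label, m in MODELS.items()}
        labels.append(max(scores, key=scores.get))
    return "".join(labels)


if __name__ == "__main__":
    print(predict("AAAAAAAAVVVVVVVVGGGGGGGG"))
```

    In practice the model parameters would be estimated from proteins with known structure; the hand-set values here only show how the per-class scoring and the window fit together.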

  19. GECluster: a novel protein complex prediction method.

    Science.gov (United States)

    Su, Lingtao; Liu, Guixia; Wang, Han; Tian, Yuan; Zhou, Zhihui; Han, Liang; Yan, Lun

    2014-07-04

    Identification of protein complexes is of great importance in the understanding of cellular organization and functions. Traditional computational protein complex prediction methods mainly rely on the topology of protein-protein interaction (PPI) networks but seldom take biological information of proteins (such as Gene Ontology (GO)) into consideration. Meanwhile, the environment relevant analysis of protein complex evolution has been poorly studied, partly due to the lack of high-precision protein complex datasets. In this paper, a combined PPI network is introduced to predict protein complexes which integrate both GO and expression value of relevant protein-coding genes. A novel protein complex prediction method GECluster (Gene Expression Cluster) was proposed based on a seed node expansion strategy, in which a combined PPI network was utilized. GECluster was applied to a training combined PPI network and it predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can efficiently improve protein complex prediction accuracy. In order to study protein complex evolution within cells due to changes in the living environment surrounding cells, GECluster was applied to seven combined PPI networks constructed using the data of a test set including yeast response to stress throughout a wine fermentation process. Our results showed that with the rise of alcohol concentration, protein complexes within yeast cells gradually evolve from one state to another. Besides this, the number of core and attachment proteins within a protein complex both changed significantly.

  20. Protein Residue Contacts and Prediction Methods

    Science.gov (United States)

    Adhikari, Badri

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue–residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  1. Power-constrained supercomputing

    Science.gov (United States)

    Bailey, Peter E.

    As we approach exascale systems, power is turning from an optimization goal to a critical operating constraint. With power bounds imposed by both stakeholders and the limitations of existing infrastructure, achieving practical exascale computing will therefore rely on optimizing performance subject to a power constraint. However, this requirement should not add to the burden of application developers; optimizing the runtime environment given restricted power will primarily be the job of high-performance system software. In this dissertation, we explore this area and develop new techniques that extract maximum performance subject to a particular power constraint. These techniques include a method to find theoretical optimal performance, a runtime system that shifts power in real time to improve performance, and a node-level prediction model for selecting power-efficient operating points. We use a linear programming (LP) formulation to optimize application schedules under various power constraints, where a schedule consists of a DVFS state and number of OpenMP threads for each section of computation between consecutive message passing events. We also provide a more flexible mixed integer-linear (ILP) formulation and show that the resulting schedules closely match schedules from the LP formulation. Across four applications, we use our LP-derived upper bounds to show that current approaches trail optimal, power-constrained performance by up to 41%. This demonstrates limitations of current systems, and our LP formulation provides future optimization approaches with a quantitative optimization target. We also introduce Conductor, a run-time system that intelligently distributes available power to nodes and cores to improve performance. The key techniques used are configuration space exploration and adaptive power balancing. Configuration exploration dynamically selects the optimal thread concurrency level and DVFS state subject to a hardware-enforced power bound

  2. Alpha complexes in protein structure prediction

    DEFF Research Database (Denmark)

    Winter, Pawel; Fonseca, Rasmus

    2015-01-01

    Reducing the computational effort and increasing the accuracy of potential energy functions is of utmost importance in modeling biological systems, for instance in protein structure prediction, docking or design. Evaluating interactions between nonbonded atoms is the bottleneck of such computations......-complexes and kinetic α-complexes in protein related problems (e.g., protein structure prediction and protein-ligand docking) deserves further investigation.)...

  3. TOP500 Supercomputers for June 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-06-23

    23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.

  4. Protein Structure Prediction with Visuospatial Analogy

    Science.gov (United States)

    Davies, Jim; Glasgow, Janice; Kuo, Tony

    We show that visuospatial representations and reasoning techniques can be used as a similarity metric for analogical protein structure prediction. Our system retrieves pairs of α-helices based on contact map similarity, then transfers and adapts the structure information to an unknown helix pair, showing that similar protein contact maps predict similar 3D protein structure. The success of this method provides support for the notion that changing representations can enable similarity metrics in analogy.

  5. Predicting protein dynamics from structural ensembles

    CERN Document Server

    Copperman, J

    2015-01-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using e...

  6. The MULTICOM toolbox for protein structure prediction

    Directory of Open Access Journals (Sweden)

    Cheng Jianlin

    2012-04-01

    Full Text Available Abstract Background As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or Results To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction. Conclusions These tools have been rigorously tested by many users in the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near performance. In order to facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  7. Predicting where small molecules bind at protein-protein interfaces.

    Directory of Open Access Journals (Sweden)

    Peter Walter

    Full Text Available Small molecules that bind at protein-protein interfaces may either block or stabilize protein-protein interactions in cells. Thus, some of these binding interfaces may turn into prospective targets for drug design. Here, we collected 175 pairs of protein-protein (PP) complexes and protein-ligand (PL) complexes with known three-dimensional structures for which (1) one protein from the PP complex shares at least 40% sequence identity with the protein from the PL complex, and (2) the interface regions of these proteins overlap at least partially with each other. We found that those residues of the interfaces that may bind the other protein as well as the small molecule are evolutionarily more conserved on average, have a higher tendency of being located in pockets and expose a smaller fraction of their surface area to the solvent than the remaining protein-protein interface region. Based on these findings we derived a statistical classifier that predicts patches at binding interfaces that have a higher tendency to bind small molecules. We applied this new prediction method to more than 10,000 interfaces from the protein data bank. For several complexes related to apoptosis the predicted binding patches were in direct contact to co-crystallized small molecules.

  8. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners

    National Research Council Canada - National Science Library

    Shoemaker, Benjamin A; Panchenko, Anna R

    2007-01-01

    .... In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions...

  9. Information assessment on predicting protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Gerstein Mark

    2004-10-01

    Full Text Available Abstract Background Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but the high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluations of the information contributions from different evidences help to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insights of the relationships between protein-protein interactions and other genomic information. Results Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case, when one does not have any missing information about any of the features, our analysis shows that there is a larger information contribution from the functional-classification than from expression correlations or essentiality. We also show that in this case alternative models, such as logistic regression and random forest, may be more effective than Bayesian networks for predicting interactions. Conclusions In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominating information contributors for predicting the protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little for improving predictions.
We also found that the data discretizations used in the

  10. INTEL: Intel based systems move up in supercomputing ranks

    CERN Multimedia

    2002-01-01

    "The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).

  11. Comparing Clusters and Supercomputers for Lattice QCD

    CERN Document Server

    Gottlieb, S

    2001-01-01

    Since the development of the Beowulf project to build a parallel computer from commodity PC components, there have been many such clusters built. The MILC QCD code has been run on a variety of clusters and supercomputers. Key design features are identified, and the cost effectiveness of clusters and supercomputers is compared.

  12. Low Cost Supercomputer for Applications in Physics

    Science.gov (United States)

    Ahmed, Maqsood; Ahmed, Rashid; Saeed, M. Alam; Rashid, Haris; Fazal-e-Aleem

    2007-02-01

    Using parallel processing techniques and commodity hardware, Beowulf supercomputers can be built at a much lower cost. Research organizations and educational institutions are using this technique to build their own high performance clusters. In this paper we discuss the architecture and design of a Beowulf supercomputer and our own experience of building the BURRAQ cluster.

  13. Algorithms for Protein Structure Prediction

    DEFF Research Database (Denmark)

    Paluszewski, Martin

    -trace. Here we present three different approaches for reconstruction of Cα-traces from predictable measures. In our first approach [63, 62], the Cα-trace is positioned on a lattice and a tabu-search algorithm is applied to find minimum energy structures. The energy function is based on half-sphere-exposure (HSE......) is more robust than standard Monte Carlo search. In the second approach for reconstruction of Cα-traces, an exact branch and bound algorithm has been developed [67, 65]. The model is discrete and makes use of secondary structure predictions, HSE, CN and radius of gyration. We show how to compute good lower...... bounds for partial structures very fast. Using these lower bounds, we are able to find global minimum structures in a huge conformational space in reasonable time. We show that many of these global minimum structures are of good quality compared to the native structure. Our branch and bound algorithm...

  15. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary...... structures. Single sequence prediction of the new three category assignment gives an overall prediction improvement of 3.1% and 5.1%, compared to the DSSP assignment and schemes where the helix category consists of α-helix and 3(10)-helix, respectively. These results were achieved using a standard feed...

  16. Protein secondary structure: category assignment and predictability

    DEFF Research Database (Denmark)

    Andersen, Claus A.; Bohr, Henrik; Brunak, Søren

    2001-01-01

    In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme known as DSSP. We present here a different scheme, which is more predictable. This scheme predicts directly the hydrogen bonds, which stabilize the secondary......-forward neural network with one hidden layer on a data set identical to the one used in earlier work....

  17. Year 2 Report: Protein Function Prediction Platform

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  18. Chemical shift prediction for denatured proteins

    Energy Technology Data Exchange (ETDEWEB)

    Prestegard, James H., E-mail: jpresteg@ccrc.uga.edu; Sahu, Sarata C.; Nkari, Wendy K.; Morris, Laura C.; Live, David; Gruta, Christian

    2013-02-15

    While chemical shift prediction has played an important role in aspects of protein NMR that include identification of secondary structure, generation of torsion angle constraints for structure determination, and assignment of resonances in spectra of intrinsically disordered proteins, interest has arisen more recently in using it in alternate assignment strategies for crosspeaks in ¹H-¹⁵N HSQC spectra of sparsely labeled proteins. One such approach involves correlation of crosspeaks in the spectrum of the native protein with those observed in the spectrum of the denatured protein, followed by assignment of the peaks in the latter spectrum. As in the case of disordered proteins, predicted chemical shifts can aid in these assignments. Some previously developed empirical formulas for chemical shift prediction have depended on basis data sets of 20 pentapeptides. In each case the central residue was varied among the 20 common amino acids, with the flanking residues held constant throughout the given series. However, previous choices of solvent conditions and flanking residues make the parameters in these formulas less than ideal for general application to denatured proteins. Here, we report ¹H and ¹⁵N shifts for a set of alanine based pentapeptides under the low pH urea denaturing conditions that are more appropriate for sparse label assignments. New parameters have been derived and a Perl script was created to facilitate comparison with other parameter sets. A small, but significant, improvement in shift predictions for denatured ubiquitin is demonstrated.

  19. Protein contact order prediction from primary sequences

    Directory of Open Access Journals (Sweden)

    Wishart David S

    2008-05-01

    Full Text Available Abstract Background Contact order is a topological descriptor that has been shown to be correlated with several interesting protein properties such as protein folding rates and protein transition state placements. Contact order has also been used to select for viable protein folds from ab initio protein structure prediction programs. For proteins of known three-dimensional structure, their contact order can be calculated directly. However, for proteins with unknown three-dimensional structure, there is no effective prediction method currently available. Results In this paper, we propose several simple yet very effective methods to predict contact order from the amino acid sequence only. One set of methods is based on a weighted linear combination of predicted secondary structure content and amino acid composition. Depending on the number of components used in these equations it is possible to achieve a correlation coefficient of 0.857–0.870 between the observed and predicted contact order. A second method, based on sequence similarity to known three-dimensional structures, is able to achieve a correlation coefficient of 0.977. We have also developed a much more robust implementation for calculating contact order directly from PDB coordinates that works for > 99% of PDB files. All of these contact order predictors and calculators have been implemented as a web server (see Availability and requirements section for URL). Conclusion Protein contact order can be effectively predicted from the primary sequence, in the absence of three-dimensional structure. Three factors, percentage of residues in alpha helices, percentage of residues in beta strands, and sequence length, appear to be strongly correlated with the absolute contact order.
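
    For a protein of known structure, the direct calculation mentioned in this abstract can be sketched as follows. The sketch uses the commonly cited definition, in which absolute contact order is the mean sequence separation over all contacting residue pairs and relative contact order divides that value by the chain length. The contact criterion is an assumption made here for simplicity (one coordinate per residue, for example the Cα atom, with an 8 Å cutoff and a minimum separation of 2) and is not necessarily the criterion used by the authors' web server.

```python
import math


def contact_order(coords, cutoff=8.0, min_sep=2):
    """Return (absolute_co, relative_co) for one chain.

    coords  : list of (x, y, z) tuples, one representative atom per residue
              (assumed here to be the C-alpha atom).
    cutoff  : distance in angstroms below which two residues count as in contact
              (assumed value).
    min_sep : smallest sequence separation |i - j| considered (assumed value).
    """
    n_res = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_sep, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    absolute = total_separation / n_contacts   # mean sequence separation of contacts
    relative = absolute / n_res                # normalised by chain length
    return absolute, relative


if __name__ == "__main__":
    # Four residues on a straight line, 3.8 angstroms apart (made-up coordinates).
    line = [(3.8 * k, 0.0, 0.0) for k in range(4)]
    print(contact_order(line))
```

    With the made-up straight-line coordinates, only the residue pairs two positions apart fall within the cutoff, so the absolute contact order is 2 and the relative contact order is 2/4.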

  20. Protein complexes predictions within protein interaction networks using genetic algorithms.

    Science.gov (United States)

    Ramadan, Emad; Naef, Ahmed; Ahmed, Moataz

    2016-07-25

    Protein-protein interaction networks are receiving increased attention due to their importance in understanding life at the cellular level. A major challenge in systems biology is to understand the modular structure of such biological networks. Although clustering techniques have been proposed for clustering protein-protein interaction networks, those techniques suffer from some drawbacks. The application of earlier clustering techniques to protein-protein interaction networks in order to predict protein complexes within the networks does not yield good results due to the small-world and power-law properties of these networks. In this paper, we construct a new clustering algorithm for predicting protein complexes through the use of genetic algorithms. We design an objective function for exclusive clustering and overlapping clustering. We assess the quality of our proposed clustering algorithm using two gold-standard data sets. Our algorithm can identify protein complexes that are significantly enriched in the gold-standard data sets. Furthermore, our method surpasses three competing methods: MCL, ClusterOne, and MCODE in terms of the quality of the predicted complexes. The source code and accompanying examples are freely available at http://faculty.kfupm.edu.sa/ics/eramadan/GACluster.zip .

  1. TOP500 Supercomputers for June 2005

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2005-06-22

    25th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/LLNL BlueGene/L and IBM gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 25th edition of the TOP500 list of the world's fastest supercomputers was released today (June 22, 2005) at the 20th International Supercomputing Conference (ISC2005) in Heidelberg, Germany.

  2. TOP500 Supercomputers for November 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-11-16

    22nd Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 22nd edition of the TOP500 list of the world's fastest supercomputers was released today (November 16, 2003). The Earth Simulator supercomputer retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s ("teraflops" or trillions of calculations per second). It was built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan.

  3. 16 million [pounds] investment for 'virtual supercomputer'

    CERN Multimedia

    Holland, C

    2003-01-01

    "The Particle Physics and Astronomy Research Council is to spend 16million [pounds] to create a massive computing Grid, equivalent to the world's second largest supercomputer after Japan's Earth Simulator computer" (1/2 page)

  4. Supercomputers open window of opportunity for nursing.

    Science.gov (United States)

    Meintz, S L

    1993-01-01

    A window of opportunity was opened for nurse researchers with the High Performance Computing and Communications (HPCC) initiative in President Bush's 1992 fiscal-year budget. Nursing research moved into the high-performance computing environment through the University of Nevada Las Vegas/Cray Project for Nursing and Health Data Research (PNHDR). Using the CRAY YMP 2/216 supercomputer, the PNHDR established the validity of a supercomputer platform for nursing research. In addition, the research has identified a paradigm shift in statistical analysis, delineated actual and potential barriers to nursing research in a supercomputing environment, conceptualized a new branch of nursing science called Nurmetrics, and discovered new avenues for nursing research utilizing supercomputing tools.

  5. Predicting Protein Structure Using Parallel Genetic Algorithms.

    Science.gov (United States)

    1994-12-01

    Thesis AFIT/GCS/ENG/94D-03. Approved for public release; distribution unlimited.

  6. Protein Secondary Structure Prediction Using Dynamic Programming

    Institute of Scientific and Technical Information of China (English)

    Jing ZHAO; Pei-Ming SONG; Qing FANG; Jian-Hua LUO

    2005-01-01

    In the present paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensity of the protein short sequence derived from a training data set. The protein secondary structure was thus predicted in this way. The average three-state accuracy of the algorithm used was 76.70%.
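
    The general technique named in this abstract, searching a directed graph for an optimum path with dynamic programming, can be sketched as below. This is not the authors' graph construction. The sketch assumes a layered graph with one node per (position, state) pair, node scores taken from a small hypothetical propensity table, and a fixed penalty on edges that change state. The values in PROPENSITY, DEFAULT and SWITCH_PENALTY are invented for illustration and are not derived from any training set.

```python
STATES = "HEC"  # helix, strand, coil

# Hypothetical per-residue scores for each state (illustrative values only).
PROPENSITY = {
    "A": {"H": 0.4, "E": -0.2, "C": -0.1},
    "L": {"H": 0.3, "E": 0.1, "C": -0.3},
    "E": {"H": 0.4, "E": -0.4, "C": 0.0},
    "V": {"H": -0.1, "E": 0.5, "C": -0.3},
    "I": {"H": 0.0, "E": 0.5, "C": -0.4},
    "G": {"H": -0.5, "E": -0.3, "C": 0.5},
    "P": {"H": -0.6, "E": -0.5, "C": 0.6},
}
DEFAULT = {"H": 0.0, "E": 0.0, "C": 0.1}   # used for residues not in the table
SWITCH_PENALTY = 0.5                       # cost of an edge that changes state


def best_path(sequence):
    """Return the highest-scoring state string for the sequence."""
    if not sequence:
        return ""
    # score[s] = best total score of any path ending in state s at this position
    score = {s: PROPENSITY.get(sequence[0], DEFAULT)[s] for s in STATES}
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for aa in sequence[1:]:
        node = PROPENSITY.get(aa, DEFAULT)
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(
                STATES,
                key=lambda p: score[p] - (SWITCH_PENALTY if p != s else 0.0),
            )
            edge_cost = SWITCH_PENALTY if prev != s else 0.0
            new_score[s] = score[prev] - edge_cost + node[s]
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace the optimum path backwards from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(best_path("AAAAVVVV"))
    print(best_path("AAVAA"))
```

    The switch penalty makes the path prefer contiguous segments: under these toy scores a single strand-favouring residue inside a helical stretch is absorbed into the helix, while a run of four is long enough to justify a state change.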

  7. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    Directory of Open Access Journals (Sweden)

    László Kaján

    2013-01-01

    Full Text Available We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  8. Cloud prediction of protein structure and function with PredictProtein for Debian.

    Science.gov (United States)

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  9. TOP500 Supercomputers for November 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-11-08

    24th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/IBM BlueGene/L and NASA/SGI's Columbia gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 24th edition of the TOP500 list of the world's fastest supercomputers was released today (November 8, 2004) at the SC2004 Conference in Pittsburgh, Pa.

  10. Misleading Performance Reporting in the Supercomputing Field

    Directory of Open Access Journals (Sweden)

    David H. Bailey

    1992-01-01

    In a previous humorous note, I outlined 12 ways in which performance figures for scientific supercomputers can be distorted. In this paper, the problem of potentially misleading performance reporting is discussed in detail. Included are some examples that have appeared in recent published scientific papers. This paper also includes some proposed guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion in the field of supercomputing.

  11. Simulating Galactic Winds on Supercomputers

    Science.gov (United States)

    Schneider, Evan

    2017-01-01

    Galactic winds are a ubiquitous feature of rapidly star-forming galaxies. Observations of nearby galaxies have shown that winds are complex, multiphase phenomena, comprised of outflowing gas at a large range of densities, temperatures, and velocities. Describing how starburst-driven outflows originate, evolve, and affect the circumgalactic medium and gas supply of galaxies is an important challenge for theories of galaxy evolution. In this talk, I will discuss how we are using a new hydrodynamics code, Cholla, to improve our understanding of galactic winds. Cholla is a massively parallel, GPU-based code that takes advantage of specialized hardware on the newest generation of supercomputers. With Cholla, we can perform large, three-dimensional simulations of multiphase outflows, allowing us to track the coupling of mass and momentum between gas phases across hundreds of parsecs at sub-parsec resolution. The results of our recent simulations demonstrate that the evolution of cool gas in galactic winds is highly dependent on the initial structure of embedded clouds. In particular, we find that turbulent density structures lead to more efficient mass transfer from cool to hot phases of the wind. I will discuss the implications of our results both for the incorporation of winds into cosmological simulations, and for interpretations of observed multiphase winds and the circumgalactic medium of nearby galaxies.

  12. HMM in Predicting Protein Secondary Structure

    Institute of Scientific and Technical Information of China (English)

    Huang Jing; Shi Feng; Zou Xiu-fen; Li Yuan-xiang; Zhou Huai-bei

    2003-01-01

    We introduce a new method, the duration hidden Markov model (dHMM), to predict the secondary structure of proteins. In our study, we divide the basic secondary structure of a protein into three classes: H (α-helix), E (β-sheet) and O (others, including coil and turn). An HMM is a probabilistic model that takes into account the interactions between adjacent amino acids (these interactions are represented by transition probabilities), and we use a genetic algorithm to determine the model parameters. After improving the model and fixing its parameters, we wrote a program, HMMPS. Our example shows that the HMM is a good method for protein secondary structure prediction.
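
    As a rough illustration of the three-state H/E/O labelling described in this record, the sketch below runs Viterbi decoding on a plain first-order HMM. It is not the authors' duration HMM or their HMMPS program: the reduced residue alphabet (`h`, `b`, `c`), all probabilities, and the names `viterbi`, `START_P`, `TRANS_P` and `EMIT_P` are hypothetical values chosen only to make the example runnable.

```python
import math

STATES = ("H", "E", "O")  # helix, sheet, other

# Hypothetical parameters; the record's method fits them with a genetic algorithm.
START_P = {"H": 1 / 3, "E": 1 / 3, "O": 1 / 3}
TRANS_P = {
    "H": {"H": 0.8, "E": 0.1, "O": 0.1},
    "E": {"H": 0.1, "E": 0.8, "O": 0.1},
    "O": {"H": 0.1, "E": 0.1, "O": 0.8},
}
# Observations use a toy reduced alphabet: h = helix-favouring residue,
# b = strand-favouring residue, c = anything else.
EMIT_P = {
    "H": {"h": 0.8, "b": 0.1, "c": 0.1},
    "E": {"h": 0.1, "b": 0.8, "c": 0.1},
    "O": {"h": 0.1, "b": 0.1, "c": 0.8},
}


def viterbi(observations, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable state string for a non-empty observation string."""
    # Log probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        column, pointer = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            column[s] = (prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][obs]))
            pointer[s] = best
        scores.append(column)
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("hhhhbbbbcccc"))
```

    A duration HMM would additionally model how long each state persists, which this first-order sketch leaves out.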

  13. Predicting Secretory Proteins with SignalP

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2017-01-01

    SignalP is the currently most widely used program for prediction of signal peptides from amino acid sequences. Proteins with signal peptides are targeted to the secretory pathway, but are not necessarily secreted. After a brief introduction to the biology of signal peptides and the history...

  14. Is protein structure prediction still an enigma?

    African Journals Online (AJOL)

    STORAGESEVER

    2008-12-29

    Dec 29, 2008 ... data bank during the last few years, it is highly desirable to develop some rapid and effective computational methods to predict the structure of new proteins so as to expedite the process of ... problem of much scientific interest and it is still not clear ..... the state-of-the-art alignment-based methods yields.

  15. A Software Pipeline for Protein Structure Prediction

    Science.gov (United States)

    2006-11-01

    Department of Cell Biology and Biochemistry U. S. Army Medical Research Institute of Infectious Diseases Frederick, MD 21702 In-Chul Yeh, Nela ...Suite for Protein Structure Prediction Michael S. Lee U. S. Army Research Laboratory In-Chul Yeh, Nela Zavaljevski, Paul Wilson, and Jaques Reifman

  16. Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction

    Directory of Open Access Journals (Sweden)

    Zheng Sun

    2014-01-01

    WSSV is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism of how WSSV interacts with shrimp is still not very clear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1) and the constructed transcriptome data of F. chinensis were used to screen potentially interacting proteins by searching in protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were suggested to have interactions between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into “extracellular region” or “receptor complex” GO-terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” In the 6 pairs of interacting proteins, an envelope protein called “collagen-like protein” (WSSV-CLP) encoded by an early virus gene “wsv001” in WSSV interacted with 6 deduced proteins from the shrimp, including three integrin alpha (ITGA), two integrin beta (ITGB), and one syndecan (SDC). Sequence analysis on WSSV-CLP, ITGA, ITGB, and SDC revealed that they possessed the sequence features for protein-protein interactions. This study might provide new insights into the interaction mechanisms between WSSV and shrimp.

  17. GASP: Gapped Ancestral Sequence Prediction for proteins

    Directory of Open Access Journals (Sweden)

    Shields Denis C

    2004-09-01

    Background: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.

  18. GASP: Gapped Ancestral Sequence Prediction for proteins.

    Science.gov (United States)

    Edwards, Richard J; Shields, Denis C

    2004-09-06

    The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.

  19. Protein-protein interaction predictions using text mining methods.

    Science.gov (United States)

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Iliopoulos, Ioannis

    2015-03-01

    It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. Understanding their function, both individually and in the form of protein complexes, is of great importance. Nowadays, despite the plethora of various high-throughput experimental approaches for detecting protein-protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining based computational methodologies, aiming to extract information for proteins and their interactions from public repositories such as literature and various biological databases. We discuss their strengths, their weaknesses and how they complement existing experimental techniques by simultaneously commenting on the biological databases which hold such information and the benchmark datasets that can be used for evaluating new tools.

  20. Computational prediction of protein-protein interactions in Leishmania predicted proteomes.

    Directory of Open Access Journals (Sweden)

    Antonio M Rezende

    The trypanosomatid parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite years of study and genome availability, an effective vaccine has not yet been developed, and the chemotherapy is highly toxic. Therefore, it is clear that only interdisciplinary, integrated studies will succeed in finding new targets for the development of vaccines and drugs. An essential part of this rationale is the study of protein-protein interaction (PPI) networks, which can provide a better understanding of complex protein interactions in biological systems. Thus, we modeled PPIs for trypanosomatids through computational methods, using sequence comparison against public databases of protein or domain interactions for interaction prediction (Interolog Mapping), and developed a dedicated combined system score to address the robustness of the predictions. The confidence of the network prediction approach was evaluated using gold standard positive and negative datasets, and the AUC value obtained was 0.94. As a result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum, respectively. For each predicted network, the top 20 proteins were ranked by the MCC topological index. In addition, information related to immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity analysis, which is key when one is interested in destabilizing the PPIs for drug or vaccine purposes, was performed along with multiple alignments of the predicted PPIs, revealing patterns associated with protein turnover. In addition, around 50% of hypothetical protein present in the networks

  1. GREEN SUPERCOMPUTING IN A DESKTOP BOX

    Energy Technology Data Exchange (ETDEWEB)

    HSU, CHUNG-HSING [Los Alamos National Laboratory; FENG, WU-CHUN [NON LANL; CHING, AVERY [NON LANL

    2007-01-17

    The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicated computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, they present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than their reference SMP platform.
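
    The performance-power figure implied by the numbers in this record (14 Gflops on Linpack at 185 watts) can be checked with a few lines of Python. The function name `mflops_per_watt` is illustrative only, and no comparison with the reference SMP platform is attempted because the record does not give its figures.

```python
def mflops_per_watt(gflops, watts):
    """Convert a Linpack rate in Gflops and a power draw in watts to Mflops per watt."""
    return gflops * 1000.0 / watts


if __name__ == "__main__":
    # Figures quoted in the abstract: 14 Gflops at 185 W under load.
    print(f"{mflops_per_watt(14.0, 185.0):.1f} Mflops/W")
```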

  2. A training program for scientific supercomputing users

    Energy Technology Data Exchange (ETDEWEB)

    Hanson, F.; Moher, T.; Sabelli, N.; Solem, A.

    1988-01-01

    There is need for a mechanism to transfer supercomputing technology into the hands of scientists and engineers in such a way that they will acquire a foundation of knowledge that will permit integration of supercomputing as a tool in their research. Most computing center training emphasizes computer-specific information about how to use a particular computer system; most academic programs teach concepts to computer scientists. Only a few brief courses and new programs are designed for computational scientists. This paper describes an eleven-week training program aimed principally at graduate and postdoctoral students in computationally-intensive fields. The program is designed to balance the specificity of computing center courses, the abstractness of computer science courses, and the personal contact of traditional apprentice approaches. It is based on the experience of computer scientists and computational scientists, and consists of seminars and clinics given by many visiting and local faculty. It covers a variety of supercomputing concepts, issues, and practices related to architecture, operating systems, software design, numerical considerations, code optimization, graphics, communications, and networks. Its research component encourages understanding of scientific computing and supercomputer hardware issues. Flexibility in thinking about computing needs is emphasized by the use of several different supercomputer architectures, such as the Cray X/MP48 at the National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign, IBM 3090 600E/VF at the Cornell National Supercomputer Facility, and Alliant FX/8 at the Advanced Computing Research Facility at Argonne National Laboratory. 11 refs., 6 tabs.

  3. Predicting Resistance Mutations Using Protein Design Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.

  4. Protein Structure Prediction with Evolutionary Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.

  5. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    The primary structure of a protein is the sequence of its amino acids. The secondary structure describes structural properties of the molecule such as which parts of it form sheets, helices or coils. Spatial and other properties are described by the higher order structures. The classification task we are considering here is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...

  6. Predicting disease-related proteins based on clique backbone in protein-protein interaction network.

    Science.gov (United States)

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherently biological significance. A disease-related clique possibly associates with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach of predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and negative interactions in protein networks, extending cliques and scoring predicted disease proteins with gene ontology terms are introduced to the clique-based method. Precisions of predicted disease proteins are verified by disease phenotypes and steadily keep to more than 95%. The predicted disease proteins associated with cliques can partly complement mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
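
    To make the notion of a clique (maximal complete subgraph) in an interaction network concrete, the following sketch enumerates maximal cliques with the basic Bron-Kerbosch recursion. It illustrates only the graph concept, not the clique extension or gene ontology scoring of the record's method; the protein names `P1` to `P5`, the edge list, and the function names are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) interaction pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend the current set, so it is maximal.
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


if __name__ == "__main__":
    # Toy network: a triangle P1-P2-P3 with a short tail P3-P4-P5.
    toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
    for clique in maximal_cliques(build_adjacency(toy_edges)):
        print(sorted(clique))
```

    The basic recursion is exponential in the worst case; on real interaction networks a pivoting variant or a graph library would be the more practical choice.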

  7. TOP500 Supercomputers for June 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-06-23

    21st Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 21st edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2003). The Earth Simulator supercomputer built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (teraflops or trillions of calculations per second), retains the number one position. The number 2 position is held by the re-measured ASCI Q system at Los Alamos National Laboratory. With 13.88 Tflop/s, it is the second system ever to exceed the 10 Tflop/s mark. ASCI Q was built by Hewlett-Packard and is based on the AlphaServer SC computer system.

  8. TOP500 Supercomputers for June 2002

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2002-06-20

    19th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 19th edition of the TOP500 list of the world's fastest supercomputers was released today (June 20, 2002). The recently installed Earth Simulator supercomputer at the Earth Simulator Center in Yokohama, Japan, is as expected the clear new number 1. Its performance of 35.86 Tflop/s (trillions of calculations per second) running the Linpack benchmark is almost five times higher than the performance of the now No.2 IBM ASCI White system at Lawrence Livermore National Laboratory (7.2 Tflop/s). This powerful leapfrogging to the top by a system so much faster than the previous top system is unparalleled in the history of the TOP500.

  9. TOP500 Supercomputers for November 2002

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2002-11-15

    20th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 20th edition of the TOP500 list of the world's fastest supercomputers was released today (November 15, 2002). The Earth Simulator supercomputer installed earlier this year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (trillions of calculations per second), retains the number one position. The No.2 and No.3 positions are held by two new, identical ASCI Q systems at Los Alamos National Laboratory (7.73 Tflop/s each). These systems are built by Hewlett-Packard and based on the AlphaServer SC computer system.

  10. Input/output behavior of supercomputing applications

    Science.gov (United States)

    Miller, Ethan L.

    1991-01-01

    The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid-state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.

  11. GPUs: An Oasis in the Supercomputing Desert

    CERN Document Server

    Kamleh, Waseem

    2012-01-01

    A novel metric is introduced to compare the supercomputing resources available to academic researchers on a national basis. Data from the supercomputing Top 500 and the top 500 universities in the Academic Ranking of World Universities (ARWU) are combined to form the proposed "500/500" score for a given country. Australia scores poorly in the 500/500 metric when compared with other countries with a similar ARWU ranking, an indication that HPC-based researchers in Australia are at a relative disadvantage with respect to their overseas competitors. For HPC problems where single precision is sufficient, commodity GPUs provide a cost-effective means of quenching the computational thirst of otherwise parched Lattice practitioners traversing the Australian supercomputing desert. We explore some of the more difficult terrain in single precision territory, finding that BiCGStab is unreliable in single precision at large lattice sizes. We test the CGNE and CGNR forms of the conjugate gradient method on the normal equa...

  12. Floating point arithmetic in future supercomputers

    Science.gov (United States)

    Bailey, David H.; Barton, John T.; Simon, Horst D.; Fouts, Martin J.

    1989-01-01

    Considerations in the floating-point design of a supercomputer are discussed. Particular attention is given to word size, hardware support for extended precision, format, and accuracy characteristics. These issues are discussed from the perspective of the Numerical Aerodynamic Simulation Systems Division at NASA Ames. The features believed to be most important for a future supercomputer floating-point design include: (1) a 64-bit IEEE floating-point format with 11 exponent bits, 52 mantissa bits, and one sign bit and (2) hardware support for reasonably fast double-precision arithmetic.
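
    The 64-bit IEEE layout named in this record (one sign bit, 11 exponent bits, 52 mantissa bits) can be inspected directly. The sketch below uses Python's standard `struct` module to split a double into those three fields; the function name `decompose_double` is illustrative, and the exponent is returned in its stored (biased) form, where the bias for this format is 1023.

```python
import struct


def decompose_double(x):
    """Split a 64-bit IEEE 754 double into (sign, biased exponent, mantissa) fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, implicit leading 1 not stored
    return sign, exponent, mantissa


if __name__ == "__main__":
    for value in (1.0, -2.0, 1.5):
        print(value, decompose_double(value))
```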

  13. A Bayesian Framework for Combining Protein and Network Topology Information for Predicting Protein-Protein Interactions.

    Science.gov (United States)

    Birlutiu, Adriana; d'Alché-Buc, Florence; Heskes, Tom

    2015-01-01

    Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation of our model is that by combining these two types of information one can achieve a better accuracy in predicting protein-protein interactions, than by using models constructed from these two types of information independently.

  14. Predicting protein-protein interactions in the post synaptic density.

    Science.gov (United States)

    Bar-shira, Ossnat; Chechik, Gal

    2013-09-01

    The post synaptic density (PSD) is a specialization of the cytoskeleton at the synaptic junction, composed of hundreds of different proteins. Characterizing the protein components of the PSD and their interactions can help elucidate the mechanism of long-term changes in synaptic plasticity, which underlie learning and memory. Unfortunately, our knowledge of the proteome and interactome of the PSD is still partial and noisy. In this study we describe a computational framework to improve the reconstruction of the PSD network. The approach is based on learning the characteristics of PSD protein interactions from a set of trusted interactions, expanding this set with data collected from large scale repositories, and then predicting novel interactions with proteins that are suspected to reside in the PSD. Using this method we obtained thirty predicted interactions, more than half of which have supporting evidence in the literature. We discuss in detail two of these new interactions, Lrrtm1 with PSD-95 and Src with Capg. The first may take part in a mechanism underlying glutamatergic dysfunction in schizophrenia. The second suggests an alternative mechanism to regulate dendritic spine maturation.

  15. PCA for predicting quaternary structure of protein

    Institute of Scientific and Technical Information of China (English)

    Tong WANG; Hongbin SHEN; Lixiu YAO; Jie YANG; Kuochen CHOU

    2008-01-01

    The number and arrangement of subunits that form a protein are referred to as quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to finding its biological function and interaction process with other molecules in a biological system. With the explosion of protein sequences generated in the Post-Genomic Age, it is vital to develop an automated method to deal with such a challenge. To explore this problem, we adopted an approach based on the pseudo position-specific score matrix (Pse-PSSM) descriptor, proposed by Chou and Shen, representing a protein sample. The Pse-PSSM descriptor is advantageous in that it can combine the evolution information and sequence-correlated information. However, incorporating all these effects into a descriptor may cause 'high dimension disaster'. To overcome such a problem, the fusion approach was adopted by Chou and Shen. A completely different approach, linear dimensionality reduction algorithm principal component analysis (PCA) is introduced to extract key features from the high-dimensional Pse-PSSM space. The obtained dimension-reduced descriptor vector is a compact representation of the original high dimensional vector. The jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.

  16. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

    Institute of Scientific and Technical Information of China (English)

    GAO Lei; LI Xia; GUO Zheng; ZHU MingZhu; LI YanHui; RAO ShaoQi

    2007-01-01

    GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to "biology process" by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

  17. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

  18. Protein complex prediction based on k-connected subgraphs in protein interaction network

    OpenAIRE

    Habibi Mahnaz; Eslahchi Changiz; Wong Limsoon

    2010-01-01

    Abstract Background Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interactions data are incomplete and a protein complex does not have to be a complete or dense subgraph. Results We propose a more appropriate protein complex prediction method, CFA, that is based on ...

  19. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to improve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  20. Prediction of Protein-protein Interactions on the Basis of Evolutionary Conservation of Protein Functions

    Directory of Open Access Journals (Sweden)

    Ekaterina Kotelnikova

    2007-01-01

    Full Text Available Motivation: Although a great deal of progress is being made in the development of fast and reliable experimental techniques to extract genome-wide networks of protein-protein and protein-DNA interactions, the sequencing of new genomes proceeds at an even faster rate. That is why there is a considerable need for reliable methods of in-silico prediction of protein interaction based solely on sequence similarity information and known interactions from well-studied organisms. This problem can be solved if a dependency exists between sequence similarity and the conservation of the proteins’ functions. Results: In this paper, we introduce a novel probabilistic method for prediction of protein-protein interactions using a new empirical probabilistic formula describing the loss of interactions between homologous proteins during the course of evolution. This formula describes an evolutional process quite similar to the process of the Earth’s population growth. In addition, our method favors predictions confirmed by several interacting pairs over predictions coming from a single interacting pair. Our approach is useful in working with “noisy” data such as those coming from high-throughput experiments. We have generated predictions for five “model” organisms: H. sapiens, D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae and evaluated the quality of these predictions.

  1. A scoring framework for predicting protein structures

    Science.gov (United States)

    Zou, Xiaoqin

    2013-03-01

    We have developed a statistical mechanics-based iterative method to extract statistical atomic interaction potentials from known, non-redundant protein structures. Our method circumvents the long-standing reference state problem in deriving traditional knowledge-based scoring functions, by using rapid iterations through a physical, global convergence function. The rapid convergence of this physics-based method, unlike other parameter optimization methods, warrants the feasibility of deriving distance-dependent, all-atom statistical potentials to keep the scoring accuracy. The derived potentials, referred to as ITScore/Pro, have been validated using three diverse benchmarks: the high-resolution decoy set, the AMBER benchmark decoy set, and the CASP8 decoy set. Significant improvement in performance has been achieved. Finally, comparisons between the potentials of our model and potentials of a knowledge-based scoring function with a randomized reference state have revealed the reason for the better performance of our scoring function, which could provide useful insight into the development of other physical scoring functions. The potentials developed in the present study are generally applicable for structural selection in protein structure prediction.

  2. Structure prediction of magnetosome-associated proteins

    Directory of Open Access Journals (Sweden)

    Hila eNudelman

    2014-01-01

    Full Text Available Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region – the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs we used bioinformatics tools in order to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs.

  3. Adventures in Supercomputing: An innovative program

    Energy Technology Data Exchange (ETDEWEB)

    Summers, B.G.; Hicks, H.R.; Oliver, C.E.

    1995-06-01

    Within the realm of education, seldom does an innovative program become available with the potential to change an educator's teaching methodology and serve as a spur to systemic reform. The Adventures in Supercomputing (AiS) program, sponsored by the Department of Energy, is such a program. Adventures in Supercomputing is a program for high school and middle school teachers. It has helped to change the teaching paradigm of many of the teachers involved in the program from a teacher-centered classroom to a student-centered classroom. "A student-centered classroom offers better opportunities for development of internal motivation, planning skills, goal setting and perseverance than does the traditional teacher-directed mode". Not only is the process of teaching changed, but evidences of systemic reform are beginning to surface. After describing the program, the authors discuss the teaching strategies being used and the evidences of systemic change in many of the AiS schools in Tennessee.

  4. Solidification in a Supercomputer: From Crystal Nuclei to Dendrite Assemblages

    Science.gov (United States)

    Shibuta, Yasushi; Ohno, Munekazu; Takaki, Tomohiro

    2015-08-01

    Thanks to the recent progress in high-performance computational environments, the range of applications of computational metallurgy is expanding rapidly. In this paper, cutting-edge simulations of solidification from atomic to microstructural levels performed on a graphics processing unit (GPU) architecture are introduced with a brief introduction to advances in computational studies on solidification. In particular, million-atom molecular dynamics simulations captured the spontaneous evolution of anisotropy in a solid nucleus in an undercooled melt and homogeneous nucleation without any inducing factor, which is followed by grain growth. At the microstructural level, the quantitative phase-field model has been gaining importance as a powerful tool for predicting solidification microstructures. In this paper, the convergence behavior of simulation results obtained with this model is discussed, in detail. Such convergence ensures the reliability of results of phase-field simulations. Using the quantitative phase-field model, the competitive growth of dendrite assemblages during the directional solidification of a binary alloy bicrystal at the millimeter scale is examined by performing two- and three-dimensional large-scale simulations by multi-GPU computation on the supercomputer, TSUBAME2.5. This cutting-edge approach using a GPU supercomputer is opening a new phase in computational metallurgy.

  5. Protein function prediction using neighbor relativity in protein-protein interaction network.

    Science.gov (United States)

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some studies use protein interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein is annotated with the top-scoring transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that utilize interaction network. Copyright © 2012 Elsevier Ltd. All rights reserved.
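
    The annotation-transfer step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the published NRC formula: the abstract does not give the coefficient, so the weight used here (number of shared neighbors plus one for the direct edge) is an assumption, and the function names, the `top_k` parameter and the toy network are hypothetical.

```python
from collections import defaultdict


def common_neighbor_weight(graph, u, v):
    """Illustrative stand-in for NRC (an assumption, not the paper's formula):
    shared neighbors of u and v, plus 1 for the direct edge."""
    return len(graph[u] & graph[v]) + 1


def transfer_annotations(graph, annotations, target, top_k=2):
    """Score each function by the summed weight of the neighbors carrying it,
    then return the top_k highest-scoring functions for the target protein."""
    scores = defaultdict(float)
    for neighbor in graph[target]:
        weight = common_neighbor_weight(graph, target, neighbor)
        for function in annotations.get(neighbor, ()):
            scores[function] += weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:top_k]]


# Hypothetical toy interaction network (undirected adjacency sets).
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"kinase"},
    "P3": {"kinase", "transport"},
    "P4": {"binding"},
}
predicted = transfer_annotations(graph, annotations, "P1")
print(predicted)
```

    In this toy network, P2 and P3 each share a neighbor with P1 and therefore carry more weight than P4, so their annotations rank first. A fuller version would also need the distance and path-count features that the abstract mentions.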

  6. Integrating protein-protein interactions and text mining for protein function prediction

    Directory of Open Access Journals (Sweden)

    Leser Ulf

    2008-07-01

    Full Text Available Abstract Background Functional annotation of proteins remains a challenging task. Currently the scientific literature serves as the main source for yet uncurated functional annotations, but curation work is slow and expensive. Automatic techniques that support this work are still lacking reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we furthermore implemented a procedure that validates all predictions based on findings reported in the literature. Results Using this procedure, more than 80% of the GO annotations for proteins with highly conserved orthologs that are available in UniProtKb/Swiss-Prot could be verified automatically. For a subset of proteins we predicted new GO annotations that were not available in UniProtKb/Swiss-Prot. All predictions were correct (100% precision) according to the verifications from a trained curator. Conclusion Our method of integrating CCSs and literature mining is thus a highly reliable approach to predict GO annotations for weakly characterized proteins with orthologs.

  7. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2013-12-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  8. New approach for predicting protein-protein interactions

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Protein-protein interactions (PPIs) are of vital importance for virtually all processes of a living cell. The study of these associations of protein molecules could improve people's understanding of diseases and provide a basis for therapeutic approaches.

  9. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    Science.gov (United States)

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  10. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  11. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    OpenAIRE

    Arvind Kumar Tiwari; Rajeev Srivastava

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational int...

  12. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...... can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence derived protein features such as predicted post translational modifications (PTMs), protein sorting signals and physical/chemical properties...

  13. Computational Methods for Protein Structure Prediction and Modeling Volume 2: Structure Prediction

    CERN Document Server

    Xu, Ying; Liang, Jie

    2007-01-01

    Volume 2 of this two-volume sequence focuses on protein structure prediction and includes protein threading, De novo methods, applications to membrane proteins and protein complexes, structure-based drug design, as well as structure prediction as a systems problem. A series of appendices review the biological and chemical basics related to protein structure, computer science for structural informatics, and prerequisite mathematics and statistics.

  14. Data-intensive computing on numerically-insensitive supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Ahrens, James P [Los Alamos National Laboratory; Fasel, Patricia K [Los Alamos National Laboratory; Habib, Salman [Los Alamos National Laboratory; Heitmann, Katrin [Los Alamos National Laboratory; Lo, Li-Ta [Los Alamos National Laboratory; Patchett, John M [Los Alamos National Laboratory; Williams, Sean J [Los Alamos National Laboratory; Woodring, Jonathan L [Los Alamos National Laboratory; Wu, Joshua [Los Alamos National Laboratory; Hsu, Chung-Hsing [ONL

    2010-12-03

    With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.

  15. Refinement of herpesvirus B-capsid structure on parallel supercomputers.

    Science.gov (United States)

    Zhou, Z H; Chiu, W; Haskell, K; Spears, H; Jakana, J; Rixon, F J; Scott, L R

    1998-01-01

    Electron cryomicroscopy and icosahedral reconstruction are used to obtain the three-dimensional structure of the 1250-Å-diameter herpesvirus B-capsid. The centers and orientations of particles in focal pairs of 400-kV, spot-scan micrographs are determined and iteratively refined by common-lines-based local and global refinement procedures. We describe the rationale behind choosing shared-memory multiprocessor computers for executing the global refinement, which is the most computationally intensive step in the reconstruction procedure. This refinement has been implemented on three different shared-memory supercomputers. The speedup and efficiency are evaluated by using test data sets with different numbers of particles and processors. Using this parallel refinement program, we refine the herpesvirus B-capsid from 355-particle images to 13-Å resolution. The map shows new structural features and interactions of the protein subunits in the three distinct morphological units: penton, hexon, and triplex of this T = 16 icosahedral particle.

  16. Parallel supercomputers for lattice gauge theory.

    Science.gov (United States)

    Brown, F R; Christ, N H

    1988-03-18

    During the past 10 years, particle physicists have increasingly employed numerical simulation to answer fundamental theoretical questions about the properties of quarks and gluons. The enormous computer resources required by quantum chromodynamic calculations have inspired the design and construction of very powerful, highly parallel, dedicated computers optimized for this work. This article gives a brief description of the numerical structure and current status of these large-scale lattice gauge theory calculations, with emphasis on the computational demands they make. The architecture, present state, and potential of these special-purpose supercomputers are described. It is argued that a numerical solution of low energy quantum chromodynamics may well be achieved by these machines.

  17. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    Directory of Open Access Journals (Sweden)

    Xin Deng

    2015-07-01

    Full Text Available Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.

  18. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions.

    Science.gov (United States)

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-07-07

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale.

  19. C-reactive protein, fibrinogen, and cardiovascular disease prediction

    DEFF Research Database (Denmark)

    Kaptoge, Stephen; Di Angelantonio, Emanuele; Pennells, Lisa

    2012-01-01

    There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events....

  20. Prediction of Factors Determining Changes in Stability in Protein Mutants

    OpenAIRE

    Parthiban, Vijayarangakannan

    2006-01-01

    Analysing the factors behind protein stability is a key research topic in molecular biology and has direct implications on protein structure prediction and protein-protein docking solutions. Protein stability upon point mutations was analysed using a distance-dependent pair potential representing mainly through-space interactions and a torsion angle potential representing neighbouring effects as a basic statistical mechanical setup for the analysis. The synergetic effect of accessible surface ...

  1. Refining intra-protein contact prediction by graph analysis

    Directory of Open Access Journals (Sweden)

    Eyal Eran

    2007-05-01

    Full Text Available Abstract Background Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.
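
    One of the two edge scores mentioned in this abstract, the density of edges between the sequence regions of the edge nodes, can be sketched as below. This is only an illustration under stated assumptions, not the GARP implementation: the abstract gives no formula, so the density is taken here to be a plain count of other predicted contacts whose endpoints fall within a fixed sequence window of the contact being scored. The `window` parameter, the function names and the toy contact list are hypothetical, and contacts are assumed to be stored as (i, j) residue-index pairs with i < j.

```python
def region_density(contacts, i, j, window=2):
    """Count other predicted contacts (k, l) with k within `window` residues
    of i and l within `window` residues of j (assumed definition of density)."""
    count = 0
    for k, l in contacts:
        if (k, l) == (i, j):
            continue  # do not let a contact support itself
        if abs(k - i) <= window and abs(l - j) <= window:
            count += 1
    return count


def rescore_contacts(contacts, window=2):
    """Attach a density score to every contact and rank by descending score.
    Python's sort is stable, so tied contacts keep their input order."""
    scored = [((i, j), region_density(contacts, i, j, window)) for i, j in contacts]
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical top predictions from a basic contact predictor.
predicted_contacts = [(10, 50), (10, 51), (11, 50), (11, 51), (30, 90)]
ranked = rescore_contacts(predicted_contacts)
for contact, score in ranked:
    print(contact, score)
```

    The four contacts around residues 10-11 and 50-51 support one another and rank above the isolated contact (30, 90), which receives no support. The mutual clustering coefficient, the other score named in the abstract, would be computed on the contact graph itself and is not shown here.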

  2. A survey of computational intelligence techniques in protein function prediction.

    Science.gov (United States)

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction.

  3. Supercomputing Centers and Electricity Service Providers

    DEFF Research Database (Denmark)

    Patki, Tapasya; Bates, Natalie; Ghatikar, Girish

    2016-01-01

    Supercomputing Centers (SCs) have high and variable power demands, which increase the challenges of the Electricity Service Providers (ESPs) with regards to efficient electricity distribution and reliable grid operation. High penetration of renewable energy generation further exacerbates this pro...... from a detailed, quantitative survey-based analysis and compare the perspectives of the European grid and SCs to the ones of the United States (US). We then show that contrary to the expectation, SCs in the US are more open toward cooperating and developing demand-management strategies with their ESPs...... (LRZ). We conclude that perspectives on demand management are dependent on the electricity market and pricing in the geographical region and on the degree of control that a particular SC has in terms of power-purchase negotiation....

  4. Multi-petascale highly efficient parallel supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O' Brien, John K.; O' Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; or, if I/O or other processors are underutilized, these can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.

  5. A workbench for tera-flop supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Resch, M.M.; Kuester, U.; Mueller, M.S.; Lang, U. [High Performance Computing Center Stuttgart (HLRS), Stuttgart (Germany)

    2003-07-01

    Supercomputers currently reach a peak performance in the range of TFlop/s. With but one exception - the Japanese Earth Simulator - none of these systems has so far been able to also show a level of sustained performance for a variety of applications that comes close to the peak performance. Sustained TFlop/s are therefore rarely seen. The reasons are manifold and are well known: Bandwidth and latency both for main memory and for the internal network are the key internal technical problems. Cache hierarchies with large caches can bring relief but are no remedy to the problem. However, there are not only technical problems that inhibit the full exploitation by scientists of the potential of modern supercomputers. More and more organizational issues come to the forefront. This paper shows the approach of the High Performance Computing Center Stuttgart (HLRS) to deliver a sustained performance of TFlop/s for a wide range of applications from a large group of users spread over Germany. The core of the concept is the role of the data. Around this we design a simulation workbench that hides the complexity of interacting computers, networks and file systems from the user. (authors)

  6. Support vector machine approach for protein subcellular localization prediction.

    Science.gov (United States)

    Hua, S; Sun, Z

    2001-08-01

    Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
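
    The input representation named in this record, amino acid composition, is simple to compute. The sketch below covers only that feature step; the function name is hypothetical, the handling of non-standard residue symbols is an assumption, and the SVM training itself is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequence):
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Each entry is the fraction of standard residues that are of one type.
    Assumption: symbols outside the 20 standard letters are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


# Toy sequence: 1 M, 2 K, 2 L, 1 A out of 6 residues.
vec = composition("MKKLLA")
```

    A vector like `vec` has a fixed length regardless of protein length, which is what makes it convenient as input to a kernel classifier.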

  7. CNNcon: improved protein contact maps prediction using cascaded neural networks.

    Directory of Open Access Journals (Sweden)

    Wang Ding

    Full Text Available BACKGROUNDS: Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determination of three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than that of determined structures. Protein modeling methods can possibly bridge this huge sequence-structure gap with the development of computational science. A grand challenge is to predict three-dimensional protein structure from its primary structure (residue sequence) alone. However, predicting residue contact maps is a crucial and promising intermediate step towards final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can transform protein sequence alignment to structure alignment, which can finally improve template based three-dimensional protein structure predictors greatly. METHODS: CNNcon, an improved multiple neural networks based contact map predictor using six sub-networks and one final cascade-network, was developed in this paper. Both the sub-networks and the final cascade-network were trained and tested with their corresponding data sets. For testing, the target protein was first coded and then input to its corresponding sub-networks for prediction. After that, the intermediate results were input to the cascade-network to finish the final prediction. RESULTS: CNNcon can accurately predict 58.86% of contacts on average at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the present method performs better than the compared state-of-the-art predictors. In particular, the prediction accuracy remains steady as protein sequence length increases. This indicates that CNNcon overcomes the thin density problem, with which other current predictors have trouble. This advantage makes the method valuable for the prediction of long proteins. As a result, the effective

  8. Seismic signal processing on heterogeneous supercomputers

    Science.gov (United States)

    Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas

    2015-04-01

    The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in recent years provide numerous computing nodes interconnected via high throughput networks, every node containing a mix of processing elements of different architectures, like several sequential processor cores and one or a few graphical processing units (GPU) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC 30 family operated by the Swiss National Supercomputing Center (CSCS), which we used in this research. Heterogeneous supercomputers provide an opportunity for manifold application performance increase and are more energy-efficient; however, they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is the design of a prototype for such a library suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that

  9. Protein secondary structure prediction using deep convolutional neural fields

    OpenAIRE

    Sheng Wang; Jian Peng; Jianzhu Ma; Jinbo Xu

    2015-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF)...
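
    The Q3 figure quoted in this record is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that metric follows; the function name and the toy label strings are illustrative assumptions, not output of DeepCNF.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example: 8 of the 10 positions agree.
q3 = q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE")
```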

  10. Signal peptides and protein localization prediction

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2005-01-01

    In 1999, the Nobel prize in Physiology or Medicine was awarded to Gunther Blobel “for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell”. Since the subcellular localization of a protein is an important clue to its function...

  11. Predictable tuning of protein expression in bacteria

    DEFF Research Database (Denmark)

    Bonde, Mads; Pedersen, Margit; Klausen, Michael Schantz

    2016-01-01

    We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expressi...

  12. Predictive Protein Toxicity and Its Use in Risk Assessment.

    Science.gov (United States)

    Franceschi, Niccolò; Paraskevopoulos, Konstantinos; Waigmann, Elisabeth; Ramon, Matthew

    2017-06-01

    In the EU novel proteins used in food or feed are assessed for their potential toxic effects in humans and livestock animals. The discovery of clear molecular features linked to the toxicity of a protein may be an important step towards the use of predictive protein toxicity in risk assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Support vector machine for predicting protein interactions using domain scores

    Institute of Scientific and Technical Information of China (English)

    PENG Xin-jun; WANG Yi-fei

    2009-01-01

    Protein-protein interactions play a crucial role in cellular processes such as metabolic pathways and immunological recognition. This paper presents a new domain score-based support vector machine (SVM) to infer protein interactions, which can be used not only to explore all possible domain interactions by the kernel method, but also to reflect the evolutionary conservation of domains in proteins by using the domain scores of proteins. The experimental result on the Saccharomyces cerevisiae dataset demonstrates that this approach can predict protein-protein interactions with higher performance than the existing approaches.

  14. Most Social Scientists Shun Free Use of Supercomputers.

    Science.gov (United States)

    Kiernan, Vincent

    1998-01-01

    Social scientists, who frequently complain that the federal government spends too little on them, are passing up what scholars in the physical and natural sciences see as the government's best give-aways: free access to supercomputers. Some social scientists say the supercomputers are difficult to use; others find desktop computers provide…

  15. Text Mining Improves Prediction of Protein Functional Sites

    Science.gov (United States)

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  16. Text mining improves prediction of protein functional sites.

    Science.gov (United States)

    Verspoor, Karin M; Cohn, Judith D; Ravikumar, Komandur E; Wall, Michael E

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

  17. Text mining improves prediction of protein functional sites.

    Directory of Open Access Journals (Sweden)

    Karin M Verspoor

    Full Text Available We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

  18. PPT-DB: the protein property prediction and testing database.

    Science.gov (United States)

    Wishart, David S; Arndt, David; Berjanskii, Mark; Guo, An Chi; Shi, Yi; Shrivastava, Savita; Zhou, Jianjun; Zhou, You; Lin, Guohui

    2008-01-01

    The protein property prediction and testing database (PPT-DB) is a database housing nearly 30 carefully curated databases, each of which contains commonly predicted protein property information. These properties include both structural (i.e. secondary structure, contact order, disulfide pairing) and dynamic (i.e. order parameters, B-factors, folding rates) features that have been measured, derived or tabulated from a variety of sources. PPT-DB is designed to serve two purposes. First it is intended to serve as a centralized, up-to-date, freely downloadable and easily queried repository of predictable or 'derived' protein property data. In this role, PPT-DB can serve as a one-stop, fully standardized repository for developers to obtain the required training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create. The second role that PPT-DB can play is as a tool for homology-based protein property prediction. Users may query PPT-DB with a sequence of interest and have a specific property predicted using a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. PPT-DB exploits the well-known fact that protein structure and dynamic properties are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 85-95% correct (for categorical predictions, such as secondary structure) or exhibit correlations of >0.80 (for numeric predictions, such as accessible surface area). This performance is 10-20% better than what is typically obtained from standard 'ab initio' predictions. PPT-DB, its prediction utilities and all of its contents are available at http://www.pptdb.ca.

  19. Supercomputing - Use Cases, Advances, The Future (2/2)

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the second day, we will focus on software and software paradigms driving supercomputers, workloads that need supercomputing treatment, advances in technology and possible future developments. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and i...

  20. Probabilistic protein function prediction from heterogeneous genome-wide data.

    Directory of Open Access Journals (Sweden)

    Naoki Nariai

    Full Text Available Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
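
    The neighbor assumption behind the functional linkage graph can be illustrated with plain neighbor counting. This is a deliberately simplified stand-in, not the probabilistic model the record describes; the function name, protein identifiers and term labels are all hypothetical.

```python
from collections import Counter


def neighbor_term_scores(protein, edges, annotations):
    """Score each term by the fraction of graph neighbours annotated with it.

    edges: iterable of (protein_a, protein_b) functional linkages.
    annotations: dict mapping a protein to a set of term labels.
    Assumption: unweighted edges and simple counting, as an illustration only.
    """
    neighbors = {b if a == protein else a for a, b in edges if protein in (a, b)}
    neighbors.discard(protein)  # ignore self-loops
    if not neighbors:
        return {}
    counts = Counter(t for n in neighbors for t in annotations.get(n, ()))
    return {term: c / len(neighbors) for term, c in counts.items()}


# Toy graph: P1 is unannotated and linked to P2 and P3.
edges = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
annotations = {"P2": {"term_A"}, "P3": {"term_A", "term_B"}, "P4": {"term_C"}}
term_scores = neighbor_term_scores("P1", edges, annotations)
```

    Here `term_A` is shared by both neighbors of P1 and so outranks `term_B`, while `term_C` is not scored because P4 is not a direct neighbor.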

  1. Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier

    Directory of Open Access Journals (Sweden)

    Haijiang Geng

    2015-01-01

    Full Text Available Proteins function through interactions with other proteins and biomolecules, and these interactions occur at the so-called interface residues of the protein sequences. Identifying interface residues helps us better understand the biological mechanism of protein interaction. Meanwhile, information about the interface residues contributes to the understanding of metabolic and signal transduction networks and indicates directions in drug design. In recent years, researchers have focused on developing new computational methods for predicting protein interface residues. Here we creatively used a 181-dimension protein sequence feature vector as input to a Naive Bayes Classifier (NBC)-based method to predict interaction sites in protein-protein complexes. The prediction of interaction sites in protein interactions is regarded as an amino acid residue binary classification problem by applying NBC with protein sequence features. Independent test results suggested that the Naive Bayes Classifier-based method with the protein sequence features as input vectors performed well.

  2. Predictive and comparative analysis of Ebolavirus proteins

    OpenAIRE

    Cong, Qian; Pei, Jimin; Grishin, Nick V.

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, a...

  3. The 82-plex plasma protein signature that predicts increasing inflammation

    DEFF Research Database (Denmark)

    Tepel, Martin; Beck, Hans C; Tan, Qihua;

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney transplant recipients and quantified 359 plasma proteins simultaneously using nano-Liquid-Chromatography-Tandem Mass-Spectrometry in individual samples and plasma C-reactive protein on the index day and the next day. Next-day C-reactive protein increased in 59 patients whereas it decreased in 32 patients...... The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P protein signature (P 

  4. Simultaneous prediction of protein secondary structure and transmembrane spans.

    Science.gov (United States)

    Leman, Julia Koehler; Mueller, Ralf; Karakas, Mert; Woetzel, Nils; Meiler, Jens

    2013-07-01

    Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. Preference of a stretch of amino acids in a protein to form secondary structure and being placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the necessity to create a consensus prediction from possibly contradicting outputs of several predictors but bears the potential to predict conformational switches, i.e., sequence regions that have a high probability to change for example from a coil conformation in solution to an α-helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three-state secondary structure prediction, and 94.8% for three-state transmembrane span prediction. These accuracies are comparable to state-of-the-art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as web server and for download at www.meilerlab.org.
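
    The per-residue 3 × 3 output described in this record can be read in two ways at once. The sketch below sums rows and columns to obtain separate secondary structure and environment probabilities; the abstract does not state how the three-state predictions are derived, so the summation, the row/column orientation and the example numbers are all assumptions.

```python
SS_TYPES = ("helix", "strand", "coil")                      # rows (assumed)
ENVIRONMENTS = ("membrane core", "interface", "solution")   # columns (assumed)


def marginals(matrix):
    """Collapse a 3 x 3 joint probability matrix into two three-state views."""
    ss = {s: sum(row) for s, row in zip(SS_TYPES, matrix)}
    env = {e: sum(row[c] for row in matrix) for c, e in enumerate(ENVIRONMENTS)}
    return ss, env


def most_likely(probabilities):
    """Return the label with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical output for one residue; the nine entries sum to 1.
residue = [
    [0.50, 0.10, 0.05],  # helix  in core / interface / solution
    [0.02, 0.03, 0.05],  # strand in core / interface / solution
    [0.05, 0.05, 0.15],  # coil   in core / interface / solution
]
ss, env = marginals(residue)
ss_call, env_call = most_likely(ss), most_likely(env)
```

    A residue with sizeable probability in both a coil-in-solution cell and a helix-in-membrane cell would be a candidate for the conformational switches mentioned in the abstract.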

  5. Predicting and validating protein interactions using network structure.

    Directory of Open Access Journals (Sweden)

    Pao-Yang Chen

    Full Text Available Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.

  6. Scalable prediction of compound-protein interactions using minwise hashing.

    Science.gov (United States)

    Tabei, Yasuo; Yamanishi, Yoshihiro

    2013-01-01

    The identification of compound-protein interactions plays key roles in drug development toward the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method to make a scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method mainly consists of two steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-induced classifier to the compact fingerprints. We test the proposed method on its ability to make a large-scale prediction of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show superior performance of the proposed method compared with the previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset consisting of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
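
    Step 1 of this record builds compact fingerprints by minwise hashing. The sketch below shows the standard, textbook form of the technique on sets of feature names, not the improved variant the authors describe; the hash construction, function names and feature labels are assumptions for illustration.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=128):
    """Compact signature: the minimum hash value under each seeded hash."""
    if not features:
        raise ValueError("feature set must not be empty")
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical substructure and domain features of two compound-protein pairs.
pair_a = {"sub1", "sub2", "sub3", "dom7"}
pair_b = {"sub2", "sub3", "dom7", "dom9"}
sig_a = minhash_signature(pair_a)
sig_b = minhash_signature(pair_b)
similarity = estimated_jaccard(sig_a, sig_b)
```

    The signature length is fixed by `num_hashes` no matter how many features a pair has, which is what keeps very large collections of pairs tractable.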

  7. Computational Prediction of RNA-Binding Proteins and Binding Sites.

    Science.gov (United States)

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Protein-RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%-8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein-RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein-RNA binding sites, by combining various machine learning methods and abundant sequence and/or structural features. There are three kinds of computational approaches, which are prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.

  8. Deep learning methods for protein torsion angle prediction.

    Science.gov (United States)

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN) and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
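
    The MAE figures above are errors on angles, which are periodic. The sketch below shows one common way to compute such an error, assuming the difference is taken as the smallest angular distance so that 175° and -178° are 7° apart rather than 353°. The abstract does not state which convention the authors used, and the function names and sample values are hypothetical.

```python
def angular_difference(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def torsion_mae(predicted, observed):
    """Mean absolute error over paired torsion angles, respecting periodicity."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(angular_difference(p, t) for p, t in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical phi angles for three residues, not values from the paper.
phi_pred = [-60.0, 175.0, -120.0]
phi_true = [-65.0, -178.0, -100.0]
mae = torsion_mae(phi_pred, phi_true)  # per-residue errors are 5, 7 and 20 degrees
```

    Without the wrap-around step the second residue would contribute 353° instead of 7°, which would dominate the average and misrepresent a near-perfect prediction.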

  9. Will Your Next Supercomputer Come from Costco?

    Energy Technology Data Exchange (ETDEWEB)

    Farber, Rob

    2007-04-15

    A fun topic for April, one that is not an April fool’s joke, is that you can purchase a commodity 200+ Gflop (single-precision) Linux supercomputer for around $600 from your favorite electronic vendor. Yes, it’s true. Just walk in and ask for a Sony Playstation 3 (PS3), take it home and install Linux on it. IBM has provided an excellent tutorial for installing Linux and building applications at http://www-128.ibm.com/developerworks/power/library/pa-linuxps3-1. If you want to raise some eyebrows at work, then submit a purchase request for a Sony PS3 game console and watch the reactions as your paperwork wends its way through the procurement process.

  10. HPL and STREAM Benchmarks on SANAM Supercomputer

    KAUST Repository

    Bin Sulaiman, Riman A.

    2017-03-13

    SANAM supercomputer was jointly built by KACST and FIAS in 2012, ranking second that year in the Green500 list with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous accelerator-based HPC system that has 300 compute nodes. Each node includes two Intel Xeon E5-2650 CPUs, two AMD FirePro S10000 dual GPUs and 128 GiB of main memory. In this work, the seven benchmarks of HPCC were installed and configured to reassess the performance of SANAM, as part of an unpublished master's thesis, after it was reassembled in the Kingdom of Saudi Arabia. We present here detailed results of HPL and STREAM benchmarks.

  11. Multiprocessing on supercomputers for computational aerodynamics

    Science.gov (United States)

    Yarrow, Maurice; Mehta, Unmeel B.

    1991-01-01

    Little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) to improve turnaround time in computational aerodynamics. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, such improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) is applied through multitasking via a strategy that requires relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-Fortran-Unix interface. The existing code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor.

  12. The PMS project Poor Man's Supercomputer

    CERN Document Server

    Csikor, Ferenc; Hegedüs, P; Horváth, V K; Katz, S D; Piróth, A

    2001-01-01

    We briefly describe the Poor Man's Supercomputer (PMS) project that is carried out at Eötvös University, Budapest. The goal is to develop a cost-effective, scalable, fast parallel computer to perform numerical calculations of physical problems that can be implemented on a lattice with nearest-neighbour interactions. To reach this goal we developed the PMS architecture using PC components and designed special, low-cost communication hardware and the driver software for Linux OS. Our first implementation of the PMS includes 32 nodes (PMS1). The performance of the PMS1 was tested by Lattice Gauge Theory simulations. Using SU(3) pure gauge theory or bosonic MSSM on the PMS1 computer we obtained a price-per-sustained-performance ratio of $3/Mflops. The design of the special hardware and the communication driver are freely available upon request for non-profit organizations.

  13. The BlueGene/L Supercomputer

    CERN Document Server

    Bhanot, Gyan; Chen, Dong; Gara, Alan; Vranas, Pavlos

    2002-01-01

    The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 of such nodes are connected into a 3-d torus with a geometry of 32x32x64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.
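
    The system totals quoted above follow from the per-node figures. The short calculation below reproduces them as a sanity check, assuming binary units (1 TB taken as 1024 x 1024 MB) for memory and decimal units for flops; the variable names are illustrative only.

```python
# Per-node figures as quoted in the abstract.
NODES = 65_536
TORUS_DIMENSIONS = (32, 32, 64)
FPUS_PER_NODE = 2
GFLOPS_PER_FPU = 2.8
MEMORY_MB_PER_NODE = 256

# The torus geometry should account for every node.
torus_nodes = TORUS_DIMENSIONS[0] * TORUS_DIMENSIONS[1] * TORUS_DIMENSIONS[2]

# Peak performance: nodes x floating point units x Gflops each, in Tflops.
peak_tflops = NODES * FPUS_PER_NODE * GFLOPS_PER_FPU / 1000.0

# Total external memory, assuming binary units (assumption, not stated).
total_memory_tb = NODES * MEMORY_MB_PER_NODE / (1024 * 1024)
```

    The torus geometry yields exactly 65,536 nodes and the memory total is exactly 16 TB under the binary-unit assumption. The peak works out to about 367 Tflops, slightly above the rounded 360 Teraflops quoted in the abstract.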

  14. NOXclass: prediction of protein-protein interaction types

    Directory of Open Access Journals (Sweden)

    Sommer Ingolf

    2006-01-01

    Background: Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Results: Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. Conclusion: NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.

  15. Protein Structure Prediction Using String Kernels

    Science.gov (United States)

    2006-03-03

    ...consists of 4352 sequences from SCOP version 1.53 extracted from the Astral database, grouped into families and superfamilies. The dataset is processed

  16. PSPP: a protein structure prediction pipeline for computing clusters.

    Directory of Open Access Journals (Sweden)

    Michael S Lee

    BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. METHODOLOGY/PRINCIPAL FINDINGS: The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes. CONCLUSIONS: The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a

  17. Prediction of intrinsic disorder in proteins using MFDp2.

    Science.gov (United States)

    Mizianty, Marcin J; Uversky, Vladimir; Kurgan, Lukasz

    2014-01-01

    Intrinsically disordered proteins (IDPs) are either entirely disordered or contain disordered regions in their native state. IDPs were found to be abundant across all kingdoms of life, particularly in eukaryotes, and are implicated in numerous cellular processes. Experimental annotation of disorder lags behind the rapidly growing sizes of the protein databases and thus computational methods are used to close this gap and to investigate the disorder. MFDp2 is a novel webserver for accurate sequence-based prediction of protein disorder which also outputs well-described sequence-derived information that allows profiling the predicted disorder. We conveniently visualize sequence conservation, predicted secondary structure, relative solvent accessibility, and alignments to chains with annotated disorder. The webserver allows predictions for multiple proteins at the same time, includes help pages and tutorial, and the results can be downloaded as text-based (parsable) file. MFDp2 is freely available at http://biomine.ece.ualberta.ca/MFDp2/.

  18. A modified resonant recognition model to predict protein-protein interaction

    Institute of Scientific and Technical Information of China (English)

    LIU Xiang; WANG Yifei

    2007-01-01

    Proteins are fundamental components of all living cells and the protein-protein interaction plays an important role in vital movement. This paper briefly introduced the original Resonant Recognition Model (RRM), and then modified it by using the wavelet transform to acquire the Modified Resonant Recognition Model (MRRM). The key characteristic of the new model is that it can predict directly the protein-protein interaction from the primary sequence, and the MRRM is more suitable than the RRM for this prediction. The results of numerical experiments show that the MRRM is effective for predicting the protein-protein interaction.

  19. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  20. Protein Structure and Function Prediction Using I-TASSER.

    Science.gov (United States)

    Yang, Jianyi; Zhang, Yang

    2015-12-17

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets.

  1. Computational Prediction of RNA-Binding Proteins and Binding Sites

    Directory of Open Access Journals (Sweden)

    Jingna Si

    2015-11-01

    Protein–RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites, by combining various machine learning methods and abundant sequence and/or structural features. There are three kinds of computational approaches, which are prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.

  2. HMMpTM: improving transmembrane protein topology prediction using phosphorylation and glycosylation site prediction.

    Science.gov (United States)

    Tsaousis, Georgios N; Bagos, Pantelis G; Hamodrakas, Stavros J

    2014-02-01

    During the last two decades a large number of computational methods have been developed for predicting transmembrane protein topology. Current predictors rely on topogenic signals in the protein sequence, such as the distribution of positively charged residues in extra-membrane loops and the existence of N-terminal signals. However, phosphorylation and glycosylation are post-translational modifications (PTMs) that occur in a compartment-specific manner and therefore the presence of a phosphorylation or glycosylation site in a transmembrane protein provides topological information. We examine the combination of phosphorylation and glycosylation site prediction with transmembrane protein topology prediction. We report the development of a Hidden Markov Model based method, capable of predicting the topology of transmembrane proteins and the existence of kinase specific phosphorylation and N/O-linked glycosylation sites along the protein sequence. Our method integrates a novel feature in transmembrane protein topology prediction, which results in improved performance for topology prediction and reliable prediction of phosphorylation and glycosylation sites. The method is freely available at http://bioinformatics.biol.uoa.gr/HMMpTM.

  3. Combining neural networks for protein secondary structure prediction

    DEFF Research Database (Denmark)

    Riis, Søren Kamaric

    1995-01-01

    In this paper structured neural networks are applied to the problem of predicting the secondary structure of proteins. A hierarchical approach is used where specialized neural networks are designed for each structural class and then combined using another neural network. The submodels are designe...... is better than most secondary structure prediction methods based on single sequences even though this model contains much fewer parameters...

  4. Engineering genes for predictable protein expression.

    Science.gov (United States)

    Gustafsson, Claes; Minshull, Jeremy; Govindarajan, Sridhar; Ness, Jon; Villalobos, Alan; Welch, Mark

    2012-05-01

    The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.

  5. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  6. Predicting nucleic acid binding interfaces from structural models of proteins.

    Science.gov (United States)

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  7. Study and prediction of secondary structure for membrane proteins

    NARCIS (Netherlands)

    Amirova, Svetlana R.; Milchevsky, Juri V.; Filatov, Ivan V.; Esipova, Natalia G.; Tumanyan, Vladimir G.

    2007-01-01

    In this paper we present a novel approach to membrane protein secondary structure prediction based on the statistical stepwise discriminant analysis method. A new aspect of our approach is the possibility to derive physical-chemical properties that may affect the formation of membrane protein secon

  8. A large-scale evaluation of computational protein function prediction

    NARCIS (Netherlands)

    Radivojac, P.; Clark, W.T.; Oron, T.R.; Schnoes, A.M.; Wittkop, T.; Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Friedberg, I.

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high

  9. Protein complex prediction based on k-connected subgraphs in protein interaction network

    Directory of Open Access Journals (Sweden)

    Habibi Mahnaz

    2010-09-01

    Background: Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interactions data are incomplete and a protein complex does not have to be a complete or dense subgraph. Results: We propose a more appropriate protein complex prediction method, CFA, that is based on connectivity number on subgraphs. We evaluate CFA using several protein interaction networks on reference protein complexes in two benchmark data sets (MIPS and Aloy), containing 1142 and 61 known complexes respectively. We compare CFA to some existing protein complex prediction methods (CMC, MCL, PCP and RNSC) in terms of recall and precision. We show that CFA predicts more complexes correctly at a competitive level of precision. Conclusions: Many real complexes with different connectivity level in protein interaction network can be predicted based on connectivity number. Our CFA program and results are freely available from http://www.bioinf.cs.ipm.ir/softwares/cfa/CFA.rar.

  10. SIFT: predicting amino acid changes that affect protein function

    OpenAIRE

    Ng, Pauline C.; Henikoff, Steven

    2003-01-01

    Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in...

  11. Supercomputer and cluster performance modeling and analysis efforts:2004-2006.

    Energy Technology Data Exchange (ETDEWEB)

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold (Hal) Edward; Stevenson, Joel O.; Benner, Robert E., Jr.; Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  12. Improving protein function prediction methods with integrated literature data

    Directory of Open Access Journals (Sweden)

    Gabow Aaron P

    2008-04-01

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder

  13. Application of ACO algorithm in protein structure prediction

    Institute of Scientific and Technical Information of China (English)

    TANG Hao-xuan; QU Yi

    2009-01-01

    The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ACO algorithm for protein structure prediction. In the algorithm, the "lone" method is applied to deal with infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for solving the problem of protein structure prediction, and notable improvements in CPU time are obtained.
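
    The quantity an HP-model search tries to minimise can be shown in a few lines. The sketch below covers only the standard HP energy function and the feasibility check, not the ACO search or the repair methods described in the abstract. It assumes a 2D square lattice (the abstract does not say whether the authors used 2D or 3D) and a fold encoded as a string of moves; all names are hypothetical.

```python
# Unit moves on a 2D square lattice (assumed geometry).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_to_coords(moves):
    """Place residues on the lattice, starting at the origin."""
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence, moves):
    """Return -1 per non-consecutive H-H lattice contact, or None if infeasible."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs n - 1 moves")
    coords = fold_to_coords(moves)
    if len(set(coords)) != len(coords):
        return None  # the chain overlaps itself: infeasible structure
    position = {coord: index for index, coord in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


square_fold = hp_energy("HPPH", "RUL")    # ends meet: one H-H contact
straight_fold = hp_energy("HHHH", "RRR")  # no non-consecutive contacts
overlapping = hp_energy("HHHH", "RLR")    # chain revisits a lattice site
```

    The `None` result marks exactly the kind of infeasible structure that a search procedure has to repair or discard before scoring.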

  14. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    Science.gov (United States)

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.
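
    One way to read "clique extension with GO filtering" is sketched below. This is an interpretation, not the authors' published procedure: a node that is linked to all but one member of a maximal clique yields a predicted interaction with the missing member, and a prediction is kept only if both proteins share at least one GO term. The function names and the `go_terms` mapping are hypothetical.

```python
# Sketch: predict missing PPI edges from near-cliques, then filter by GO.
# Assumption: "all but one clique member" rule and "shared GO term" filter
# are illustrative choices, not rules taken from the abstract.


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict_missing_edges(adj, min_size=3):
    """Predict (v, u) when v neighbours every member of a clique except u."""
    predicted = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_size:
            continue
        for v in adj:
            if v in clique:
                continue
            missing = [u for u in clique if u not in adj[v]]
            if len(missing) == 1:
                predicted.add(frozenset((v, missing[0])))
    return predicted


def filter_by_go(pairs, go_terms):
    """Keep only predicted pairs whose proteins share a GO term."""
    kept = set()
    for pair in pairs:
        u, v = tuple(pair)
        if go_terms.get(u, set()) & go_terms.get(v, set()):
            kept.add(pair)
    return kept
```

    Appending the kept pairs to the network and repeating the procedure would give the iterative behaviour the title refers to.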

  15. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Full Text Available Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  16. Blind test of physics-based prediction of protein structures.

    Science.gov (United States)

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  17. WeFold: a coopetition for protein structure prediction.

    Science.gov (United States)

    Khoury, George A; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O; Faccioli, Rodrigo A; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A; Sieradzan, Adam K; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C B; Floudas, Christodoulos A; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A; Skolnick, Jeffrey; Crivelli, Silvia N

    2014-09-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org.

  18. Hidden Markov models for prediction of protein features

    DEFF Research Database (Denmark)

    Bystroff, Christopher; Krogh, Anders

    2008-01-01

    Hidden Markov Models (HMMs) are an extremely versatile statistical representation that can be used to model any set of one-dimensional discrete symbol data. HMMs can model protein sequences in many ways, depending on what features of the protein are represented by the Markov states. For protein structure prediction, states have been chosen to represent either homologous sequence positions, local or secondary structure types, or transmembrane locality. The resulting models can be used to predict common ancestry, secondary or local structure, or membrane topology by applying one of the two standard algorithms for comparing a sequence to a model. In this chapter, we review those algorithms and discuss how HMMs have been constructed and refined for the purpose of protein structure prediction.

  19. Sann: solvent accessibility prediction of proteins by nearest neighbor method.

    Science.gov (United States)

    Joo, Keehyoung; Lee, Sung Jong; Lee, Jooyoung

    2012-07-01

    We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a cutoff of 25% sequence identity. Using optimal parameters, prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%) and 65.1% (three-state prediction with the thresholds of 9 and 36%), and a Pearson correlation coefficient (between the predicted and true RSAs for continuous prediction) of 0.676 are achieved. An independent benchmark test was performed with the CASP8 targets, where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two-state prediction with the threshold of 25%) and 67.58% (three-state prediction), and the Pearson correlation coefficient is 0.727 (for continuous prediction) with a mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/.

  20. A holistic molecular docking approach for predicting protein-protein complex structure

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    A holistic protein-protein molecular docking approach, HoDock, was established, composed of such steps as binding site prediction, initial complex structure sampling, refined complex structure sampling, structure clustering, scoring and final structure selection. This article explains the detailed steps and applications for CAPRI Target 39. The CAPRI result showed that three predicted binding site residues, A191HIS, B512ARG and B531ARG, were correct, and there were five submitted structures with a high fraction of correct receptor-ligand interface residues, indicating that this docking approach may improve prediction accuracy for protein-protein complex structures.

  1. Machine Learning Approaches for Predicting Protein Complex Similarity.

    Science.gov (United States)

    Farhoodi, Roshanak; Akbal-Delibas, Bahar; Haspel, Nurit

    2017-01-01

    Discriminating native-like structures from false positives with high accuracy is one of the biggest challenges in protein-protein docking. While there is an agreement on the existence of a relationship between various favorable intermolecular interactions (e.g., Van der Waals, electrostatic, and desolvation forces) and the similarity of a conformation to its native structure, the precise nature of this relationship is not known. Existing protein-protein docking methods typically formulate this relationship as a weighted sum of selected terms and calibrate their weights by using a training set to evaluate and rank candidate complexes. Despite improvements in the predictive power of recent docking methods, producing a large number of false positives by even state-of-the-art methods often leads to failure in predicting the correct binding of many complexes. With the aid of machine learning methods, we tested several approaches that not only rank candidate structures relative to each other but also predict how similar each candidate is to the native conformation. We trained a two-layer neural network, a multilayer neural network, and a network of Restricted Boltzmann Machines against extensive data sets of unbound complexes generated by RosettaDock and PyDock. We validated these methods with a set of refinement candidate structures. We were able to predict the root mean squared deviations (RMSDs) of protein complexes with a very small, often less than 1.5 Å, error margin when trained with structures that have RMSD values of up to 7 Å. In our most recent experiments with the protein samples having RMSD values up to 27 Å, the average prediction error was still relatively small, attesting to the potential of our approach in predicting the correct binding of protein-protein complexes.
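
    The quantity the networks above are trained to predict is the RMSD between a candidate complex and the native structure. As a reminder of what that target is, here is a minimal sketch of the plain coordinate RMSD. It assumes the two coordinate lists are already matched atom by atom and already superimposed; the abstract does not say which atoms or which alignment were used, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two matched 3D coordinate lists.

    Assumption: atoms are paired by position in the lists and the structures
    are already superimposed; no optimal rotation is computed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))
```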

  2. World's biggest 'virtual supercomputer' given the go-ahead

    CERN Multimedia

    2003-01-01

    "The Particle Physics and Astronomy Research Council has today announced GBP 16 million to create a massive computing Grid, equivalent to the world's second largest supercomputer after Japan's Earth Simulator computer" (1 page).

  3. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    Directory of Open Access Journals (Sweden)

    Humira Sonah

    2016-02-01

    Full Text Available Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and thousands of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are hundreds of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline.

  4. Prediction of protein-protein interactions between viruses and human by an SVM model

    Directory of Open Access Journals (Sweden)

    Cui Guangyu

    2012-05-01

    Full Text Available Abstract Background Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Results We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins. Conclusions Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.
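
    The fixed-length triplet representation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it uses the 20 standard amino acids (giving 20^3 = 8000 dimensions), counts overlapping triplets, skips windows containing non-standard letters, and normalises by the number of counted triplets. The name `triplet_frequency_vector` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPLETS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
TRIPLET_INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequency_vector(sequence):
    """Map a protein sequence of any length to an 8000-dimensional vector.

    Assumption: overlapping triplets, relative frequencies summing to 1.
    """
    counts = [0] * len(TRIPLETS)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        idx = TRIPLET_INDEX.get(seq[i:i + 3])
        if idx is not None:  # ignore windows with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [c / total for c in counts]
```

    Because the output length does not depend on the input length, vectors for a virus protein and a human protein could be concatenated and passed to any standard SVM implementation.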

  5. Predicting Protein Subcellular Localization: Past, Present, and Future

    Institute of Scientific and Technical Information of China (English)

    Pierre Dönnes; Annette Höglund

    2004-01-01

    Functional characterization of every single protein is a major challenge of the postgenomic era. The large-scale analysis of a cell's proteins, proteomics, seeks to provide these proteins with reliable annotations regarding their interaction partners and functions in the cellular machinery. An important step on this way is to determine the subcellular localization of each protein. Eukaryotic cells are divided into subcellular compartments, or organelles. Transport across the membrane into the organelles is a highly regulated and complex cellular process. Predicting the subcellular localization by computational means has been an area of vivid activity during recent years. The publicly available prediction methods differ mainly in four aspects: the underlying biological motivation, the computational method used, localization coverage, and reliability, which are of importance to the user. This review provides a short description of the main events in the protein sorting process and an overview of the most commonly used methods in this field.

  6. Feature Fusion Based SVM Classifier for Protein Subcellular Localization Prediction.

    Science.gov (United States)

    Rahman, Julia; Mondal, Md Nazrul Islam; Islam, Md Khaled Ben; Hasan, Md Al Mehedi

    2016-12-18

    Because protein subcellular localization is important in many branches of life science and in drug discovery, researchers have focused their attention on protein subcellular localization prediction. Effective representation of features from protein sequences plays a vital role in this prediction, especially for machine learning techniques. Single feature representations such as pseudo amino acid composition (PseAAC), physiochemical property models (PPM), and amino acid index distribution (AAID) contain insufficient information from protein sequences. To deal with this problem, we have proposed two feature fusion representations, AAIDPAAC and PPMPAAC, to work with Support Vector Machine classifiers, which fuse PseAAC with AAID and PPM, respectively. We have evaluated the performance of both single and fused feature representations on a Gram-negative bacterial dataset. AAIDPAAC gave at least 3% higher actual accuracy, and PPMPAAC 2% higher locative accuracy, than the single feature representations.

  7. Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-10-01

    Full Text Available Abstract Background With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore predicting protein-protein interactions from the sequences is of great demand. The reason lies in the fact that identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for those organisms barely characterized. Although a few methods have been proposed, the converse problem, if the features used extract sufficient and unbiased information from protein sequences, is almost untouched. Results In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods exploit sufficiently the information of protein sequences and reduce artificial bias and computational cost. Thus, it significantly outperforms the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation and reaches ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae respectively. Our findings here hold important implication for other sequence-based prediction tasks because representation of biological sequence is always the first step in computational biology. Conclusions By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.

  8. Predicting RNA-Protein Interactions Using Only Sequence Information

    Directory of Open Access Journals (Sweden)

    Muppirala Usha K

    2011-12-01

    Full Text Available Abstract Background RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. Results We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations of the putative RNA and protein partners). In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Conclusions Our experiments with RPISeq demonstrate that RNA-protein interactions can be
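
    The encoding described in this record (a 4-mer composition for the RNA and a 3-mer composition over a 7-letter reduced alphabet for the protein) can be sketched as below. Two points are assumptions, since the abstract does not spell them out: the seven amino acid groups used here are the commonly cited conjoint-triad classes, and "normalized" is taken to mean that each vector sums to 1. All function names are hypothetical, and this is not the RPISeq source code.

```python
from itertools import product

# Assumed 7-class grouping (conjoint-triad classes); not listed in the abstract.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_composition(seq, alphabet, k):
    """Relative frequency of every k-mer over the alphabet, in sorted order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows with unknown symbols
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def encode_rna(rna):
    """4^4 = 256 features; T is read as U so DNA-style input also works."""
    return kmer_composition(rna.upper().replace("T", "U"), "ACGU", 4)


def encode_protein(protein):
    """7^3 = 343 features over the reduced alphabet."""
    reduced = "".join(REDUCE.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, "0123456", 3)


def encode_pair(rna, protein):
    """Concatenate both encodings into one 599-dimensional feature vector."""
    return encode_rna(rna) + encode_protein(protein)
```

    The resulting vector could be fed to an SVM or Random Forest classifier, which are the two variants the abstract names.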

  9. Efficient prediction of co-complexed proteins based on coevolution.

    Directory of Open Access Journals (Sweden)

    Damien M de Vienne

    Full Text Available The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure. We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method. A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli.

  10. Exploiting protein flexibility to predict the location of allosteric sites

    Directory of Open Access Journals (Sweden)

    Panjkovich Alejandro

    2012-10-01

    Full Text Available Abstract Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing

  11. Prediction of Protein Thermostability by an Efficient Neural Network Approach

    Directory of Open Access Journals (Sweden)

    Jalal Rezaeenour

    2016-10-01

    Full Text Available Introduction: Manipulation of protein stability is important for understanding the principles that govern protein thermostability, both in basic research and industrial applications. Various data mining techniques exist for prediction of thermostable proteins. Furthermore, ANN methods have attracted significant attention for prediction of thermostability, because they constitute an appropriate approach to mapping the non-linear input-output relationships and massive parallel computing. Method: An Extreme Learning Machine (ELM) was applied to estimate thermal behavior of 1289 proteins. In the proposed algorithm, the parameters of ELM were optimized using a Genetic Algorithm (GA), which tuned a set of input variables, hidden layer biases, and input weights, to enhance the prediction performance. The method was executed on a set of amino acids, yielding a total of 613 protein features. A number of feature selection algorithms were used to build subsets of the features. A total of 1289 protein samples and 613 protein features were calculated from UniProt database to understand features contributing to the enzymes' thermostability and find out the main features that influence this valuable characteristic. Results: At the primary structure level, Gln, Glu and polar were the features that mostly contributed to protein thermostability. At the secondary structure level, Helix_S, Coil, and charged_Coil were the most important features affecting protein thermostability. These results suggest that the thermostability of proteins is mainly associated with primary structural features of the protein. According to the results, the influence of primary structure on the thermostability of a protein was more important than that of the secondary structure. It is shown that prediction accuracy of ELM (mean square error) can improve dramatically using GA with error rates RMSE=0.004 and MAPE=0.1003. Conclusion: The proposed approach for forecasting problem

  12. Predicting Protein Subcellular Location Using Digital Signal Processing

    Institute of Scientific and Technical Information of China (English)

    Yu-Xi PAN; Da-Wei LI; Yun DUAN; Zhi-Zhou ZHANG; Ming-Qing XU; Guo-Yin FENG; Lin HE

    2005-01-01

    The biological functions of a protein are closely related to its attributes in a cell. With the rapid accumulation of newly found protein sequence data in databanks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. The establishment of such a predictor will expedite the functional determination of newly found proteins and the process of prioritizing genes and proteins identified by genomic efforts as potential molecular targets for drug design. The traditional algorithms for predicting these attributes were based solely on amino acid composition in which no sequence order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns in protein sequences is extremely large, posing a formidable difficulty for realizing this goal. To deal with such difficulty, a well-developed tool in digital signal processing named digital Fourier transform (DFT) [1] was introduced. After being translated to a digital signal according to the hydrophobicity of each amino acid, a protein was analyzed by DFT within the frequency domain. A set of frequency spectrum parameters, thus obtained, were regarded as the factors to represent the sequence order effect. A significant improvement in prediction quality was observed by incorporating the frequency spectrum parameters with the conventional amino acid composition. One of the crucial merits of this approach is that many existing tools in mathematics and engineering can be easily applied in the predicting process. It is anticipated that digital signal processing may serve as a useful vehicle for many other protein science areas.
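
    The pipeline described in this record (translate the sequence into a numeric hydrophobicity signal, take its Fourier transform, and append spectrum parameters to the amino acid composition) can be sketched as follows. Several choices are assumptions not given in the abstract: the Kyte-Doolittle scale stands in for "the hydrophobicity of each amino acid", the spectrum parameters are taken to be the first few DFT amplitudes divided by sequence length, and all function names are hypothetical.

```python
import cmath

# Assumption: Kyte-Doolittle hydropathy values; the abstract names no scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hydrophobicity_signal(sequence):
    """Translate a protein sequence into a numeric signal."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]


def dft_amplitudes(signal, n_coeffs):
    """Amplitudes of the first n_coeffs DFT components, scaled by length."""
    n = len(signal)
    if n == 0:
        return [0.0] * n_coeffs
    amplitudes = []
    for k in range(n_coeffs):
        x_k = sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, s in enumerate(signal))
        amplitudes.append(abs(x_k) / n)
    return amplitudes


def composition(sequence):
    """Conventional 20-dimensional amino acid composition."""
    seq = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def features(sequence, n_coeffs=10):
    """Composition plus spectrum parameters that carry sequence-order effects."""
    return composition(sequence) + dft_amplitudes(
        hydrophobicity_signal(sequence), n_coeffs)
```

    Two sequences with the same composition but a different residue order give the same first 20 features and generally different spectral features, which is the sequence-order effect the abstract refers to.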

  13. Predicting protein subcellular location using digital signal processing.

    Science.gov (United States)

    Pan, Yu-Xi; Li, Da-Wei; Duan, Yun; Zhang, Zhi-Zhou; Xu, Ming-Qing; Feng, Guo-Yin; He, Lin

    2005-02-01

    The biological functions of a protein are closely related to its attributes in a cell. With the rapid accumulation of newly found protein sequence data in databanks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. The establishment of such a predictor will expedite the functional determination of newly found proteins and the process of prioritizing genes and proteins identified by genomic efforts as potential molecular targets for drug design. The traditional algorithms for predicting these attributes were based solely on amino acid composition in which no sequence order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns in protein sequences is extremely large, posing a formidable difficulty for realizing this goal. To deal with such difficulty, a well-developed tool in digital signal processing named digital Fourier transform (DFT) [1] was introduced. After being translated to a digital signal according to the hydrophobicity of each amino acid, a protein was analyzed by DFT within the frequency domain. A set of frequency spectrum parameters, thus obtained, were regarded as the factors to represent the sequence order effect. A significant improvement in prediction quality was observed by incorporating the frequency spectrum parameters with the conventional amino acid composition. One of the crucial merits of this approach is that many existing tools in mathematics and engineering can be easily applied in the predicting process. It is anticipated that digital signal processing may serve as a useful vehicle for many other protein science areas.

  14. Prediction of disease-related mutations affecting protein localization

    Directory of Open Access Journals (Sweden)

    Laurila Kirsti

    2009-03-01

    Full Text Available Abstract Background Eukaryotic cells contain numerous compartments, which have different protein constituents. Proteins are typically directed to compartments by short peptide sequences that act as targeting signals. Translocation to the proper compartment allows a protein to form the necessary interactions with its partners and take part in biological networks such as signalling and metabolic pathways. If a protein is not transported to the correct intracellular compartment either the reaction performed or information carried by the protein does not reach the proper site, causing either inactivation of central reactions or misregulation of signalling cascades, or the mislocalized active protein has harmful effects by acting in the wrong place. Results Numerous methods have been developed to predict protein subcellular localization with quite high accuracy. We applied bioinformatics methods to investigate the effects of known disease-related mutations on protein targeting and localization by analyzing over 22,000 missense mutations in more than 1,500 proteins with two complementary prediction approaches. Several hundred putative localization affecting mutations were identified and investigated statistically. Conclusion Although alterations to localization signals are rare, these effects should be taken into account when analyzing the consequences of disease-related mutations.

  15. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information on protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining PPIs is essential for the study of proteomics. This review examines the application of machine learning to predicting PPIs: approaches such as support vector machines (SVM), artificial neural networks (ANNs) and random forests (RF) were selected, and examples of their applications to PPIs are listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. Many examples of success in identification and prediction in the area of PPI prediction are discussed, and PPI research is still in progress.

  16. Developing Fortran Code for Kriging on the Stampede Supercomputer

    Science.gov (United States)

    Hodgess, Erin

    2016-04-01

    Kriging is easily accessible in the open source statistical language R (R Core Team, 2015) in the gstat (Pebesma, 2004) package. It works very well, but can be slow on large data sets, particularly if the prediction space is large as well. We are working on the Stampede supercomputer at the Texas Advanced Computing Center to develop code using a combination of R and the Message Passing Interface (MPI) bindings to Fortran. We have a function similar to the autofitVariogram found in the automap (Hiemstra et al., 2008) package and it is very effective. We are comparing R with MPI/Fortran, MPI/Fortran alone, and R with the Rmpi package, which uses bindings to C. We will present results from simulation studies and real-world examples. References: Hiemstra, P.H., Pebesma, E.J., Twenhofel, C.J.W. and G.B.M. Heuvelink, 2008. Real-time automatic interpolation of ambient gamma dose rates from the Dutch Radioactivity Monitoring Network. Computers and Geosciences, accepted for publication. Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers and Geosciences, 30: 683-691. R Core Team, 2015. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

  17. Prediction of nuclear proteins using SVM and HMM models

    Directory of Open Access Journals (Sweden)

    Raghava Gajendra PS

    2009-01-01

    Full Text Available Abstract Background The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. The nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed for predicting the nuclear proteins in the past. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy. Results All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation technique. Firstly, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions and achieved a Matthews correlation coefficient (MCC) of 0.59 and 0.61 respectively. Secondly, we developed SVM modules using split amino acid compositions (SAAC) and achieved the maximum MCC of 0.66. Thirdly, a hidden Markov model (HMM) based module/profile was developed for searching exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining SVM module and HMM profile and achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than the existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins as nuclear proteins in Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes respectively. Based on the above modules, we have developed a web server NpPred for predicting nuclear proteins http://www.imtech.res.in/raghava/nppred/. Conclusion This study describes a highly accurate method for predicting nuclear proteins. SVM module has been developed for the first time using SAAC for predicting nuclear proteins, where amino acid composition of N-terminus and the remaining protein were computed separately. In addition, our study is a first documentation where exclusively nuclear
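
    For readers unfamiliar with the MCC values quoted in this record, the sketch below computes the Matthews correlation coefficient and accuracy from a binary confusion matrix. The function name and the example counts are hypothetical and are not data from the study.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from binary confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a nuclear / non-nuclear classifier.
    print(confusion_metrics(tp=45, tn=45, fp=5, fn=5))
```

    Unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which is presumably why it is reported alongside accuracy for datasets that may be imbalanced.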

  18. Interactome-wide prediction of protein-protein binding sites reveals effects of protein sequence variation in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Felipe Leal Valentim

    Full Text Available The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.

  19. Critical Features of Fragment Libraries for Protein Structure Prediction.

    Science.gov (United States)

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  20. Critical Features of Fragment Libraries for Protein Structure Prediction

    Science.gov (United States)

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  1. Fast dynamics perturbation analysis for prediction of protein functional sites

    Directory of Open Access Journals (Sweden)

    Cohn Judith D

    2008-01-01

    Full Text Available Abstract Background We present a fast version of the dynamics perturbation analysis (DPA) algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy Dx. Such regions are associated with functional sites. Results The Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that Dx in a normal-modes model is highly correlated with an entropic term that only depends on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in an N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues) were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues) were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm. Conclusion Compared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.
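
    The per-protein recall and precision statistics defined in this abstract can be computed directly from two sets of residue indices. The sketch below follows those two definitions; the function names and the residue numbers in the example are hypothetical.

```python
def precision_recall(predicted_residues, binding_site_residues):
    """Per-protein precision and recall over residue indices.

    precision = fraction of predicted residues that are binding-site residues
    recall    = fraction of binding-site residues that are among predicted residues
    """
    predicted = set(predicted_residues)
    true_site = set(binding_site_residues)
    overlap = len(predicted & true_site)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(true_site) if true_site else 0.0
    return precision, recall


def overlaps_site(predicted_residues, binding_site_residues):
    """True if the prediction shares at least one residue with the true site."""
    return bool(set(predicted_residues) & set(binding_site_residues))


if __name__ == "__main__":
    # Hypothetical residue numbers for a single protein.
    predicted = [10, 11, 12, 13]
    true_site = [12, 13, 14, 15, 16, 17]
    print(precision_recall(predicted, true_site))
    print(overlaps_site(predicted, true_site))
```

    Treating any shared residue as an overlap is an assumption made for this sketch; the abstract does not state the exact overlap criterion behind the 94% and 87% figures.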

  2. Prediction of transmembrane helix orientation in polytopic membrane proteins

    Directory of Open Access Journals (Sweden)

    Liang Jie

    2006-06-01

    Full Text Available Abstract Background Membrane proteins compose up to 30% of coding sequences within genomes. However, their structure determination is lagging behind compared with soluble proteins due to the experimental difficulties. Therefore, it is important to develop reliable computational methods to predict structures of membrane proteins. Results We present a method for prediction of the TM helix orientation, which is an essential step in ab initio modeling of membrane proteins. Our method is based on a canonical model of the heptad repeat originally developed for coiled coils. We identify the helical surface patches that interface with lipid molecules at an accuracy of about 88% from the sequence information alone, using an empirical scoring function LIPS (LIPid-facing Surface), which combines lipophilicity and conservation of residues in the helix. We test and discuss results of prediction of helix-lipid interfaces on 162 transmembrane helices from 18 polytopic membrane proteins and present predicted orientations of TM helices in TRPV1 channel. We also apply our method to two structures of homologous cytochrome b6f complexes and find discrepancy in the assignment of TM helices from subunits PetG, PetN and PetL. The results of LIPS calculations and analysis of packing and H-bonding interactions support the helix assignment found in the cytochrome b6f structure from green alga but not the assignment of TM helices in the cyanobacterium b6f structure. Conclusion LIPS calculations can be used for the prediction of helix orientation in ab initio modeling of polytopic membrane proteins. We also show with the example of two cytochrome b6f structures that our method can identify questionable helix assignments in membrane proteins. The LIPS server is available online at http://gila.bioengr.uic.edu/lab/larisa/lips.html.

  3. DSP: a protein shape string and its profile prediction server.

    Science.gov (United States)

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-07-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.

  4. Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features.

    Science.gov (United States)

    Dai, Qi; Wu, Li; Li, Lihua

    2011-12-01

    Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we proposed a novel scheme to improve prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to form the combined sequence information. Then we added the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. The comparison with existing methods demonstrates that the overall improvements range from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.
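
    The sequence transformation described in this abstract can be sketched as below. The abstract does not give the reduced alphabet or the exact definition of the word position features, so both are assumptions here: a six-group physicochemical alphabet is used purely for illustration, and the position feature is taken to be the mean start index of each word divided by the sequence length. All names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: illustrative six-group reduced alphabet, not the paper's scheme.
GROUPS = {
    "AVLIMC": "a",  # aliphatic / sulfur-containing
    "FWYH": "r",    # aromatic
    "STNQ": "p",    # polar uncharged
    "KR": "k",      # positively charged
    "DE": "d",      # negatively charged
    "GP": "g",      # glycine / proline
}
REDUCE = {aa: code for members, code in GROUPS.items() for aa in members}


def reduce_sequence(sequence):
    """Map each residue to the one-letter code of its group."""
    return "".join(REDUCE[aa] for aa in sequence.upper())


def word_frequencies(reduced, k=2):
    """Relative frequency of every overlapping k-letter word."""
    words = [reduced[i:i + k] for i in range(len(reduced) - k + 1)]
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def mean_word_positions(reduced, k=2):
    """ASSUMED position feature: mean start index of each word / sequence length."""
    positions = {}
    for i in range(len(reduced) - k + 1):
        positions.setdefault(reduced[i:i + k], []).append(i)
    n = len(reduced)
    return {word: sum(idx) / len(idx) / n for word, idx in positions.items()}


if __name__ == "__main__":
    reduced = reduce_sequence("MKTAYIAKQR")  # arbitrary example sequence
    print(reduced)
    print(word_frequencies(reduced))
    print(mean_word_positions(reduced))
```

    A reduced alphabet keeps the number of possible k-letter words small (36 two-letter words for six groups instead of 400 for the full alphabet), which is the usual motivation for this kind of encoding.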

  5. Predicting N-terminal myristoylation sites in plant proteins

    Directory of Open Access Journals (Sweden)

    Podell Sheila

    2004-06-01

    Full Text Available Abstract Background N-terminal myristoylation plays a vital role in membrane targeting and signal transduction in plant responses to environmental stress. Although N-myristoyltransferase enzymatic function is conserved across plant, animal, and fungal kingdoms, exact substrate specificities vary, making it difficult to predict protein myristoylation accurately within specific taxonomic groups. Results A new method for predicting N-terminal myristoylation sites specifically in plants has been developed and statistically tested for sensitivity, specificity, and robustness. Compared to previously available methods, the new model is both more sensitive in detecting known positives, and more selective in avoiding false positives. Scores of myristoylated and non-myristoylated proteins are more widely separated than with other methods, greatly reducing ambiguity and the number of sequences giving intermediate, uninformative results. The prediction model is available at http://plantsp.sdsc.edu/myrist.html. Conclusion Superior performance of the new model is due to the selection of a plant-specific training set, covering 266 unique sequence examples from 40 different species, the use of a probability-based hidden Markov model to obtain predictive scores, and a threshold cutoff value chosen to provide maximum positive-negative discrimination. The new model has been used to predict 589 plant proteins likely to contain N-terminal myristoylation signals, and to analyze the functional families in which these proteins occur.

  6. Storage-Intensive Supercomputing Benchmark Study

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, J; Dossa, D; Gokhale, M; Hysom, D; May, J; Pearce, R; Yoo, A

    2007-10-30

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows

  7. Predicting protein structures with a multiplayer online game

    OpenAIRE

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran

    2010-01-01

    People exert significant amounts of problem solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowd-sourced through games, but it is not clear if more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search sp...

  8. Taking ASCI supercomputing to the end game.

    Energy Technology Data Exchange (ETDEWEB)

    DeBenedictis, Erik P.

    2004-03-01

    The ASCI supercomputing program is broadly defined as running physics simulations on progressively more powerful digital computers. What happens if we extrapolate the computer technology to its end? We have developed a model for key ASCI computations running on a hypothetical computer whose technology is parameterized in ways that account for advancing technology. This model includes technology information such as Moore's Law for transistor scaling and developments in cooling technology. The model also includes limits imposed by laws of physics, such as thermodynamic limits on power dissipation, limits on cooling, and the limitation of signal propagation velocity to the speed of light. We apply this model and show that ASCI computations will advance smoothly for another 10-20 years to an 'end game' defined by thermodynamic limits and the speed of light. Performance levels at the end game will vary greatly by specific problem, but will be in the Exaflops to Zetaflops range for currently anticipated problems. We have also found an architecture that would be within a constant factor of giving optimal performance at the end game. This architecture is an evolutionary derivative of the mesh-connected microprocessor (such as ASCI Red Storm or IBM Blue Gene/L). We provide designs for the necessary enhancement to microprocessor functionality and the power-efficiency of both the processor and memory system. The technology we develop in the foregoing provides a 'perfect' computer model with which we can rate the quality of realizable computer designs, both in this writing and as a way of designing future computers. This report focuses on classical computers based on irreversible digital logic, and more specifically on algorithms that simulate space. Reversible logic, analog computers, and other ways to address stockpile stewardship are outside the scope of this report.

  9. Simulating functional magnetic materials on supercomputers.

    Science.gov (United States)

    Gruner, Markus Ernst; Entel, Peter

    2009-07-22

    The recent passing of the petaflop per second landmark by the Roadrunner project at the Los Alamos National Laboratory marks a preliminary peak of an impressive world-wide development in the high-performance scientific computing sector. Also, purely academic state-of-the-art supercomputers such as the IBM Blue Gene/P at Forschungszentrum Jülich allow us nowadays to investigate large systems of the order of 10^3 spin polarized transition metal atoms by means of density functional theory. Three applications will be presented where large-scale ab initio calculations contribute to the understanding of key properties emerging from a close interrelation between structure and magnetism. The first two examples discuss the size dependent evolution of equilibrium structural motifs in elementary iron and binary Fe-Pt and Co-Pt transition metal nanoparticles, which are currently discussed as promising candidates for ultra-high-density magnetic data storage media. However, the preference for multiply twinned morphologies at smaller cluster sizes counteracts the formation of a single-crystalline L1_0 phase, which alone provides the required hard magnetic properties. The third application is concerned with the magnetic shape memory effect in the Ni-Mn-Ga Heusler alloy, which is a technologically relevant candidate for magnetomechanical actuators and sensors. In this material strains of up to 10% can be induced by external magnetic fields due to the field induced shifting of martensitic twin boundaries, requiring an extremely high mobility of the martensitic twin boundaries, but also the selection of the appropriate martensitic structure from the rich phase diagram.

  10. Prediction of protein disorder on amino acid substitutions.

    Science.gov (United States)

    Anoosha, P; Sakthivel, R; Gromiha, M Michael

    2015-12-15

    Intrinsically disordered regions of proteins are known to have many functional roles in cell signaling and regulatory pathways. The altered expression of these proteins due to mutations is associated with various diseases. Currently, most of the available methods focus on predicting the disordered proteins or the disordered regions in a protein. On the other hand, methods developed for predicting protein disorder on mutation showed a poor performance with a maximum accuracy of 70%. Hence, in this work, we have developed a novel method to classify the disorder-related amino acid substitutions using amino acid properties, substitution matrices, and the effect of neighboring residues that showed an accuracy of 90.0% with a sensitivity and specificity of 94.9 and 80.6%, respectively, in 10-fold cross-validation. The method was evaluated with a test set of 20% data using 10 iterations, which showed an average accuracy of 88.9%. Furthermore, we systematically analyzed the features responsible for the better performance of our method and observed that neighboring residues play an important role in defining the disorder of a given residue in a protein sequence. We have developed a prediction server to identify disorder-related mutations, and it is available at http://www.iitm.ac.in/bioinfo/DIM_Pred/.

  11. Protein-binding site prediction based on three-dimensional protein modeling.

    Science.gov (United States)

    Oh, Mina; Joo, Keehyoung; Lee, Jooyoung

    2009-01-01

    Structural information of a protein can guide one to understand the function of the protein, and ligand binding is one of the major biochemical functions of proteins. We have applied a two-stage template-based ligand binding site prediction method to CASP8 targets and achieved high quality results with accuracy/coverage = 70/80 (LEE). First, templates are used for protein structure modeling and then for binding site prediction by structural clustering of ligand-containing templates to the predicted protein model. Remarkably, the results are only a few percent worse than those one can obtain from native structures, which were available only after the prediction. Prediction was performed without knowing identity of ligands, and consequently, in many cases the ligand molecules used for prediction were different from the actual ligands, and yet we find that the prediction was quite successful. The current approach can be easily combined with experiments to investigate protein activities in a systematic way.

  12. Prediction of 492 human protein kinase substrate specificities.

    Science.gov (United States)

    Safaei, Javad; Maňuch, Ján; Gupta, Arvind; Stacho, Ladislav; Pelech, Steven

    2011-10-14

    Complex intracellular signaling networks monitor diverse environmental inputs to evoke appropriate and coordinated effector responses. Defective signal transduction underlies many pathologies, including cancer, diabetes, autoimmunity and about 400 other human diseases. Therefore, there is high impetus to define the composition and architecture of cellular communications networks in humans. The major components of intracellular signaling networks are protein kinases and protein phosphatases, which catalyze the reversible phosphorylation of proteins. Here, we have focused on identification of kinase-substrate interactions through prediction of the phosphorylation site specificity from knowledge of the primary amino acid sequence of the catalytic domain of each kinase. The presented method predicts 488 different kinase catalytic domain substrate specificity matrices in 478 typical and 4 atypical human kinases that rely on both positive and negative determinants for scoring individual phosphosites for their suitability as kinase substrates. This represents a marked advancement over existing methods such as those used in NetPhorest (179 kinases in 76 groups) and NetworKIN (123 kinases), which consider only positive determinants for kinase substrate prediction. Comparison of our predicted matrices with experimentally-derived matrices from about 9,000 known kinase-phosphosite substrate pairs revealed a high degree of concordance with the established preferences of about 150 well studied protein kinases. Furthermore, for many of the better known kinases, the predicted optimal phosphosite sequences were more accurate than the consensus phosphosite sequences inferred by simple alignment of the phosphosites of known kinase substrates. Application of this improved kinase substrate prediction algorithm to the primary structures of over 23,000 proteins encoded by the human genome has permitted the identification of about 650,000 putative phosphosites, which are posted on the

  13. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    Science.gov (United States)

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  14. SIFT: Predicting amino acid changes that affect protein function.

    Science.gov (United States)

    Ng, Pauline C; Henikoff, Steven

    2003-07-01

    Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

  15. Benchmarking human protein complexes to investigate drug-related systems and evaluate predicted protein complexes.

    Directory of Open Access Journals (Sweden)

    Min Wu

    Protein complexes are key entities that perform cellular functions. Human diseases have also been shown to be associated with some specific human protein complexes. In fact, human protein complexes are widely used for protein function annotation, inference of the human protein interactome, disease gene prediction, and so on. Therefore, it is highly desirable to build an up-to-date catalogue of human complexes to support research in these applications. As expected, protein complexes from different databases are highly redundant. In this paper, we designed a set of concise operations to compile these redundant human complexes and built a comprehensive catalogue called CHPC2012 (Catalogue of Human Protein Complexes). CHPC2012 achieves a higher coverage for proteins and protein complexes than the individual databases. It is also verified to be a set of high-quality complexes, as its co-complex protein associations have a high overlap with protein-protein interactions (PPI) in various existing PPI databases. We demonstrated two distinct applications of CHPC2012, that is, investigating the relationship between protein complexes and drug-related systems and evaluating the quality of predicted protein complexes. In particular, CHPC2012 provides more insights into drug development. For instance, proteins involved in multiple complexes (the overlapping proteins) are potential drug targets; the drug-complex network is utilized to investigate multi-target drugs and drug-drug interactions; and the disease-specific complex-drug networks will provide new clues for drug repositioning. With this up-to-date reference set of human protein complexes, we believe that the CHPC2012 catalogue is able to enhance studies of protein interactions, protein functions, human diseases, drugs, and related fields of research. CHPC2012 complexes can be downloaded from http://www1.i2r.a-star.edu.sg/xlli/CHPC2012/CHPC2012.htm.

  16. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  17. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    Science.gov (United States)

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights in processes associated with neuronal function and misfunction.

  18. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    Science.gov (United States)

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  19. C-Reactive Protein, fibrinogen and cardiovascular disease prediction

    NARCIS (Netherlands)

    Kromhout, D.

    2012-01-01

    Background There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events. Methods We analyzed data from 52 prospective studies that included 246,669 participants

  20. C-Reactive Protein, Fibrinogen, and Cardiovascular Disease Prediction

    NARCIS (Netherlands)

    Kaptoge, Stephen; Di Angelantonio, Emanuele; Pennells, Lisa; Wood, Angela M.; White, Ian R.; Gao, Pei; Walker, Matthew; Thompson, Alexander; Sarwar, Nadeem; Caslake, Muriel; Butterworth, Adam S.; Amouyel, Philippe; Assmann, Gerd; Bakker, Stephan J. L.; Barr, Elizabeth L. M.; Barrett-Connor, Elizabeth; Benjamin, Emelia J.; Bjorkelund, Cecilia; Brenner, Hermann; Brunner, Eric; Clarke, Robert; Cooper, Jackie A.; Cremer, Peter; Cushman, Mary; Dagenais, Gilles R.; D'Agostino, Ralph B.; Dankner, Rachel; Davey-Smith, George; Deeg, Dorly; Dekker, Jacqueline M.; Engstrom, Gunnar; Folsom, Aaron R.; Fowkes, F. Gerry R.; Gallacher, John; Gaziano, J. Michael; Giampaoli, Simona; Gillum, Richard F.; Hofman, Albert; Howard, Barbara V.; Ingelsson, Erik; Iso, Hiroyasu; Jorgensen, Torben; Kiechl, Stefan; Kitamura, Akihiko; Kiyohara, Yutaka; Koenig, Wolfgang; Kromhout, Daan; Kuller, Lewis H.; Lawlor, Debbie A.; Meade, Tom W.; Nissinen, Aulikki; Nordestgaard, Borge G.; Onat, Altan; Panagiotakos, Demosthenes B.; Psaty, Bruce M.; Rodriguez, Beatriz; Rosengren, Annika; Salomaa, Veikko; Kauhanen, Jussi; Salonen, Jukka T.; Shaffer, Jonathan A.; Shea, Steven; Ford, Ian; Stehouwer, Coen D. A.; Strandberg, Timo E.; Tipping, Robert W.; Tosetto, Alberto; Wassertheil-Smoller, Sylvia; Wennberg, Patrik; Westendorp, Rudi G.; Whincup, Peter H.; Wilhelmsen, Lars; Woodward, Mark; Lowe, Gordon D. O.; Wareham, Nicholas J.; Khaw, Kay-Tee; Sattar, Naveed; Packard, Chris J.; Gudnason, Vilmundur; Ridker, Paul M.; Pepys, Mark B.; Thompson, Simon G.; Danesh, John

    2012-01-01

    Background There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events. Methods We analyzed data from 52 prospective studies that included 246,669 participants without a history of cardiovascul

  1. C-reactive protein, fibrinogen, and cardiovascular disease prediction

    NARCIS (Netherlands)

    S. Kaptoge (Stephen); E. di Angelantonio (Emanuele); L. Pennells (Lisa); A.M. Wood (Angela); I.R. White (Ian); P. Gao (Pei); M. Walker (Mark); A. Thompson (Alexander); S. Sarwar (Sheryar); M. Caslake (Muriel); A.S. Butterworth (Adam); P. Amouyel (Philippe); G. Assmann (Gerd); S.J.L. Bakker (Stephan); E.L.M. Barr; E. Barrett-Connor (Elizabeth); E.J. Benjamin (Emelia); C. Björkelund (Cecilia); H. Brenner (Hermann); E. Brunner (Eric); R. Clarke (Robert); J.A. Cooper (Jackie); P. Cremer; M. Cushman (Mary Ann); G.R. Dagenais (Gilles R); R.B. D'Agostino (Ralph); R. Dankner (Rachel); G. Davey-Smith (George); D.J.H. Deeg (Dorly); J.M. Dekker (Jacqueline); G. Engström; A.R. Folsom (Aaron); F.G.R. Fowkes (F. Gerald R.); J. Gallacher (John); J.M. Gaziano (J. Michael); S. Giampaoli (Simona); R.F. Gillum (Richard); A. Hofman (Albert); B.V. Howard (Barbara); E. Ingelsson (Erik); H. Iso (Hiroyasu); T. Jorgensen (Torben); S. Kiechl (Stefan); A. Kitamura; Y. Kiyohara (Yutaka); W. Koenig (Wolfgang); D. Kromhout (Daan); L.H. Kuller (Lewis); D.A. Lawlor (Debbie); T. Meade (Tom); A. Nissinen (Aulikki); B.G. Nordestgaard (Børge); A. Onat (Altan); D.B. Panagiotakos (Demosthenes); B.M. Psaty (Bruce); B. Rodriguez (Beatriz); A. Rosengren (Annika); V. Salomaa (Veikko); J. Kauhanen (Jussi); J.T. Salonen; J.A. Shaffer (Jonathan); S. Shea (Steven); I. Ford (Ian); C.D. Stehouwer (Coen); T.E. Strandberg (Timo); A. Tipping (Alex); A. Tosetto (Alberto); S. Wassertheil-Smoller (Sylvia); P. Wennberg (Patrik); R.G.J. Westendorp (Rudi); P.H. Whincup (Peter); L. Wilhelmsen (Lars); M. Woodward (Mark); G.D.O. Lowe (Gordon); N.J. Wareham (Nick); K-T. Khaw (Kay-Tee); N. Sattar (Naveed); C. Packard (Chris); V. Gudnason (Vilmundur); P.M. Ridker (Paul); M.B. Pepys (Mark); S.G. Thompson (Simon); J. Danesh (John)

    2012-01-01

    BACKGROUND: There is debate about the value of assessing levels of C-reactive protein (CRP) and other biomarkers of inflammation for the prediction of first cardiovascular events. METHODS: We analyzed data from 52 prospective studies that included 246,669 participants without a history o

  2. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms

    DEFF Research Database (Denmark)

    Nielsen, Henrik

    2016-01-01

    When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach...

  3. Are specialized web servers better at predicting protein structures ...

    African Journals Online (AJOL)

    RABAIL HAFEEZ (0973106)

    2012-07-03

    Jul 3, 2012 ... This research study answers the question that technology is the best for predicting protein structures. Stand-alone .... server 3D Jury to show how it produces high quality, accurate ..... Nucleic Acids Res., 32: 14-16. Ginalski K ...

  4. Oxypred: Prediction and Classification of Oxygen-Binding Proteins

    Institute of Scientific and Technical Information of China (English)

    Muthukrishnan, S.; Garg, Aarti; Raghava, G.P.S.

    2007-01-01

    This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from http://www.imtech.res.in/raghava/oxypred/).
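
    The two feature types named in this record, amino acid composition and dipeptide composition, are commonly defined as the fractions of the 20 standard residues and of the 400 ordered residue pairs in a sequence. The sketch below computes both under that assumption. The function names and the choice to skip non-standard residues are illustrative and are not taken from the Oxypred paper.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors line up.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
_DIPEPTIDE_SET = set(DIPEPTIDES)


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-element list with the fraction of each adjacent residue pair.

    Pairs containing a non-standard residue are skipped (an assumption made
    for this sketch).
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p in _DIPEPTIDE_SET]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]
```

    Either vector can be passed to any SVM implementation as a fixed-length feature vector, regardless of the length of the input sequence.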

  5. Models to predict intestinal absorption of therapeutic peptides and proteins.

    Science.gov (United States)

    Antunes, Filipa; Andrade, Fernanda; Ferreira, Domingos; Nielsen, Hanne Morck; Sarmento, Bruno

    2013-01-01

    Prediction of human intestinal absorption is a major goal in the design, optimization, and selection of drugs intended for oral delivery, in particular proteins, which have intrinsically poor transport across the intestinal epithelium. Various techniques are currently employed to evaluate the extent of protein absorption in the different phases of drug discovery and development. Screening protocols to evaluate protein absorption include a range of preclinical methodologies: in silico, in vitro, in situ, ex vivo and in vivo. It is the careful and critical use of these techniques that can help to identify drug candidates that will most probably be well absorbed from the human intestinal tract. It is well recognized that human intestinal permeability cannot be accurately predicted based on a single preclinical method. However, present social and scientific concerns about animal welfare, as well as the pharmaceutical industry's need for rapid, cheap and reliable models predicting bioavailability, give reasons for using methods that provide an appropriate correlation between the results of in vivo and in vitro drug absorption. The aim of this review is to describe and compare in silico, in vitro, in situ, ex vivo and in vivo methods used to predict human intestinal absorption, with special attention to the intestinal absorption of therapeutic peptides and proteins.

  6. Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

    NARCIS (Netherlands)

    Hulsman, M.; Reinders, M.J.T.; De Ridder, D.

    2008-01-01

    In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based cla

  7. C-reactive Protein Predicts Postoperative Delirium Following Vascular Surgery

    NARCIS (Netherlands)

    Pol, Robert A.; van Leeuwen, Barbara L.; Izaks, Gerbrand J.; Reijnen, Michel M. P. J.; Visser, Linda; Tielliu, Ignace F. J.; Zeebregts, Clark J.

    2014-01-01

    Background: The etiology of postoperative delirium (POD) following vascular surgery is generally unknown. The incidence, however, can be as high as 35%. A possible neuroinflammatory basis for delirium is likely and C-reactive protein (CRP) as a marker for inflammation can possibly play a predictive

  8. Prediction of N-terminal protein sorting signals

    DEFF Research Database (Denmark)

    Claros, Manuel G.; Brunak, Søren; von Heijne, Gunnar

    1997-01-01

    Recently, neural networks have been applied to a widening range of problems in molecular biology. An area particularly suited to neural-network methods is the identification of protein sorting signals and the prediction of their cleavage sites, as these functional units are encoded by local, linear...

  9. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    Science.gov (United States)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
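
    The Q3 figure quoted in this record is conventionally the fraction of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. A minimal sketch of that calculation follows, assuming both label strings are already reduced to three states and aligned residue by residue. The function name is illustrative and is not part of DeepCNF.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted 3-state label equals the observed one.

    Both arguments are equal-length strings over the labels H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

    For example, q3_accuracy("HHEC", "HHCC") compares four residues, three of which agree.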

  10. Boosting compound-protein interaction prediction by deep learning.

    Science.gov (United States)

    Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng

    2016-11-01

    The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, so computational approaches have been introduced. Among these, machine-learning based methods have achieved considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques have shown advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs a deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets.

  11. Improved hybrid optimization algorithm for 3D protein structure prediction.

    Science.gov (United States)

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on a toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve the search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be cast as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and makes full use of the advantages of each algorithm. The algorithm is tested on the commonly used standard sequences, namely Fibonacci sequences and real protein sequences. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, and that it is an effective way to predict the structure of proteins.

  12. Protein-Protein Interaction Site Predictions with Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces

    Science.gov (United States)

    Chen, Ching-Tai; Peng, Hung-Pin; Jian, Jhih-Wei; Tsai, Keng-Chang; Chang, Jeng-Yih; Yang, Ei-Wen; Chen, Jun-Bo; Ho, Shinn-Ying; Hsu, Wen-Lian; Yang, An-Suei

    2012-01-01

    Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
    The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with
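
    The residue-based figures reported in this record (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) have standard definitions in terms of true/false positive and negative counts. The sketch below evaluates them from those four counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, not details from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    # Denominator of the Matthews correlation coefficient.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }
```

    Because interface residues are usually a minority of surface residues, the MCC is often the more informative single number: unlike accuracy, it stays near zero for a predictor that labels everything as non-interface.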

  13. Protein secondary structure prediction using NMR chemical shift data.

    Science.gov (United States)

    Zhao, Yuzhong; Alipanahi, Babak; Li, Shuai Cheng; Li, Ming

    2010-10-01

    Accurate determination of protein secondary structure from chemical shift information is a key step in NMR tertiary structure determination. Relatively little work has been done on this subject. There needs to be a systematic investigation of algorithms that are (a) robust for large datasets; (b) easily extendable to new (dynamic) databases; and (c) approaching the limit of accuracy. We introduce new approaches that use the k-nearest neighbor algorithm for the basic prediction and the BCJR algorithm to smooth the predictions and to combine predictions from chemical shifts with predictions based on sequence information only. Our new system, SUCCES, improves on the accuracy of all existing methods on a large dataset of 805 proteins (at 86% Q(3) accuracy and at 92.6% accuracy when the boundary residues are ignored), and it is easily extendable to any new dataset without requiring any new training. The software is publicly available at http://monod.uwaterloo.ca/nmr/succes.

  14. Three-dimensional protein structure prediction: Methods and computational strategies.

    Science.gov (United States)

    Dorn, Márcio; E Silva, Mariel Barbachan; Buriol, Luciana S; Lamb, Luis C

    2014-10-12

    A long-standing problem in structural bioinformatics is to determine the three-dimensional (3-D) structure of a protein when only a sequence of amino acid residues is given. Many computational methodologies and algorithms have been proposed as a solution to the 3-D Protein Structure Prediction (3-D-PSP) problem. These methods can be divided into four main classes: (a) first principle methods without database information; (b) first principle methods with database information; (c) fold recognition and threading methods; and (d) comparative modeling methods and sequence alignment strategies. Deterministic computational techniques, optimization techniques, data mining and machine learning approaches are typically used in the construction of computational solutions for the PSP problem. Our main goal with this work is to review the methods and computational strategies that are currently used in 3-D protein prediction.

  15. Supercomputing - Use Cases, Advances, The Future (1/2)

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the first day, we will focus on the history and theory of supercomputing, the top500 list and the hardware that makes supercomputers tick. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and initiated projects with the private sector (e.g. HP an...

  16. Nanoparticles-cell association predicted by protein corona fingerprints

    Science.gov (United States)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling approach that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, only 8 PCFs, forming a minor fraction of the HC, were identified as the main promoters of NP association with HeLa cells. Remarkably, the identified PCFs have several receptors with a high level of expression on the plasma membrane of HeLa cells.

  17. Prediction of protein binding sites in protein structures using hidden Markov support vector machine

    Directory of Open Access Journals (Sweden)

    Lin Lei

    2009-11-01

    Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance. Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples when the cutting-plane algorithm is used.

  18. Constraint Logic Programming approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Fogolari Federico

    2004-11-01

    Full Text Available Abstract Background The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have also been exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation, their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique such as molecular dynamics. Conclusions The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  19. Automatic selection of reference taxa for protein-protein interaction prediction with phylogenetic profiling

    DEFF Research Database (Denmark)

    Simonsen, Martin; Maetschke, S.R.; Ragan, M.A.

    2012-01-01

    Motivation: Phylogenetic profiling methods can achieve good accuracy in predicting protein–protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa (RT) is critical for accurate prediction, but with more than 2500 fully sequenced taxa publicly......: We present three novel methods for automating the selection of RT, using machine learning based on known protein–protein interaction networks. One of these methods in particular, Tree-Based Search, yields greatly improved prediction accuracies. We further show that different methods for constituting...

  20. Neural network definitions of highly predictable protein secondary structure classes

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A. [Los Alamos National Lab., NM (United States)]|[Santa Fe Inst., NM (United States); Steeg, E. [Toronto Univ., ON (Canada). Dept. of Computer Science; Farber, R. [Los Alamos National Lab., NM (United States)

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes: alpha helix, beta strand, and coil, from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand and coil.

  1. Modelling proteins' hidden conformations to predict antibiotic resistance

    Science.gov (United States)

    Hart, Kathryn M.; Ho, Chris M. W.; Dutta, Supratik; Gross, Michael L.; Bowman, Gregory R.

    2016-10-01

    TEM β-lactamase confers bacteria with resistance to many antibiotics and rapidly evolves activity against new drugs. However, functional changes are not easily explained by differences in crystal structures. We employ Markov state models to identify hidden conformations and explore their role in determining TEM's specificity. We integrate these models with existing drug-design tools to create a new technique, called Boltzmann docking, which better predicts TEM specificity by accounting for conformational heterogeneity. Using our MSMs, we identify hidden states whose populations correlate with activity against cefotaxime. To experimentally detect our predicted hidden states, we use rapid mass spectrometric footprinting and confirm our models' prediction that increased cefotaxime activity correlates with reduced Ω-loop flexibility. Finally, we design novel variants to stabilize the hidden cefotaximase states, and find their populations predict activity against cefotaxime in vitro and in vivo. Therefore, we expect this framework to have numerous applications in drug and protein design.

  2. Prediction of protein-destabilizing polymorphisms by manual curation with protein structure.

    Directory of Open Access Journals (Sweden)

    Craig Alan Gough

    Full Text Available The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein's sequence or large changes, insertions, or deletions in the sequence. We have analyzed a large set (n = 512) of insertions and deletions (indels) and single nucleotide polymorphisms causing premature termination of translation in disease-related genes. Prediction of protein-destabilization effects was performed by graphical presentation of the locations of polymorphisms in the protein structure, using the Genomes TO Protein (GTOP) database, and manual annotation with a set of specific criteria. Protein-destabilization was predicted for 44.4% of the nonsense SNPs, 32.4% of the frameshifting indels, and 9.1% of the non-frameshifting indels. A prediction of nonsense-mediated decay allowed us to infer which truncated proteins would actually be translated as defective proteins. These cases included the proteins linked to diseases inherited dominantly, suggesting a relation between these diseases and toxic aggregation. Our approach would be useful in identifying potentially aggregation-inducing polymorphisms that may have pathological effects.

  3. An integrated distributed processing interface for supercomputers and workstations

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, J.; McGavran, L.

    1989-01-01

    Access to documentation, communication between multiple processes running on heterogeneous computers, and animation of simulations of engineering problems are typically weak in most supercomputer environments. This presentation will describe how we are improving this situation in the Computer Research and Applications group at Los Alamos National Laboratory. We have developed a tool using UNIX filters and a SunView interface that allows users simple access to documentation via mouse driven menus. We have also developed a distributed application that integrated a two point boundary value problem on one of our Cray Supercomputers. It is controlled and displayed graphically by a window interface running on a workstation screen. Our motivation for this research has been to improve the usual typewriter/static interface using language independent controls to show capabilities of the workstation/supercomputer combination. 8 refs.

  4. Evolutionary computer programming of protein folding and structure predictions.

    Science.gov (United States)

    Nölting, Bengt; Jülich, Dennis; Vonau, Winfried; Andert, Karl

    2004-07-07

    In order to understand the mechanism of protein folding and to assist the rational de-novo design of fast-folding, non-aggregating and stable artificial enzymes it is very helpful to be able to simulate protein folding reactions and to predict the structures of proteins and other biomacromolecules. Here, we use a method of computer programming called "evolutionary computer programming" in which a program evolves depending on the evolutionary pressure exerted on the program. In the case of the presented application of this method to a computer program for folding simulations, the evolutionary pressure exerted was towards finding deep minima in the energy landscape of protein folding faster. Already after 20 evolution steps, the evolved program was able to find deep minima in the energy landscape more than 10 times faster than the original program prior to the evolution process.

  5. Tandem Repeats in Proteins : Prediction Algorithms and Biological Role

    Directory of Open Access Journals (Sweden)

    Marco ePellegrini

    2015-09-01

    Full Text Available Tandem repetitions in protein sequence and structure are a fascinating subject of research which has been a focus of study since the late 1990's. In this survey we give an overview on the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.

  6. Protein structure prediction using bee colony optimization metaheuristic

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Paluszewski, Martin; Winter, Pawel

    2010-01-01

    Predicting the native structure of proteins is one of the most challenging problems in molecular biology. The goal is to determine the three-dimensional structure from the one-dimensional amino acid sequence. De novo prediction algorithms seek to do this by developing a representation of the protein's structure, an energy potential and some optimization algorithm that finds the structure with minimal energy. Bee Colony Optimization (BCO) is a relatively new approach to solving optimization problems based on the foraging behaviour of bees. Several variants of BCO have been suggested in the literature. We have devised a new variant that unifies the existing variants and is much more flexible with respect to replacing the various elements of the BCO. In particular this applies to the choice of the local search as well as the method for generating scout locations and performing the waggle dance. We apply...

  7. Prediction of Protein-DNA binding by Monte Carlo method

    Science.gov (United States)

    Deng, Yuefan; Eisenberg, Moises; Korobka, Alex

    1997-08-01

    We present an analysis and prediction of protein-DNA binding specificity based on the hydrogen bonding between DNA, protein, and auxiliary clusters of water molecules. Zif268, glucocorticoid receptor, λ-repressor mutant, HIN-recombinase, and tramtrack protein-DNA complexes are studied. Hydrogen bonds are approximated by the Lennard-Jones potential with a cutoff distance between the hydrogen and the acceptor atoms set to 3.2 Å and an angular component based on a dipole-dipole interaction. We use a three-stage docking algorithm: (1) geometric hashing that matches pairs of hydrogen bonding sites; (2) least-squares minimization of pairwise distances to filter out insignificant matches; and (3) Monte Carlo stochastic search to minimize the energy of the system. More information can be obtained from our first paper on this subject [Y. Deng et al., J. Computational Chemistry (1995)]. Results show that the biologically correct base pair is selected preferentially when there are two or more strong hydrogen bonds (with LJ potential lower than -0.20) that bind it to the protein. Predicted sequences are less stable in the case of weaker bonding sites. In general the inclusion of water bridges does increase the number of base pairs for which correct specificity is predicted.
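
    The radial part of the hydrogen-bond model in this abstract can be illustrated with a minimal sketch. It assumes a standard 12-6 Lennard-Jones form, which the abstract does not confirm; the `epsilon` and `sigma` defaults and the function names are hypothetical, the energy units are unspecified, and the angular dipole-dipole component is omitted. Only the 3.2 Å cutoff and the -0.20 threshold come from the abstract.

```python
# Illustrative sketch only: the 12-6 form, epsilon and sigma are assumptions.
CUTOFF_ANGSTROM = 3.2           # hydrogen-acceptor cutoff quoted in the abstract
STRONG_BOND_THRESHOLD = -0.20   # LJ value below which the abstract calls a bond strong


def lj_energy(r, epsilon=1.0, sigma=1.8, cutoff=CUTOFF_ANGSTROM):
    """Return a 12-6 Lennard-Jones energy, truncated to zero beyond the cutoff."""
    if r <= 0:
        raise ValueError("distance must be positive")
    if r > cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def count_strong_bonds(distances, **lj_params):
    """Count hydrogen-acceptor distances whose LJ energy is below the threshold."""
    return sum(1 for r in distances if lj_energy(r, **lj_params) < STRONG_BOND_THRESHOLD)


if __name__ == "__main__":
    # Two short contacts and one beyond the cutoff.
    print(count_strong_bonds([2.0, 2.1, 3.5]))
```

    Under the abstract's rule, a base pair with a count of two or more would be the case in which the correct base pair is preferentially selected.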

  8. Computational protein biomarker prediction: a case study for prostate cancer

    Directory of Open Access Journals (Sweden)

    Adam Bao-Ling

    2004-03-01

    Full Text Available Abstract Background Recent technological advances in mass spectrometry pose challenges in computational mathematics and statistics to process the mass spectral data into predictive models with clinical and biological significance. We discuss several classification-based approaches to finding protein biomarker candidates using protein profiles obtained via mass spectrometry, and we assess their statistical significance. Our overall goal is to implicate peaks that have a high likelihood of being biologically linked to a given disease state, and thus to narrow the search for biomarker candidates. Results Thorough cross-validation studies and randomization tests are performed on a prostate cancer dataset with over 300 patients, obtained at the Eastern Virginia Medical School using SELDI-TOF mass spectrometry. We obtain average classification accuracies of 87% on a four-group classification problem using a two-stage linear SVM-based procedure and just 13 peaks, with other methods performing comparably. Conclusions Modern feature selection and classification methods are powerful techniques for both the identification of biomarker candidates and the related problem of building predictive models from protein mass spectrometric profiles. Cross-validation and randomization are essential tools that must be performed carefully in order not to bias the results unfairly. However, only a biological validation and identification of the underlying proteins will ultimately confirm the actual value and power of any computational predictions.

  9. (PS)2: protein structure prediction server version 3.0.

    Science.gov (United States)

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)(2) web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)(2) server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)(2) is freely available at http://ps2v3.life.nctu.edu.tw/.

  10. Prediction of change in protein unfolding rates upon point mutations in two state proteins.

    Science.gov (United States)

    Chaudhary, Priyashree; Naganathan, Athi N; Gromiha, M Michael

    2016-09-01

    Studies on protein unfolding rates are limited and challenging due to the complexity of unfolding mechanism and the larger dynamic range of the experimental data. Though attempts have been made to predict unfolding rates using protein sequence-structure information there is no available method for predicting the unfolding rates of proteins upon specific point mutations. In this work, we have systematically analyzed a set of 790 single mutants and developed a robust method for predicting protein unfolding rates upon mutations (Δlnku) in two-state proteins by combining amino acid properties and knowledge-based classification of mutants with multiple linear regression technique. We obtain a mean absolute error (MAE) of 0.79/s and a Pearson correlation coefficient (PCC) of 0.71 between predicted unfolding rates and experimental observations using jack-knife test. We have developed a web server for predicting protein unfolding rates upon mutation and it is freely available at https://www.iitm.ac.in/bioinfo/proteinunfolding/unfoldingrace.html. Prominent features that determine unfolding kinetics as well as plausible reasons for the observed outliers are also discussed.
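
    The two evaluation metrics quoted in this abstract, MAE and PCC, can be computed as in the short sketch below. The function names and the toy numbers are hypothetical and are not data from the study; the jack-knife procedure itself is not reproduced.

```python
import math


def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Toy values for illustration only.
    pred = [1.0, 2.0, 3.0]
    obs = [2.0, 4.0, 6.0]
    print(mean_absolute_error(pred, obs), pearson_correlation(pred, obs))
```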

  11. Prediction of Protein-Protein Interactions with Physicochemical Descriptors and Wavelet Transform via Random Forests.

    Science.gov (United States)

    Jia, Jianhua; Xiao, Xuan; Liu, Bingxiang

    2016-06-01

    Protein-protein interactions (PPIs) provide valuable insight into the inner workings of cells, and it is important to study the network of PPIs. It is vitally important to develop an automated method as a high-throughput tool to predict PPIs in a timely manner. Based on the physicochemical descriptors, a protein was converted into several digital signals, and then wavelet transform was used to analyze them. With such a formulation frame to represent the samples of protein sequences, the random forests algorithm was adopted to conduct prediction. The results on a large-scale independent-test data set show that the proposed model can achieve a good performance with an accuracy value of about 0.86 and a geometric mean value of about 0.85. Therefore, it can be a useful supplementary tool for PPI prediction. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI_RF.

  12. Selective prediction of interaction sites in protein structures with THEMATICS

    Directory of Open Access Journals (Sweden)

    Murga Leonel F

    2007-04-01

    Full Text Available Abstract Background Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites. Results Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes), it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively. Conclusion With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. Success rates for catalytic residue prediction are similar to those of
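
    The residue-level recall and precision reported in this abstract follow the usual set-based definitions, sketched below. The function name and the residue numbers are hypothetical and are not taken from the CatRes data set.

```python
def precision_recall(predicted_residues, annotated_residues):
    """Return (precision, recall) for predicted versus annotated residue sets.

    precision = true positives / number of predicted residues
    recall    = true positives / number of annotated residues
    """
    predicted = set(predicted_residues)
    annotated = set(annotated_residues)
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical residue numbers, for illustration only.
    predicted = {10, 35, 57, 102, 195}
    annotated = {57, 102, 195, 221}
    print(precision_recall(predicted, annotated))
```

    Loosening a cut-off typically enlarges the predicted set, which is consistent with the trade-off the abstract reports: recall rises while precision falls slightly.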

  13. PDBalert: automatic, recurrent remote homology tracking and protein structure prediction

    Directory of Open Access Journals (Sweden)

    Söding Johannes

    2008-11-01

    Full Text Available Abstract Background In recent years, methods for remote homology detection have grown more and more sensitive and reliable. Automatic structure prediction servers relying on these methods can generate useful 3D models even below 20% sequence identity between the protein of interest and the known structure (template). When no homologs can be found in the protein structure database (PDB), the user would need to rerun the same search at regular intervals in order to make timely use of a template once it becomes available. Results PDBalert is a web-based automatic system that sends an email alert as soon as a structure with homology to a protein in the user's watch list is released to the PDB database or appears among the sequences on hold. The mail contains links to the search results and to an automatically generated 3D homology model. The sequence search is performed with the same software as used by the very sensitive and reliable remote homology detection server HHpred, which is based on pairwise comparison of Hidden Markov models. Conclusion PDBalert will accelerate the information flow from the PDB database to all those who can profit from the newly released protein structures for predicting the 3D structure or function of their proteins of interest.

  14. Structure-based prediction of protein-folding transition paths

    CERN Document Server

    Jacobs, William M

    2016-01-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions.

  15. Predicting pKa for proteins using COSMO-RS

    DEFF Research Database (Denmark)

    Andersson, Martin Peter; Jensen, Jan Halborg; Stipp, Susan Louise Svane

    2013-01-01

    We have used the COSMO-RS implicit solvation method to calculate the equilibrium constants, pKa, for deprotonation of the acidic residues of the ovomucoid inhibitor protein, OMTKY3. The root mean square error for comparison with experimental data is only 0.5 pH units and the maximum error 0.8 pH units. The results show that the accuracy of pKa prediction using COSMO-RS is as good for large biomolecules as it is for smaller inorganic and organic acids and that the method compares very well to previous pKa predictions of the OMTKY3 protein using Quantum Mechanics/Molecular Mechanics. Our approach...

  16. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Directory of Open Access Journals (Sweden)

    Aboul-Magd Mohammed O

    2009-07-01

    Full Text Available Abstract Background Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input
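
    The idea of splitting a three-state problem into three binary sub-problems can be sketched as a one-vs-rest scheme. Note the swap: the abstract combines its binary classifiers through a second layer of PCI classifiers, whereas the sketch below simply takes the highest-scoring state per residue. The function name, the state letters and the scores are hypothetical.

```python
# One-vs-rest combination by argmax; this stands in for, and is not,
# the second PCI layer described in the abstract.
STATES = ("H", "E", "C")  # helix, strand, non-regular (coil)


def combine_binary_scores(helix_scores, strand_scores, coil_scores):
    """Pick, per residue, the state whose binary classifier scored highest."""
    if not (len(helix_scores) == len(strand_scores) == len(coil_scores)):
        raise ValueError("score lists must have equal length")
    labels = []
    for triple in zip(helix_scores, strand_scores, coil_scores):
        best = max(range(len(STATES)), key=lambda i: triple[i])
        labels.append(STATES[best])
    return "".join(labels)


if __name__ == "__main__":
    # Hypothetical per-residue scores from three binary classifiers.
    print(combine_binary_scores([0.9, 0.2, 0.1], [0.05, 0.7, 0.3], [0.05, 0.1, 0.6]))
```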

  17. QSAR Models for the Prediction of Plasma Protein Binding

    Directory of Open Access Journals (Sweden)

    Zeshan Amin

    2013-02-01

    Full Text Available Introduction: The prediction of plasma protein binding (ppb) is of paramount importance in the pharmacokinetics characterization of drugs, as it causes significant changes in volume of distribution, clearance and drug half life. This study utilized Quantitative Structure – Activity Relationships (QSAR) for the prediction of plasma protein binding. Methods: Protein binding values for 794 compounds were collated from literature. The data was partitioned into a training set of 662 compounds and an external validation set of 132 compounds. Physicochemical and molecular descriptors were calculated for each compound using ACD labs/logD, MOE (Chemical Computing Group) and Symyx QSAR software packages. Several data mining tools were employed for the construction of models. These included stepwise regression analysis, Classification and Regression Trees (CART), Boosted trees and Random Forest. Results: Several predictive models were identified; however, one model in particular produced significantly superior prediction accuracy for the external validation set as measured using mean absolute error and correlation coefficient. The selected model was a boosted regression tree model which had the mean absolute error for training set of 13.25 and for validation set of 14.96. Conclusion: Plasma protein binding can be modeled using simple regression trees or multiple linear regressions with reasonable model accuracies. These interpretable models were able to identify the governing molecular factors for a high ppb that included hydrophobicity, van der Waals surface area parameters, and aromaticity. On the other hand, the more complicated ensemble method of boosted regression trees produced the most accurate ppb estimations for the external validation set.

  18. Prediction of DNA-binding specificity in zinc finger proteins

    Indian Academy of Sciences (India)

    Sumedha Roy; Shayoni Dutta; Kanika Khanna; Shruti Singla; Durai Sundar

    2012-07-01

    Zinc finger proteins interact via their individual fingers with three base pair subsites on the target DNA. The four key residue positions −1, 2, 3 and 6 on the alpha-helix of the zinc fingers have hydrogen bond interactions with the DNA. Mutating these key residues enables generation of a plethora of combinatorial possibilities that can bind to any DNA stretch of interest. Exploiting the binding specificity and affinity of the interaction between the zinc fingers and the respective DNA can help to generate engineered zinc fingers for therapeutic purposes involving genome targeting. Exploring the structure–function relationships of the existing zinc finger–DNA complexes can aid in predicting the probable zinc fingers that could bind to any target DNA. Computational tools ease the prediction of such engineered zinc fingers by effectively utilizing information from the available experimental data. A study of literature reveals many approaches for predicting DNA-binding specificity in zinc finger proteins. However, an alternative approach that looks into the physico-chemical properties of these complexes would do away with the difficulties of designing unbiased zinc fingers with the desired affinity and specificity. We present a physico-chemical approach that exploits the relative strengths of hydrogen bonding between the target DNA and all combinatorially possible zinc fingers to select the optimal zinc finger protein candidate.
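
    The size of the combinatorial space mentioned in this abstract can be made concrete with a small sketch. It assumes that each of the four key positions may hold any of the 20 standard amino acids, which the abstract does not state explicitly; the names `AMINO_ACIDS`, `KEY_POSITIONS` and `helix_variants` are hypothetical.

```python
from itertools import islice, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
KEY_POSITIONS = (-1, 2, 3, 6)         # key helix positions named in the abstract


def helix_variants():
    """Lazily yield every assignment of amino acids to the key positions."""
    for combo in product(AMINO_ACIDS, repeat=len(KEY_POSITIONS)):
        yield dict(zip(KEY_POSITIONS, combo))


# Number of distinct key-residue combinations for a single finger.
VARIANTS_PER_FINGER = len(AMINO_ACIDS) ** len(KEY_POSITIONS)

if __name__ == "__main__":
    print(VARIANTS_PER_FINGER)
    for variant in islice(helix_variants(), 3):
        print(variant)
```

    A generator is used so that the candidates can be scored one at a time without holding the full set in memory.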

  19. Predicting protein structures with a multiplayer online game.

    Science.gov (United States)

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-05

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  20. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis.

    Science.gov (United States)

    Saha, Sovan; Chatterjee, Piyali; Basu, Subhadip; Kundu, Mahantapas; Nasipuri, Mita

    2014-12-01

    Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins are still not annotated in most cases. The unknown function of such non-annotated proteins may be inferred or deduced from their neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 uses a combination of three simple-yet-effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach using the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. The overall accuracy achieved in FunPred 1.2 over 8 functional groups involving hetero-interactions in 650 yeast proteins is around 87%, which is higher than the accuracy with FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are now freely available at http://code.google.com/p/cmaterbioinfo/ .
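
    The idea of inferring a function from network neighbors can be illustrated with a minimal sketch. The scoring below is an assumption made for illustration (the fraction of annotated neighbors that carry each function); it is not the FunPred 1.1 neighborhood ratio as defined in the paper, and the protein names and annotations are hypothetical toy data.

```python
from collections import Counter


def neighbor_function_scores(graph, annotations, protein):
    """Score each candidate function by the fraction of annotated
    neighbors of `protein` that carry it (illustrative scoring only)."""
    neighbors = [n for n in graph.get(protein, ()) if n in annotations]
    if not neighbors:
        return {}
    counts = Counter(f for n in neighbors for f in annotations[n])
    return {func: c / len(neighbors) for func, c in counts.items()}


def predict_function(graph, annotations, protein):
    """Return the highest-scoring function, or None if no neighbor is annotated."""
    scores = neighbor_function_scores(graph, annotations, protein)
    if not scores:
        return None
    # Sort names first so ties are broken deterministically.
    return max(sorted(scores), key=scores.get)


# Hypothetical toy network: P1 is unannotated, its three neighbors are annotated.
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"transport"},
    "P3": {"transport"},
    "P4": {"metabolism"},
}

scores = neighbor_function_scores(graph, annotations, "P1")
prediction = predict_function(graph, annotations, "P1")
```

    A real method would combine several such scores and restrict the search to dense neighborhoods, as the abstract describes; this sketch shows only the simplest neighbor-voting step.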

  1. Protein-spanning water networks and implications for prediction of protein-protein interactions mediated through hydrophobic effects.

    Science.gov (United States)

    Cui, Di; Ou, Shuching; Patel, Sandeep

    2014-12-01

    Hydrophobic effects, often conflated with hydrophobic forces, are implicated as major determinants in biological association and self-assembly processes. Protein-protein interactions involved in signaling pathways in living systems are a prime example where hydrophobic effects have profound implications. In the context of protein-protein interactions, a priori knowledge of relevant binding interfaces (i.e., clusters of residues involved directly with binding interactions) is difficult to obtain. In the case of hydrophobically mediated interactions, hydropathy-based methods relying on single-residue hydrophobicity properties are routinely and widely used to predict propensities for such residues to be present in hydrophobic interfaces. However, recent studies suggest that consideration of hydrophobicity for single residues on a protein surface requires accounting for the local environment dictated by neighboring residues and local water. In this study, we use a method derived from percolation theory to evaluate spanning water networks in the first hydration shells of a series of small proteins. We use residue-based water density and single-linkage clustering methods to predict hydrophobic regions of proteins; these regions are putatively involved in binding interactions. We find that this simple method is able to predict with sufficient accuracy and coverage the binding interface residues of a series of proteins. The approach is competitive with automated servers. The results of this study highlight the importance of accounting for the local environment in determining the hydrophobic nature of individual residues on protein surfaces.

  2. Predictive energy landscapes for folding membrane protein assemblies

    Science.gov (United States)

    Truong, Ha H.; Kim, Bobby L.; Schafer, Nicholas P.; Wolynes, Peter G.

    2015-12-01

    We study the energy landscapes for membrane protein oligomerization using the Associative memory, Water mediated, Structure and Energy Model with an implicit membrane potential (AWSEM-membrane), a coarse-grained molecular dynamics model previously optimized under the assumption that the energy landscapes for folding α-helical membrane protein monomers are funneled once their native topology within the membrane is established. In this study we show that the AWSEM-membrane force field is able to sample near native binding interfaces of several oligomeric systems. By predicting candidate structures using simulated annealing, we further show that degeneracies in predicting structures of membrane protein monomers are generally resolved in the folding of the higher order assemblies as is the case in the assemblies of both nicotinic acetylcholine receptor and V-type Na+-ATPase dimers. The physics of the phenomenon resembles domain swapping, which is consistent with the landscape following the principle of minimal frustration. We revisit also the classic Khorana study of the reconstitution of bacteriorhodopsin from its fragments, which is the close analogue of the early Anfinsen experiment on globular proteins. Here, we show the retinal cofactor likely plays a major role in selecting the final functional assembly.

  3. Ranking beta sheet topologies with applications to protein structure prediction

    DEFF Research Database (Denmark)

    Fonseca, Rasmus; Helles, Glennie; Winter, Pawel

    2011-01-01

    One reason why ab initio protein structure predictors do not perform very well is their inability to reliably identify long-range interactions between amino acids. To achieve reliable long-range interactions, all potential pairings of β-strands (β-topologies) of a given protein are enumerated......, including the native β-topology. Two very different β-topology scoring methods from the literature are then used to rank all potential β-topologies. This has not previously been attempted for any scoring method. The main result of this paper is a justification that one of the scoring methods, in particular...... of this paper is a method to deal with the inaccuracies of secondary structure predictors when enumerating potential β-topologies. The results reported in this paper are highly relevant for ab initio protein structure prediction methods based on decoy generation. They indicate that decoy generation can...

  4. Predicting oligonucleotide-directed mutagenesis failures in protein engineering.

    Science.gov (United States)

    Wassman, Christopher D; Tam, Phillip Y; Lathrop, Richard H; Weiss, Gregory A

    2004-01-01

    Protein engineering uses oligonucleotide-directed mutagenesis to modify DNA sequences through a two-step process of hybridization and enzymatic synthesis. Inefficient reactions confound attempts to introduce mutations, especially for the construction of vast combinatorial protein libraries. This paper applied computational approaches to the problem of inefficient mutagenesis. Several results implicated oligonucleotide annealing to non-target sites, termed 'cross-hybridization', as a significant contributor to mutagenesis reaction failures. Test oligonucleotides demonstrated control over reaction outcomes. A novel cross-hybridization score, quickly computable for any plasmid and oligonucleotide mixture, directly correlated with yields of deleterious mutagenesis side products. Cross-hybridization was confirmed conclusively by partial incorporation of an oligonucleotide at a predicted cross-hybridization site, and by modification of putative template secondary structure to control cross-hybridization. Even in low concentrations, cross-hybridizing species in mixtures poisoned reactions. These results provide a basis for improved mutagenesis efficiencies and increased diversities of cognate protein libraries.

  5. Prediction and systematic study of protein-protein interaction networks of Leptospira interrogans

    Institute of Scientific and Technical Information of China (English)

    SUN Jingchun; XU Jinlin; CAO Jianping; LIU Qi; GUO Xiaokui; SHI Tieliu; LI Yixue

    2006-01-01

    Leptospira interrogans serovar Lai is a pathogenic bacterium that causes a spirochetal zoonosis in humans and some animals. With its complete genome sequence available, it is possible to analyze protein-protein interactions from a whole-genome standpoint. Here we combine four recently developed computational approaches (gene fusion method, gene neighbor method, phylogenetic profiles method, and operon method) to predict protein-protein interaction networks of Leptospira interrogans strain Lai. Through comprehensive analysis of interactions among proteins of the motility and chemotaxis system, signal transduction, lipopolysaccharide biosynthesis and a series of proteins related to adhesion and invasion, we provide information for further study of its pathogenic mechanism. In addition, we assigned possible functions to 203 previously uncharacterized proteins based on the known functions of their interacting partners. This work is helpful for further investigating L. interrogans strain Lai.

  6. Recent results from the Swinburne supercomputer software correlator

    Science.gov (United States)

    Tingay, Steven; et al.

    I will describe the development of software correlators on the Swinburne Beowulf supercomputer and recent work using the Cray XD-1 machine. I will also describe recent Australian and global VLBI experiments that have been processed on the Swinburne software correlator, along with imaging results from these data. The role of the software correlator in Australia's eVLBI project will be discussed.

  7. Flux-Level Transit Injection Experiments with NASA Pleiades Supercomputer

    Science.gov (United States)

    Li, Jie; Burke, Christopher J.; Catanzarite, Joseph; Seader, Shawn; Haas, Michael R.; Batalha, Natalie; Henze, Christopher; Christiansen, Jessie; Kepler Project, NASA Advanced Supercomputing Division

    2016-06-01

    Flux-Level Transit Injection (FLTI) experiments are executed with NASA's Pleiades supercomputer for the Kepler Mission. The latest release (9.3, January 2016) of the Kepler Science Operations Center Pipeline is used in the FLTI experiments. Their purpose is to validate the Analytic Completeness Model (ACM), which can be computed for all Kepler target stars, thereby enabling exoplanet occurrence rate studies. Pleiades, a facility of NASA's Advanced Supercomputing Division, is one of the world's most powerful supercomputers and represents NASA's state-of-the-art technology. We discuss the details of implementing the FLTI experiments on the Pleiades supercomputer. For example, taking into account that ~16 injections are generated by one core of the Pleiades processors in an hour, the “shallow” FLTI experiment, in which ~2000 injections are required per target star, can be done for 16% of all Kepler target stars in about 200 hours. Stripping down the transit search to bare bones, i.e. only searching adjacent high/low periods at high/low pulse durations, makes the computationally intensive FLTI experiments affordable. The design of the FLTI experiments and the analysis of the resulting data are presented in “Validating an Analytic Completeness Model for Kepler Target Stars Based on Flux-level Transit Injection Experiments” by Catanzarite et al. (#2494058). Kepler was selected as the 10th mission of the Discovery Program. Funding for the Kepler Mission has been provided by the NASA Science Mission Directorate.
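
    The throughput figures quoted in this abstract can be turned into a rough back-of-the-envelope estimate. The sketch below uses only the approximate rates stated above; the total number of target stars is not given in the abstract, so the value passed in the example (200,000) is a placeholder assumption, and the resulting core count is illustrative rather than a reported figure.

```python
# Approximate figures taken from the abstract.
INJECTIONS_PER_CORE_HOUR = 16   # ~16 injections per core per hour
INJECTIONS_PER_STAR = 2000      # "shallow" experiment: ~2000 injections per star
FRACTION_OF_STARS = 0.16        # 16% of all target stars
WALL_HOURS = 200                # about 200 hours of wall-clock time


def core_hours_per_star():
    """Core-hours needed to generate all injections for one target star."""
    return INJECTIONS_PER_STAR / INJECTIONS_PER_CORE_HOUR


def implied_cores(total_target_stars):
    """Number of cores that must run concurrently to finish in WALL_HOURS.

    `total_target_stars` is an assumption supplied by the caller; the
    abstract does not state it.
    """
    stars = FRACTION_OF_STARS * total_target_stars
    return stars * core_hours_per_star() / WALL_HOURS


per_star = core_hours_per_star()
cores = implied_cores(200_000)  # placeholder star count, not from the abstract
```

    Under these assumptions each star costs 125 core-hours, and the implied core count scales linearly with the assumed number of target stars.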

  8. Access to Supercomputers. Higher Education Panel Report 69.

    Science.gov (United States)

    Holmstrom, Engin Inel

    This survey was conducted to provide the National Science Foundation with baseline information on current computer use in the nation's major research universities, including the actual and potential use of supercomputers. Questionnaires were sent to 207 doctorate-granting institutions; after follow-ups, 167 institutions (91% of the institutions…

  9. The Sky's the Limit When Super Students Meet Supercomputers.

    Science.gov (United States)

    Trotter, Andrew

    1991-01-01

    In a few select high schools in the U.S., supercomputers are allowing talented students to attempt sophisticated research projects using simultaneous simulations of nature, culture, and technology not achievable by ordinary microcomputers. Schools can get their students online by entering contests and seeking grants and partnerships with…

  10. Distance matrix-based approach to protein structure prediction.

    Science.gov (United States)

    Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr

    2009-03-01

    Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the
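
    The claim that the spectral decomposition of a square distance matrix has at most five nonzero terms can be checked numerically. The sketch below is an illustration with NumPy, not the authors' code; the coordinates are random points standing in for residue positions, and the tolerance and the `contact_matrix` helper are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))  # hypothetical stand-in for residue coordinates

# Square distance matrix: D[i, j] = r_ij^2.
diff = coords[:, None, :] - coords[None, :, :]
D = np.sum(diff ** 2, axis=-1)

# D is real and symmetric, so eigh gives D = sum_k lambda_k v_k v_k^T.
eigvals, eigvecs = np.linalg.eigh(D)

# Count eigenvalues that are nonzero up to a relative numerical tolerance.
tol = 1e-8 * np.max(np.abs(eigvals))
nonzero_terms = int(np.sum(np.abs(eigvals) > tol))

# Rebuild D from its spectral decomposition as a consistency check.
D_rebuilt = (eigvecs * eigvals) @ eigvecs.T


def contact_matrix(sq_dist, cutoff):
    """Binary contact matrix: 1 where r_ij is below the cutoff, else 0."""
    return (sq_dist < cutoff ** 2).astype(int)


C = contact_matrix(D, cutoff=1.0)
```

    The bound follows because, for points in three dimensions, $D$ can be written as a sum of two rank-one terms built from the squared norms plus a Gram matrix of rank at most three.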

  11. INTEGRATION OF PANDA WORKLOAD MANAGEMENT SYSTEM WITH SUPERCOMPUTERS

    Energy Technology Data Exchange (ETDEWEB)

    De, K [University of Texas at Arlington; Jha, S [Rutgers University; Maeno, T [Brookhaven National Laboratory (BNL); Mashinistov, R. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Nilsson, P [Brookhaven National Laboratory (BNL); Novikov, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Oleynik, D [University of Texas at Arlington; Panitkin, S [Brookhaven National Laboratory (BNL); Poyda, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Ryabinkin, E. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Teslyuk, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Tsulaia, V. [Lawrence Berkeley National Laboratory (LBNL); Velikhov, V. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Wen, G. [University of Wisconsin, Madison; Wells, Jack C [ORNL; Wenaus, T [Brookhaven National Laboratory (BNL)

    2016-01-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of

  12. Integration of Panda Workload Management System with supercomputers

    Science.gov (United States)

    De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.

    2016-09-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads

  13. Sequence-based prediction of protein protein interaction using a deep-learning algorithm.

    Science.gov (United States)

    Sun, Tanlin; Zhou, Bo; Lai, Luhua; Pei, Jianfeng

    2017-05-25

    Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPI to better understand protein function, disease occurrence, and therapy design. Though various computational methods for predicting PPI have been developed, their robustness for prediction with external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested. We used a stacked autoencoder, a type of deep-learning algorithm, to study the sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies for various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods. To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.

  14. On the analysis of protein-protein interactions via knowledge-based potentials for the prediction of protein-protein docking

    DEFF Research Database (Denmark)

    Feliu, Elisenda; Aloy, Patrick; Oliva, Baldo

    2011-01-01

    Development of effective methods to screen binary interactions obtained by rigid-body protein-protein docking is key for structure prediction of complexes and for elucidating physicochemical principles of protein-protein binding. We have derived empirical knowledge-based potential functions...... results were compared with a residue-pair potential scoring function (RPScore) and an atomic-detailed scoring function (Zrank). We have combined knowledge-based potentials to score protein-protein poses of decoys of complexes classified either as transient or as permanent protein-protein interactions...

  15. Brainstorming: weighted voting prediction of inhibitors for protein targets.

    Science.gov (United States)

    Plewczynski, Dariusz

    2011-09-01

    The "Brainstorming" approach presented in this paper is a weighted voting method that can improve the quality of predictions generated by several machine learning (ML) methods. First, an ensemble of heterogeneous ML algorithms is trained on available experimental data, then all solutions are gathered and a consensus is built between them. The final prediction is performed using a voting procedure, whereby the vote of each method is weighted according to a quality coefficient calculated using multivariable linear regression (MLR). The MLR optimization procedure is very fast, therefore no additional computational cost is introduced by using this jury approach. Here, brainstorming is applied to selecting actives from large collections of compounds relating to five diverse biological targets of medicinal interest, namely HIV-reverse transcriptase, cyclooxygenase-2, dihydrofolate reductase, estrogen receptor, and thrombin. The MDL Drug Data Report (MDDR) database was used for selecting known inhibitors for these protein targets, and experimental data was then used to train a set of machine learning methods. The benchmark dataset (available at http://bio.icm.edu.pl/~darman/chemoinfo/benchmark.tar.gz ) can be used for further testing of various clustering and machine learning methods when predicting the biological activity of compounds. Depending on the protein target, the overall recall value is raised by at least 20% in comparison to any single machine learning method (including ensemble methods like random forest) and unweighted simple majority voting procedures.
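
    The weighted voting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the weights come from an ordinary least-squares fit of the known labels on the individual methods' votes (with an intercept), the 0.5 decision threshold is arbitrary, and the vote matrix is hypothetical toy data for three unnamed classifiers.

```python
import numpy as np


def fit_vote_weights(votes, labels):
    """Least-squares weight for each method's vote, plus an intercept term."""
    X = np.column_stack([votes, np.ones(len(labels))])
    coef, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return coef


def weighted_vote(votes, coef, threshold=0.5):
    """Combine the methods' votes with the fitted weights and threshold the sum."""
    X = np.column_stack([votes, np.ones(votes.shape[0])])
    return (X @ coef >= threshold).astype(int)


# Hypothetical training data: rows are compounds, columns are the 0/1 votes
# of three classifiers; labels mark which compounds are truly active.
train_votes = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
train_labels = np.array([1, 1, 0, 0, 1, 0])

coef = fit_vote_weights(train_votes, train_labels)

# New compounds on which the three classifiers disagree.
new_votes = np.array([[1, 0, 1], [0, 1, 1]])
consensus = weighted_vote(new_votes, coef)
```

    In this toy set the first classifier happens to agree with every label, so the fit assigns it nearly all of the weight and the consensus follows its vote even when a simple majority would not.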

  16. TOUCHSTONE: a unified approach to protein structure prediction.

    Science.gov (United States)

    Skolnick, Jeffrey; Zhang, Yang; Arakaki, Adrian K; Kolinski, Andrzej; Boniecki, Michal; Szilágyi, András; Kihara, Daisuke

    2003-01-01

    We have applied the TOUCHSTONE structure prediction algorithm that spans the range from homology modeling to ab initio folding to all protein targets in CASP5. Using our threading algorithm PROSPECTOR that does not utilize input from metaservers, one threads against a representative set of PDB templates. If a template is significantly hit, Generalized Comparative Modeling designed to span the range from closely to distantly related proteins from the template is done. This involves freezing the aligned regions and relaxing the remaining structure to accommodate insertions or deletions with respect to the template. For all targets, consensus predicted side chain contacts from at least weakly threading templates are pooled and incorporated into ab initio folding. Often, TOUCHSTONE performs well in the CM to FR categories, with PROSPECTOR showing significant ability to identify analogous templates. When ab initio folding is done, frequently the best models are closer to the native state than the initial template. Among the particularly good predictions are T0130 in the CM/FR category, T0138 in the FR(H) category, T0135 in the FR(A) category, T0170 in the FR/NF category and T0181 in the NF category. Improvements in the approach are needed in the FR/NF and NF categories. Nevertheless, TOUCHSTONE was one of the best performing algorithms over all categories in CASP5.

  17. Factors influencing protein tyrosine nitration--structure-based predictive models.

    Science.gov (United States)

    Bayden, Alexander S; Yakovlev, Vasily A; Graves, Paul R; Mikkelsen, Ross B; Kellogg, Glen E

    2011-03-15

    Models for exploring tyrosine nitration in proteins have been created based on 3D structural features of 20 proteins for which high-resolution X-ray crystallographic or NMR data are available and for which nitration of 35 total tyrosines has been experimentally proven under oxidative stress. Factors suggested in previous work to enhance nitration were examined with quantitative structural descriptors. The role of neighboring acidic and basic residues is complex: for the majority of tyrosines that are nitrated the distance to the heteroatom of the closest charged side chain corresponds to the distance needed for suspected nitrating species to form hydrogen bond bridges between the tyrosine and that charged amino acid. This suggests that such bridges play a very important role in tyrosine nitration. Nitration is generally hindered for tyrosines that are buried and for those tyrosines for which there is insufficient space for the nitro group. For in vitro nitration, closed environments with nearby heteroatoms or unsaturated centers that can stabilize radicals are somewhat favored. Four quantitative structure-based models, depending on the conditions of nitration, have been developed for predicting site-specific tyrosine nitration. The best model, relevant for both in vitro and in vivo cases, predicts 30 of 35 tyrosine nitrations (positive predictive value) and has a sensitivity of 60/71 (11 false positives).

  18. Prediction of Peptide and Protein Propensity for Amyloid Formation.

    Directory of Open Access Journals (Sweden)

    Carlos Família

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.

  19. Automated protein function prediction--the genomic challenge.

    Science.gov (United States)

    Friedberg, Iddo

    2006-09-01

    Overwhelmed with genomic data, biologists are facing the first big post-genomic question--what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for a functional annotation which is standardized and machine readable so that function prediction programs could be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term 'function' and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.

  20. HHomp—prediction and classification of outer membrane proteins

    Science.gov (United States)

    Remmert, Michael; Linke, Dirk; Lupas, Andrei N.; Söding, Johannes

    2009-01-01

    Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all β-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20 000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at http://toolkit.tuebingen.mpg.de/hhomp. PMID:19429691

  1. HHomp--prediction and classification of outer membrane proteins.

    Science.gov (United States)

    Remmert, Michael; Linke, Dirk; Lupas, Andrei N; Söding, Johannes

    2009-07-01

    Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all beta-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20,000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at http://toolkit.tuebingen.mpg.de/hhomp.
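
    The benchmark figure quoted above, the share of true positives detected before the first false positive, can be computed from any ranked list of hits. The sketch below is a generic illustration of that metric and not HHomp's benchmarking code; the function name, scores and labels are hypothetical.

```python
def tp_fraction_before_first_fp(scores, is_positive):
    """Fraction of all true positives ranked above the first false positive.

    `scores` are detection scores (higher means more confident) and
    `is_positive` flags which items are true members of the class.
    """
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, positive in ranked if positive)
    if total_positives == 0:
        return 0.0
    found = 0
    for _, positive in ranked:
        if not positive:
            break  # stop counting at the first false positive
        found += 1
    return found / total_positives


# Hypothetical ranked hits: two true positives precede the first false positive,
# and a third true positive is ranked below it.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [True, True, False, True, False]
fraction = tp_fraction_before_first_fp(example_scores, example_labels)
```

    The metric is strict: a single high-scoring false positive caps the result regardless of how well the remaining hits are ranked.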

  2. Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction

    Directory of Open Access Journals (Sweden)

    Fofanov Viacheslav Y

    2010-05-01

    Background: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels. Results: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs). SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs. Conclusions: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated

  3. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the premise that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular functions, and would play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for protein localization and function, respectively, were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in

  4. Detecting reliable non interacting proteins (NIPs) significantly enhancing the computational prediction of protein-protein interactions using machine learning methods.

    Science.gov (United States)

    Srivastava, A; Mazzocco, G; Kel, A; Wyrwicz, L S; Plewczynski, D

    2016-03-01

    Protein-protein interactions (PPIs) play a vital role in most biological processes. Hence their comprehension can promote a better understanding of the mechanisms underlying living systems. However, besides the cost and the time limitation involved in the detection of experimentally validated PPIs, the noise in the data is still an important issue to overcome. In the last decade several in silico PPI prediction methods using both structural and genomic information were developed for this purpose. Here we introduce a unique validation approach aimed to collect reliable non interacting proteins (NIPs). Thereafter the most relevant protein/protein-pair related features were selected. Finally, the prepared dataset was used for PPI classification, leveraging the prediction capabilities of well-established machine learning methods. Our best classification procedure displayed specificity and sensitivity values of 96.33% and 98.02%, respectively, surpassing the prediction capabilities of other methods, including those trained on gold standard datasets. We showed that the PPI/NIP predictive performances can be considerably improved by focusing on data preparation.

  5. The origins of the evolutionary signal used to predict protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Swapna Lakshmipuram S

    2012-12-01

    Background: The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history and so it would be expected that there would be topological similarities in their phylogenetic trees, whether they are interacting or not. For this reason the underlying species tree is often corrected for. Moreover, processes such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis. Results: In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence. Conclusions: Since interacting proteins do not have tree topologies that are more similar than the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.

  6. Predicting the protein targets for athletic performance-enhancing substances.

    Science.gov (United States)

    Mavridis, Lazaros; Mitchell, John Bo

    2013-06-25

    The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport. The ChEMBL database was screened and eight well populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used for a rule-based filtering process to define the labels "active" or "inactive". The "active" compounds for each of the ChEMBL families were thereby defined and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL. We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with the filtered or with the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits. Having thus validated our approach, we used it to identify the protein targets

  7. Effects of protein conformation in docking: improved pose prediction through protein pocket adaptation.

    Science.gov (United States)

    Jain, Ajay N

    2009-06-01

    Computational methods for docking ligands have been shown to be remarkably dependent on precise protein conformation, where acceptable results in pose prediction have been generally possible only in the artificial case of re-docking a ligand into a protein binding site whose conformation was determined in the presence of the same ligand (the "cognate" docking problem). In such cases, on well curated protein/ligand complexes, accurate dockings can be returned as top-scoring over 75% of the time using tools such as Surflex-Dock. A critical application of docking in modeling for lead optimization requires accurate pose prediction for novel ligands, ranging from simple synthetic analogs to very different molecular scaffolds. Typical results for widely used programs in the "cross-docking case" (making use of a single fixed protein conformation) have rates closer to 20% success. By making use of protein conformations from multiple complexes, Surflex-Dock yields an average success rate of 61% across eight pharmaceutically relevant targets. Following docking, protein pocket adaptation and rescoring identifies single pose families that are correct an average of 67% of the time. Consideration of the best of two pose families (from alternate scoring regimes) yields a 75% mean success rate.

  8. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    Directory of Open Access Journals (Sweden)

    Colin A Smith

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.

  9. Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

    Directory of Open Access Journals (Sweden)

    Huiying Zhao

    As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
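    The three figures quoted in this abstract (precision, sensitivity and MCC) all derive from one confusion matrix. The sketch below shows the standard definitions. The function name and the counts used in the usage note are hypothetical: the counts are not taken from the paper and were chosen only to be roughly consistent with the reported 179 binding and 3797 non-binding domains.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, mcc) from confusion-matrix counts.

    Standard definitions; any ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, mcc


# Hypothetical counts (NOT from the paper): 179 positives, 3797 negatives.
precision, sensitivity, mcc = confusion_metrics(tp=116, fp=7, tn=3790, fn=63)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} mcc={mcc:.3f}")
```

    With strongly imbalanced classes such as these, MCC is often preferred over accuracy because it uses all four cells of the matrix.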

  10. Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures.

    Science.gov (United States)

    Damm, Kelly L; Carlson, Heather A

    2006-06-15

    Many proteins contain flexible structures such as loops and hinged domains. A simple root mean square deviation (RMSD) alignment of two different conformations of the same protein can be skewed by the difference between the mobile regions. To overcome this problem, we have developed a novel method to overlay two protein conformations by their atomic coordinates using a Gaussian-weighted RMSD (wRMSD) fit. The algorithm is based on the Kabsch least-squares method and determines an optimal transformation between two molecules by calculating the minimal weighted deviation between the two coordinate sets. Unlike other techniques that choose subsets of residues to overlay, all atoms are included in the wRMSD overlay. Atoms that barely move between the two conformations will have a greater weighting than those that have a large displacement. Our superposition tool has produced successful alignments when applied to proteins for which two conformations are known. The transformation calculation is heavily weighted by the coordinates of the static region of the two conformations, highlighting the range of flexibility in the overlaid structures. Lastly, we show how wRMSD fits can be used to evaluate predicted protein structures. Comparing a predicted fold to its experimentally determined target structure is another case of comparing two protein conformations of the same sequence, and the degree of alignment directly reflects the quality of the prediction.
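    The weighting idea in this abstract can be illustrated without the superposition step. The sketch below assumes a weight of the form w = exp(-d^2 / c), where d is the per-atom displacement and c is a scaling constant; both the functional form and the default value of c are assumptions made for illustration, not details given in the abstract. The Kabsch-style fit itself is omitted, so the coordinates are treated as already overlaid.

```python
import math


def gaussian_weights(coords_a, coords_b, c=5.0):
    """Per-atom weights exp(-d^2 / c); c is a hypothetical scaling constant."""
    weights = []
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        weights.append(math.exp(-d2 / c))
    return weights


def weighted_rmsd(coords_a, coords_b, weights):
    """sqrt(sum(w_i * d_i^2) / sum(w_i)) over all atom pairs."""
    num = 0.0
    for (xa, ya, za), (xb, yb, zb), w in zip(coords_a, coords_b, weights):
        num += w * ((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2)
    return math.sqrt(num / sum(weights))


# Two static atoms and one "loop" atom displaced by 3 units (toy coordinates).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
w = gaussian_weights(conf_a, conf_b)
plain = weighted_rmsd(conf_a, conf_b, [1.0, 1.0, 1.0])
print(w, plain, weighted_rmsd(conf_a, conf_b, w))
```

    In this toy case the displaced atom receives a much smaller weight than the static ones, so it contributes less to the weighted deviation than to the plain RMSD.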

  11. Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

    Directory of Open Access Journals (Sweden)

    Chuanhua Xing

    2011-07-01

    Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, deemed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPIs datasets having high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPIs prediction with high quality. Such a reliable prediction may provide a solid platform to other studies such as protein functions prediction and roles of PPIs in disease susceptibility.

  12. Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

    Science.gov (United States)

    Xing, Chuanhua; Dunson, David B

    2011-07-01

    Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, deemed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPIs datasets having high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPIs prediction with high quality. Such a reliable prediction may provide a solid platform to other studies such as protein functions prediction and roles of PPIs in disease susceptibility.

  13. NNcon: improved protein contact map prediction using 2D-recursive neural networks.

    Science.gov (United States)

    Tegge, Allison N; Wang, Zheng; Eickholt, Jesse; Cheng, Jianlin

    2009-07-01

    Protein contact map prediction is useful for protein folding rate prediction, model selection and 3D structure prediction. Here we describe NNcon, a fast and reliable contact map prediction server and software. NNcon was ranked among the most accurate residue contact predictors in the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. Both NNcon server and software are available at http://casp.rnet.missouri.edu/nncon.html.

  14. Supercomputers and biological sequence comparison algorithms.

    Science.gov (United States)

    Core, N G; Edmiston, E W; Saltz, J H; Smith, R M

    1989-12-01

    Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax.
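    The abstract does not name the two dynamic programming algorithms it benchmarks. As an illustration of the general technique, the sketch below computes a global alignment score in the style of Needleman-Wunsch; the choice of algorithm and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions made for this example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b using O(len(b)) memory."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("AC", "AGC"))
```

    Each cell depends only on its upper, left and upper-left neighbours, so all cells on one anti-diagonal can be computed independently. That property is one common route to parallelizing this recurrence, though the abstract does not say which strategy the authors used.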

  15. Prediction and characterization of protein-protein interaction networks in swine

    Directory of Open Access Journals (Sweden)

    Wang Fen

    2012-01-01

    Background: Studying the large-scale protein-protein interaction (PPI) network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into understanding their biological processes. Results: We used three methods, Interolog-based prediction of porcine PPI network, domain-motif interactions from structural topology-based prediction of porcine PPI network and motif-motif interactions from structural topology-based prediction of porcine PPI network, to predict porcine protein interactions among 25,767 porcine proteins. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in the iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in the iPfam database. In GO annotations, we found that 52.68%, 75.54%, and 27.20% of the predicted PPIs shared GO terms, respectively. However, none of the 10,000 randomized networks reached these proportions of PPI pairs sharing GO terms. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively. Conclusion: The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at http://pppid.biositemap.com/.

  16. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    Science.gov (United States)

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with the use of a custom-designed fuzzy membership function for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in a 10-fold cross validation experiment is 77.07, 78.39, and 74.91 percent for the aforementioned organisms, respectively. Performances on independent test sets are 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
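    The performance figure quoted here, area under the ROC curve, can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; the function name and the toy scores are hypothetical and unrelated to the F-SVM software.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: 1 for positive (interface) examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy classifier scores; a value of 0.5 corresponds to random ranking.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

    This quadratic-time form is meant for clarity; sorting-based implementations give the same value faster on large test sets.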

  17. Applications of parallel supercomputers: Scientific results and computer science lessons

    Energy Technology Data Exchange (ETDEWEB)

    Fox, G.C.

    1989-07-12

    Parallel computing has come of age with several commercial and in-house systems that deliver supercomputer performance. We illustrate this with several major computations completed or underway at Caltech on hypercubes, transputer arrays and the SIMD Connection Machine CM-2 and AMT DAP. Applications covered are lattice gauge theory, computational fluid dynamics, subatomic string dynamics, statistical and condensed matter physics, theoretical and experimental astronomy, quantum chemistry, plasma physics, grain dynamics, computer chess, graphics ray tracing, and Kalman filters. We use these applications to compare the performance of several advanced architecture computers including the conventional CRAY and ETA-10 supercomputers. We describe which problems are suitable for which computers in terms of a matching between problem and computer architecture. This is part of a set of lessons we draw for hardware, software, and performance. We speculate on the emergence of new academic disciplines motivated by the growing importance of computers. 138 refs., 23 figs., 10 tabs.

  18. Extending ATLAS Computing to Commercial Clouds and Supercomputers

    CERN Document Server

    Nilsson, P; The ATLAS collaboration; Filipcic, A; Klimentov, A; Maeno, T; Oleynik, D; Panitkin, S; Wenaus, T; Wu, W

    2014-01-01

    The Large Hadron Collider will resume data collection in 2015 with substantially increased computing requirements relative to its first 2009-2013 run. A near doubling of the energy and the data rate, a high level of event pile-up, and detector upgrades will mean the number and complexity of events to be analyzed will increase dramatically. A naive extrapolation of the Run 1 experience would suggest that a 5-6 fold increase in computing resources is needed, which is impossible within the anticipated flat computing budgets in the near future. Consequently ATLAS is engaged in an ambitious program to expand its computing to all available resources, notably including opportunistic use of commercial clouds and supercomputers. Such resources present new challenges in managing heterogeneity, supporting data flows, parallelizing workflows, provisioning software, and other aspects of distributed computing, all while minimizing operational load. We will present the ATLAS experience to date with clouds and supercomputers, and des...

  19. Integration of Titan supercomputer at OLCF with ATLAS production system

    CERN Document Server

    Panitkin, Sergey; The ATLAS collaboration

    2016-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA-managed resources are distributed worldwide across hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this talk we will describe a project aimed at integration of the ATLAS Production System with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA Pilot framework for job...

  20. Training set reduction methods for protein secondary structure prediction in single-sequence condition

    OpenAIRE

    2007-01-01

    Orphan proteins are characterized by the lack of significant sequence similarity to database proteins. To infer the functional properties of the orphans, more elaborate techniques that utilize structural information are required. In this regard, the protein structure prediction gains considerable importance. Secondary structure prediction algorithms designed for orphan proteins (also known as single-sequence algorithms) cannot utilize multiple alignments or alignment prof...

  1. Prediction and evaluation of protein farnesyltransferase inhibition by commercial drugs

    Science.gov (United States)

    DeGraw, Amanda J.; Keiser, Michael J.; Ochocki, Joshua D.; Shoichet, Brian K.; Distefano, Mark D.

    2010-01-01

    The Similarity Ensemble Approach (SEA) relates proteins based on the set-wise chemical similarity among their ligands. It can be used to rapidly search large compound databases and to build cross-target similarity maps. The emerging maps relate targets in ways that reveal relationships one might not recognize based on sequence or structural similarities alone. SEA has previously revealed cross talk between drugs acting primarily on G-protein coupled receptors (GPCRs). Here we used SEA to look for potential off-target inhibition of the enzyme protein farnesyltransferase (PFTase) by commercially available drugs. The inhibition of PFTase has profound consequences for oncogenesis, as well as a number of other diseases. In the present study, two commercial drugs, Loratadine and Miconazole, were identified as potential ligands for PFTase and subsequently confirmed as such experimentally. These results point towards the applicability of SEA for the prediction of not only GPCR-GPCR drug cross talk, but also GPCR-enzyme and enzyme-enzyme drug cross talk. PMID:20180535

  2. Frequently updated noise threat maps created with use of supercomputing grid

    Directory of Open Access Journals (Sweden)

    Szczodrak Maciej

    2014-09-01

    Innovative supercomputing grid services devoted to noise threat evaluation are presented. The services described in this paper concern two issues: the first is related to noise mapping, while the second focuses on assessment of the noise dose and its influence on the human hearing system. The discussed services were developed within the PL-Grid Plus Infrastructure, which brings together Polish academic supercomputer centers. Selected experimental results achieved using the proposed services are presented. The assessment of environmental noise threats includes the creation of noise maps using either offline or online data acquired through a grid of monitoring stations. A concept for estimating the source model parameters from the measured sound level, for the purpose of creating frequently updated noise maps, is presented. Connecting the noise mapping grid service with a distributed sensor network makes it possible to update noise maps automatically for a specified time period. Moreover, a unique attribute of the developed software is the estimation of the auditory effects evoked by exposure to noise. The estimation method uses a modified psychoacoustic model of hearing and is based on the calculated noise level values and on the given exposure period. Potential use scenarios of the grid services for research or educational purposes are introduced. Presenting the predicted hearing threshold shift caused by exposure to excessive noise can raise public awareness of noise threats.

  3. Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Bethel, E. Wes; Leinweber, David; Ruebel, Oliver; Wu, Kesheng

    2011-09-16

    This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently the key mechanisms for preventing catastrophic market action are “circuit breakers.” We believe a more graduated approach, similar to the “yellow light” approach in motorsports to slow down traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a version of volume Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event on May 6, 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions.
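
    As a rough illustration of the fragmentation indicator mentioned above, the textbook Herfindahl-Hirschman Index is the sum of squared shares. The sketch below applies that definition to per-venue traded volume. The function name and the idea of using per-venue volumes as input are assumptions made for illustration; the abstract does not specify which "version" of volume HHI the authors computed.

```python
def herfindahl_hirschman_index(volumes):
    """Return the HHI of a list of non-negative volumes (e.g. one per trading venue).

    The value is 1.0 when a single venue carries all volume and 1/n when
    volume is split evenly across n venues (maximal fragmentation).
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum((v / total) ** 2 for v in volumes)


if __name__ == "__main__":
    # Hypothetical per-venue volumes, not data from the paper.
    print(herfindahl_hirschman_index([500, 300, 200]))
```

    A lower value indicates that trading is spread over more venues; how such a value is turned into a warning signal is not described in the abstract.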

  4. Supercomputers ready for use as discovery machines for neuroscience

    OpenAIRE

    Kunkel, Susanne; Schmidt, Maximilian; Helias, Moritz; Eppler, Jochen Martin; Igarashi, Jun; Masumoto, Gen; Fukai, Tomoki; Ishii, Shin; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus

    2013-01-01

    NEST is a widely used tool to simulate biological spiking neural networks [1]. The simulator is subject to continuous development, which is driven by the requirements of the current neuroscientific questions. At present, a major part of the software development focuses on the improvement of the simulator's fundamental data structures in order to enable brain-scale simulations on supercomputers such as the Blue Gene system in Jülich and the K computer in Kobe. Based on our memory-u...

  5. Scientists turn to supercomputers for knowledge about universe

    CERN Multimedia

    White, G

    2003-01-01

    The DOE is funding the computers at the Center for Astrophysical Thermonuclear Flashes which is based at the University of Chicago and uses supercomputers at the nation's weapons labs to study explosions in and on certain stars. The DOE is picking up the project's bill in the hope that the work will help the agency learn to better simulate the blasts of nuclear warheads (1 page).

  6. High Performance Networks From Supercomputing to Cloud Computing

    CERN Document Server

    Abts, Dennis

    2011-01-01

    Datacenter networks provide the communication substrate for large parallel computer systems that form the ecosystem for high performance computing (HPC) systems and modern Internet applications. The design of new datacenter networks is motivated by an array of applications ranging from communication intensive climatology, complex material simulations and molecular dynamics to such Internet applications as Web search, language translation, collaborative Internet applications, streaming video and voice-over-IP. For both Supercomputing and Cloud Computing the network enables distributed applicati

  7. Study of ATLAS TRT performance with GRID and supercomputers

    Science.gov (United States)

    Krasnopevtsev, D. V.; Klimentov, A. A.; Mashinistov, R. Yu.; Belyaev, N. L.; Ryabinkin, E. A.

    2016-09-01

    One of the most important tasks for ATLAS physics analysis is the reconstruction of proton-proton events with a large number of interactions in the Transition Radiation Tracker. The paper presents Transition Radiation Tracker performance results obtained using the ATLAS GRID and the Kurchatov Institute's Data Processing Center, including its Tier-1 grid site and supercomputer, as well as an analysis of CPU efficiency during these studies.

  8. Dynamic circadian protein-protein interaction networks predict temporal organization of cellular functions.

    Directory of Open Access Journals (Sweden)

    Thomas Wallach

    2013-03-01

    Essentially all biological processes depend on protein-protein interactions (PPIs). Timing of such interactions is crucial for regulatory function. Although circadian (~24-hour) clocks constitute fundamental cellular timing mechanisms regulating important physiological processes, PPI dynamics on this timescale are largely unknown. Here, we identified 109 novel PPIs among circadian clock proteins via a yeast-two-hybrid approach. Among them, the interaction of protein phosphatase 1 and CLOCK/BMAL1 was found to result in BMAL1 destabilization. We constructed a dynamic circadian PPI network predicting the PPI timing using circadian expression data. Systematic circadian phenotyping (RNAi and overexpression) suggests a crucial role for components involved in dynamic interactions. Systems analysis of a global dynamic network in liver revealed that interacting proteins are expressed at similar times, likely to restrict regulatory interactions to specific phases. Moreover, we predict that circadian PPIs dynamically connect many important cellular processes (signal transduction, cell cycle, etc.), contributing to temporal organization of cellular physiology in an unprecedented manner.

  9. Prediction of thermodynamic instabilities of protein solutions from simple protein–protein interactions

    Energy Technology Data Exchange (ETDEWEB)

    D’Agostino, Tommaso [Dipartimento di Fisica, Università di Palermo, Via Archirafi 36, 90123 Palermo (Italy); Solana, José Ramón [Departamento de Física Aplicada, Universidad de Cantabria, 39005 Santander (Spain); Emanuele, Antonio, E-mail: antonio.emanuele@unipa.it [Dipartimento di Fisica, Università di Palermo, Via Archirafi 36, 90123 Palermo (Italy)

    2013-10-16

    Highlights: ► We propose a model of effective protein–protein interaction embedding solvent effects. ► A previous square-well model is enhanced by giving the interaction a free energy character. ► The temperature dependence of the interaction is due to entropic effects of the solvent. ► The validity of the original SW model is extended to entropy driven phase transitions. ► We get good fits for lysozyme and haemoglobin spinodal data taken from literature. - Abstract: Statistical thermodynamics of protein solutions is often studied in terms of simple, microscopic models of particles interacting via pairwise potentials. Such modelling can reproduce the short range structure of protein solutions at equilibrium and predict thermodynamic instabilities of these systems. We introduce a square well model of effective protein–protein interaction that embeds the solvent’s action. We modify an existing model [45] by considering a well depth having an explicit dependence on temperature, i.e. an explicit free energy character, thus encompassing the statistically relevant configurations of solvent molecules around proteins. We choose protein solutions exhibiting demixing upon temperature decrease (lysozyme, enthalpy driven) and upon temperature increase (haemoglobin, entropy driven). We obtain satisfactory fits of spinodal curves for both proteins without adding any mean field term, thus extending the validity of the original model. Our results underline the solvent role in modulating or stretching the interaction potential.
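
    To make the idea of a temperature-dependent well depth concrete, the sketch below evaluates the standard second virial coefficient of a square-well fluid, B2 = (2πσ³/3)[1 − (λ³ − 1)(exp(ε/kT) − 1)], with a well depth of the free-energy form ε(T) = −(ΔH − TΔS). That linear form, the parameter names, and the numerical values are assumptions chosen for illustration in reduced units (k = 1); they are not the authors' model or fitted parameters.

```python
import math


def well_depth(T, dH, dS):
    """Assumed free-energy-like well depth; a positive value means attraction."""
    return -(dH - T * dS)


def reduced_b2(T, dH, dS, lam=1.5):
    """Square-well second virial coefficient divided by its hard-sphere value.

    lam is the well range in units of the particle diameter; k_B is set to 1.
    More negative values mean stronger net attraction.
    """
    eps = well_depth(T, dH, dS)
    return 1.0 - (lam ** 3 - 1.0) * (math.exp(eps / T) - 1.0)


if __name__ == "__main__":
    for T in (1.0, 2.0):
        enthalpic = reduced_b2(T, dH=-1.0, dS=0.0)  # attraction weakens on heating
        entropic = reduced_b2(T, dH=1.0, dS=1.5)    # attraction strengthens on heating
        print(T, enthalpic, entropic)
```

    With these hypothetical parameters the purely enthalpic well becomes less attractive as temperature rises, while the entropic one becomes more attractive, which is the qualitative distinction the abstract draws between demixing on cooling and demixing on heating.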

  10. From Thread to Transcontinental Computer: Disturbing Lessons in Distributed Supercomputing

    CERN Document Server

    Groen, Derek

    2015-01-01

    We describe the political and technical complications encountered during the astronomical CosmoGrid project. CosmoGrid is a numerical study on the formation of large scale structure in the universe. The simulations are challenging due to the enormous dynamic range in spatial and temporal coordinates, as well as the enormous computer resources required. In CosmoGrid we dealt with the computational requirements by connecting up to four supercomputers via an optical network and making them operate as a single machine. This was challenging, if only for the fact that the supercomputers of our choice are separated by half the planet, as three of them are scattered across Europe and the fourth is in Tokyo. The co-scheduling of multiple computers and the 'gridification' of the code enabled us to achieve an efficiency of up to 93% for this distributed intercontinental supercomputer. In this work, we find that high-performance computing on a grid can be done much more effectively if the sites involved are will...

  11. Proceedings of the first energy research power supercomputer users symposium

    Energy Technology Data Exchange (ETDEWEB)

    1991-01-01

    The Energy Research Power Supercomputer Users Symposium was arranged to showcase the richness of science that has been pursued and accomplished in this program through the use of supercomputers and now high performance parallel computers over the last year: this report is the collection of the presentations given at the Symposium. "Power users" were invited by the ER Supercomputer Access Committee to show that the use of these computational tools and the associated data communications network, ESNet, go beyond merely speeding up computations. Today the work often directly contributes to the advancement of the conceptual developments in their fields and the computational and network resources form the very infrastructure of today's science. The Symposium also provided an opportunity, which is rare in this day of network access to computing resources, for the invited users to compare and discuss their techniques and approaches with those used in other ER disciplines. The significance of new parallel architectures was highlighted by the interesting evening talk given by Dr. Stephen Orszag of Princeton University.

  12. Porting Ordinary Applications to Blue Gene/Q Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy; Katz, Daniel S.; Binkowski, T. Andrew; Zhong, Xiaoliang; Heinonen, Olle; Karpeyev, Dmitry; Wilde, Michael

    2015-08-31

    Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques: sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.

  13. Extracting the Textual and Temporal Structure of Supercomputing Logs

    Energy Technology Data Exchange (ETDEWEB)

    Jain, S; Singh, I; Chandra, A; Zhang, Z; Bronevetsky, G

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
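
    The notion of a message's "syntactic structure" can be illustrated with a much simpler stand-in than the clustering described above: mask the tokens that look variable (numbers, hex values) and group messages that share the resulting template. This is an assumed simplification for illustration only, not the authors' algorithm, and the sample messages are hypothetical.

```python
import re
from collections import defaultdict

# Tokens treated as variable fields: hex literals and (dotted) decimal numbers.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def template_of(message):
    """Replace variable-looking tokens with a placeholder to expose the template."""
    return _VARIABLE.sub("<*>", message)


def group_by_template(messages):
    """Group log messages by their masked template, preserving input order."""
    groups = defaultdict(list)
    for message in messages:
        groups[template_of(message)].append(message)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "node 12 failed after 30 s",
        "node 7 failed after 5 s",
        "link 0x1F up",
    ]
    for template, members in group_by_template(logs).items():
        print(template, len(members))
```

    Real logs need far more robust handling of variable fields; the point here is only that grouping by template turns free text into a small set of message classes that can then be counted or correlated in time.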

  14. Systematic Prediction of Scaffold Proteins Reveals New Design Principles in Scaffold-Mediated Signal Transduction

    Science.gov (United States)

    Hu, Jianfei; Neiswinger, Johnathan; Zhang, Jin; Zhu, Heng; Qian, Jiang

    2015-01-01

    Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with the previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through the phosphorylation process. PMID:26393507

  15. Systematic Prediction of Scaffold Proteins Reveals New Design Principles in Scaffold-Mediated Signal Transduction.

    Directory of Open Access Journals (Sweden)

    Jianfei Hu

    Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with the previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through the phosphorylation process.

  16. Prediction of protein-protein interactions in dengue virus coat proteins guided by low resolution cryoEM structures

    Directory of Open Access Journals (Sweden)

    Srinivasan Narayanaswamy

    2010-06-01

    Background: Dengue virus, along with the other members of the Flaviviridae family, has reemerged as a deadly human pathogen. Understanding the mechanistic details of these infections can be highly rewarding in developing effective antivirals. During maturation of the virus inside the host cell, the coat proteins E and M undergo conformational changes, altering the morphology of the viral coat. However, due to the low-resolution nature of the available 3-D structures of viral assemblies, the atomic details of these changes are still elusive. Results: In the present analysis, starting from Cα positions of low-resolution cryo-electron microscopic structures, the residue-level details of protein-protein interaction interfaces of dengue virus coat proteins have been predicted. By comparing the preexisting structures of the virus in different phases of its life cycle, the changes taking place in these predicted protein-protein interaction interfaces were followed as a function of the maturation process of the virus. Besides changing the current notion about the presence of only homodimers in the mature viral coat, the present analysis indicated the presence of a proline-rich motif at the protein-protein interaction interface of the coat protein. Investigating the conservation status of these seemingly functionally crucial residues across other members of the Flaviviridae family enabled dissecting common mechanisms used for infection by these viruses. Conclusions: Thus, using a computational approach, the present analysis has provided better insights into the preexisting low-resolution structures of virus assemblies, the findings of which can be used in designing effective antivirals against these deadly human pathogens.

  17. Comparing human-Salmonella with plant-Salmonella protein-protein interaction predictions

    Directory of Open Access Journals (Sweden)

    Sylvia Schleker

    2015-01-01

    Salmonellosis is the most frequent food-borne disease world-wide and can be transmitted to humans by a variety of routes, especially via animal and plant products. Salmonella bacteria are believed to use not only animal and human but also plant hosts despite their evolutionary distance. This raises the question of whether Salmonella employs similar mechanisms in infection of these diverse hosts. Given that most of our understanding comes from its interaction with human hosts, we investigate here to what degree knowledge of Salmonella-human interactions can be transferred to the Salmonella-plant system. Reviewed are recent publications on analysis and prediction of Salmonella-host interactomes. Putative protein-protein interactions (PPIs) between Salmonella and its human and Arabidopsis hosts were retrieved utilizing purely interolog-based approaches in which predictions were inferred based on available sequence and domain information of known PPIs, and machine learning approaches that integrate a larger set of useful information from different sources. Transfer learning is an especially suitable machine learning technique to predict plant host targets from the knowledge of human host targets. A comparison of the prediction results with transcriptomic data shows a clear overlap between the host proteins predicted to be targeted by PPIs and their gene ontology enrichment in both host species and regulation of gene expression. In particular, the cellular processes Salmonella interferes with in plants and humans are catabolic processes. The details of how these processes are targeted, however, are quite different between the two organisms, as expected based on their evolutionary and habitat differences. Possible implications of this observation on evolution of host-pathogen communication are discussed.

  18. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords

    Directory of Open Access Journals (Sweden)

    Shun Koyabu

    2015-01-01

    For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as “bind” or “interact”, plays an important role in training classifiers. We call it a dominant keyword that affects the capability of the classifier. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is tentatively assumed to be a dominant keyword initially. Then the classifiers are separately trained from the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction.

  19. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.

    Science.gov (United States)

    Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao

    2015-01-01

    For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as "bind" or "interact", plays an important role in training classifiers. We call it a dominant keyword that affects the capability of the classifier. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is tentatively assumed to be a dominant keyword initially. Then the classifiers are separately trained from the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction.

  20. Measurement and Prediction of Protein Phase Behaviour and Protein-Protein Interactions

    DEFF Research Database (Denmark)

    Faber, Cornelius

    2006-01-01

    In addition, overall phase diagrams were constructed under selected conditions. The methods were evaluated with the aim of reducing protein consumption, including when working with less pure enzymes. For the experimental part of this thesis, enzyme concentrates of two recombinant -amylases were used...

  1. Positive maternal C-reactive protein predicts neonatal sepsis.

    Science.gov (United States)

    Jeon, Ji Hyun; Namgung, Ran; Park, Min Soo; Park, Koo In; Lee, Chul

    2014-01-01

    To evaluate the diagnostic performance of a maternal inflammatory marker, C-reactive protein (CRP), in predicting early-onset neonatal sepsis (occurring within 72 hours after birth). 126 low birth weight newborns (gestation 32±3.2 wk, birth weight 1887±623 g) and their mothers were included. Neonates were divided into a sepsis group (n=51), including both proven (positive blood culture) and suspected (negative blood culture but with more than 3 abnormal clinical signs) cases, and controls (n=75). Mothers were subgrouped into CRP positive ≥1.22 mg/dL (n=48) and CRP negative [...] neonatal sepsis according to maternal condition. Maternal CRP was significantly higher in the neonatal sepsis group than in controls (3.55±2.69 vs. 0.48±0.31 mg/dL, p=0.0001). Maternal CRP (cutoff value >1.22 mg/dL) had sensitivity 71% and specificity 84% for predicting neonatal sepsis. The maternal CRP positive group had more neonatal sepsis than the CRP negative group (71% vs. 29%, p[...] neonatal sepsis in the maternal CRP positive group versus the CRP negative group was 10.68 (95% confidence interval: 4.313-26.428, p[...] neonatal sepsis significantly increased in the case of positive maternal CRP (≥1.22 mg/dL). In newborns of CRP positive mothers, the clinician may be alerted to earlier evaluation for possible neonatal infection prior to development of sepsis.
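
    For readers less familiar with the quantities reported above, sensitivity, specificity and the unadjusted odds ratio all follow from a 2x2 table of test result against outcome. The sketch below shows those standard definitions. The counts in the usage example are hypothetical and are not taken from the study, whose full 2x2 table is not given in this record.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute sensitivity, specificity and odds ratio from 2x2 counts.

    tp: test positive with disease    fn: test negative with disease
    fp: test positive without disease tn: test negative without disease
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fp * fn)
    return sensitivity, specificity, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts chosen only to make the arithmetic easy to follow.
    sens, spec, odds = diagnostic_metrics(tp=40, fn=10, fp=20, tn=80)
    print(sens, spec, odds)
```

    An odds ratio computed this way is unadjusted; a published value may come from a regression model and need not match the raw table.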

  2. How proteins get in touch: interface prediction in the study of biomolecular complexes

    NARCIS (Netherlands)

    de Vries, S.J.; Bonvin, A.M.J.J.

    2008-01-01

    Protein-protein interface prediction is a booming field, with substantial growth in the number of new methods published in the last two years. The increasing number of available three-dimensional structures of protein-protein complexes has enabled large-scale statistical analyses of protein

  3. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression

    Directory of Open Access Journals (Sweden)

    Vandepoele Klaas

    2009-06-01

    Background: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand, to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach, making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.

  4. Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset.

    Science.gov (United States)

    Shi, Ming-Guang; Xia, Jun-Feng; Li, Xue-Ling; Huang, De-Shuang

    2010-03-01

    Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of the proteins and the machinery of a proteome. Data of PPIs derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed by combining correlation coefficient (CC) transformation and support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset MIPS Core and a gold standard negatives (non-interacting) dataset GO-NEG of yeast Saccharomyces cerevisiae were mined to objectively evaluate the above method and attenuate the bias. The SVM model combined with CC transformation yielded the best performance with a high accuracy of 87.94% using gold standard positives and gold standard negatives datasets. The source code of MATLAB and the datasets are available on request under smgsmg@mail.ustc.edu.cn.

  5. Topology prediction of helical transmembrane proteins: how far have we reached?

    Science.gov (United States)

    Tusnády, Gábor E; Simon, István

    2010-11-01

    Transmembrane protein topology prediction methods play important roles in structural biology, because the structure determination of these types of proteins is extremely difficult by the common biophysical, biochemical and molecular biological methods. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly with the increasing number of structures. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate and which biophysical or biochemical properties are important to achieve this level of accuracy. Incorporating topology data determined so far into the prediction methods as constraints helps us to reach even higher prediction accuracy; therefore, collection of such topology data is also an important issue.

  6. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  7. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    OpenAIRE

    A. Gunzinger; Bäumle, B.; Frey, M.; Klebl, M.; Kocheisen, M.; Kohler, P.; Morel, R.; Müller, U; Rosenthal, M

    1996-01-01

    At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the Electronics Laboratory supercomputer is absolutely on par with those of conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of...

  8. The Evolutionary Computation Techniques for Protein Structure Prediction: A Survey

    Institute of Scientific and Technical Information of China (English)

    Zou Xiu-fen; Pan Zi-shu; Kang Li-shan; Zhang Chu-yu

    2003-01-01

    In this paper, the applications of evolutionary algorithm in prediction of protein secondary structure and tertiary structures are introduced, and recent studies on solving protein structure prediction problems using evolutionary algorithms are reviewed, and the challenges and prospects of EAs applied to protein structure modeling are analyzed and discussed.

  9. Prediction of Protein-Protein Interaction By Metasample-Based Sparse Representation

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2015-01-01

    Protein-protein interactions (PPIs) play key roles in many cellular processes such as transcription regulation, cell metabolism, and endocrine function. Understanding these interactions greatly advances the understanding of the pathogenesis and treatment of various diseases. A large amount of data has been generated by experimental techniques; however, most of these data are incomplete or noisy, and the current biological experimental techniques are very time-consuming and expensive. In this paper, we propose a novel method (metasample-based sparse representation classification, MSRC) for PPI prediction. A group of metasamples is extracted from the original training samples, and the l1-regularized least squares method is then used to express a new testing sample as a linear combination of these metasamples. PPI prediction is achieved by using a discrimination function defined on the representation coefficients. Applied to a PPI dataset, MSRC achieves 84.9% sensitivity and 94.55% specificity, which is slightly lower than support vector machine (SVM) and much higher than naive Bayes (NB), neural networks (NN), and k-nearest neighbor (KNN). The result shows that MSRC is efficient for PPI prediction.

  10. Calciomics: prediction and analysis of EF-hand calcium binding proteins by protein engineering

    Institute of Scientific and Technical Information of China (English)

    YANG; Jenny; Jie

    2010-01-01

    Ca2+ plays a pivotal role in the physiology and biochemistry of prokaryotic and mammalian organisms. Viruses also utilize the universal Ca2+ signal to create a specific cellular environment to achieve coexistence with the host, and to propagate. In this paper we first describe our development of a grafting approach to understand site-specific Ca2+ binding properties of EF-hand proteins with a helix-loop-helix Ca2+ binding motif, then summarize our prediction and identification of EF-hand Ca2+ binding sites on a genome-wide scale in bacteria and virus, and next report the application of the grafting approach to probe the metal binding capability of predicted EF-hand motifs within the streptococcal hemoprotein receptor (Shr) of Streptococcus pyogenes and the nonstructural protein 1 (nsP1) of Sindbis virus. When methods such as the grafting approach are developed in conjunction with prediction algorithms we are better able to probe continuous Ca2+-binding sites that have been previously underrepresented due to the limitation of conventional methodology.

  11. PIE: an online prediction system for protein-protein interactions from text.

    Science.gov (United States)

    Kim, Sun; Shin, Soo-Yong; Lee, In-Hee; Kim, Soo-Jin; Sriram, Ram; Zhang, Byoung-Tak

    2008-07-01

    Protein-protein interaction (PPI) extraction has been an important research topic in bio-text mining area, since the PPI information is critical for understanding biological processes. However, there are very few open systems available on the Web and most of the systems focus on keyword searching based on predefined PPIs. PIE (Protein Interaction information Extraction system) is a configurable Web service to extract PPIs from literature, including user-provided papers as well as PubMed articles. After providing abstracts or papers, the prediction results are displayed in an easily readable form with essential, yet compact features. The PIE interface supports more features such as PDF file extraction, PubMed search tool and network communication, which are useful for biologists and bio-system developers. The PIE system utilizes natural language processing techniques and machine learning methodologies to predict PPI sentences, which results in high precision performance for Web users. PIE is freely available at http://bi.snu.ac.kr/pie/.

  12. Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Widener, Patrick (University of New Mexico); Jaconette, Steven (Northwestern University); Bridges, Patrick G. (University of New Mexico); Xia, Lei (Northwestern University); Dinda, Peter (Northwestern University); Cui, Zheng.; Lange, John (Northwestern University); Hudson, Trammell B.; Levenhagen, Michael J.; Pedretti, Kevin Thomas Tauke; Brightwell, Ronald Brian

    2009-09-01

    Palacios and Kitten are new open source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

  13. Large-scale integrated super-computing platform for next generation virtual drug discovery.

    Science.gov (United States)

    Mitchell, Wayne; Matsumoto, Shunji

    2011-08-01

    Traditional drug discovery starts by experimentally screening chemical libraries to find hit compounds that bind to protein targets, modulating their activity. Subsequent rounds of iterative chemical derivatization and rescreening are conducted to enhance the potency, selectivity, and pharmacological properties of hit compounds. Although computational docking of ligands to targets has been used to augment the empirical discovery process, its historical effectiveness has been limited because of the poor correlation of ligand dock scores and experimentally determined binding constants. Recent progress in super-computing, coupled to theoretical insights, allows the calculation of the Gibbs free energy, and therefore accurate binding constants, for usually large ligand-receptor systems. This advance extends the potential of virtual drug discovery. A specific embodiment of the technology, integrating de novo, abstract fragment based drug design, sophisticated molecular simulation, and the ability to calculate thermodynamic binding constants with unprecedented accuracy, is discussed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. Numerical simulations of astrophysical problems on massively parallel supercomputers

    Science.gov (United States)

    Kulikov, Igor; Chernykh, Igor; Glinsky, Boris

    2016-10-01

    In this paper, we propose the latest version of the numerical model for simulating the dynamics of astrophysical objects, and a new implementation of our AstroPhi code for Intel Xeon Phi based RSC PetaStream supercomputers. The co-design of a computational model for the description of astrophysical objects is described. The parallel implementation and scalability tests of the AstroPhi code are presented. We achieve a 73% weak scaling efficiency using 256 Intel Xeon Phi accelerators with 61440 threads.
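
    As an illustration of the weak scaling efficiency quoted above, here is a minimal sketch. It assumes the common definition (runtime of the baseline configuration divided by runtime of the larger configuration, with the work per device held fixed); the abstract does not say which definition or baseline the authors used, and the function name and timings below are hypothetical.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: baseline runtime divided by the runtime on the
    larger configuration, with the amount of work per device held fixed.
    A value of 1.0 means the runtime did not grow as devices were added."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    return t_base / t_scaled


# Hypothetical timings in seconds; these are NOT measurements from the paper.
eff = weak_scaling_efficiency(t_base=100.0, t_scaled=137.0)
print(f"{eff:.0%}")  # prints "73%"
```

    Under this definition, a 73% efficiency means the fixed-work-per-device run took about 1.37 times as long on the full machine as on the baseline.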

  15. AENEAS A Custom-built Parallel Supercomputer for Quantum Gravity

    CERN Document Server

    Hamber, H W

    1998-01-01

    Accurate Quantum Gravity calculations, based on the simplicial lattice formulation, are computationally very demanding and require vast amounts of computer resources. A custom-made 64-node parallel supercomputer capable of performing up to $2 \times 10^{10}$ floating point operations per second has been assembled entirely out of commodity components, and has been operational for the last ten months. It will allow the numerical computation of a variety of quantities of physical interest in quantum gravity and related field theories, including the estimate of the critical exponents in the vicinity of the ultraviolet fixed point to an accuracy of a few percent.

  16. The Epc-N domain: a predicted protein-protein interaction domain found in select chromatin associated proteins

    Directory of Open Access Journals (Sweden)

    Perry Jason

    2006-01-01

    Background: An underlying tenet of the epigenetic code hypothesis is the existence of protein domains that can recognize various chromatin structures. To date, two major candidates have emerged: (i) the bromodomain, which can recognize certain acetylation marks, and (ii) the chromodomain, which can recognize certain methylation marks. Results: The Epc-N (Enhancer of Polycomb-N-terminus) domain is formally defined herein. This domain is conserved across eukaryotes and is predicted to form a right-handed orthogonal four-helix bundle with extended strands at both termini. The types of amino acid residues that define the Epc-N domain suggest a role in mediating protein-protein interactions, possibly specifically in the context of chromatin binding, and the types of proteins in which it is found (known components of histone acetyltransferase complexes) strongly suggest a role in epigenetic structure formation and/or recognition. There appear to be two major Epc-N protein families that can be divided into four unique protein subfamilies. Two of these subfamilies (I and II) may be related to one another in that subfamily I can be viewed as a plant-specific expansion of subfamily II. The other two subfamilies (III and IV) appear to be related to one another by duplication events in a primordial fungal-metazoan-mycetozoan ancestor. Subfamilies III and IV are further defined by the presence of an evolutionarily conserved five-center-zinc-binding motif in the loop connecting the second and third helices of the four-helix bundle. This motif appears to consist of a PHD followed by a mononuclear Zn knuckle, followed by a PHD-like derivative, and will thus be referred to as the PZPM. All non-Epc-N proteins studied thus far that contain the PZPM have been implicated in histone methylation and/or gene silencing. In addition, an unusual phyletic distribution of Epc-N-containing proteins is observed. Conclusion: The data suggest that the Epc-N domain is a protein-protein

  17. Prediction of protein hydration sites from sequence by modular neural networks

    DEFF Research Database (Denmark)

    Ehrlich, L.; Reczko, M.; Bohr, Henrik;

    1998-01-01

    The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites using protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with two... structure and solvent accessibility and, using actual values of these properties, residue hydration can be predicted to 77% accuracy with a Matthews coefficient of 0.43. However, predicted property data with an accuracy of 60-70% result in less than half the improvement in predictive performance observed... provide insight into the mutual interdependencies between the location of ordered water sites and the structural and chemical characteristics of the protein residues.

  18. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder.

    Science.gov (United States)

    Peng, Zhenling; Kurgan, Lukasz

    2015-10-15

    Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein-protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/.

  19. Assessment of protein disorder region predictions in CASP10

    KAUST Repository

    Monastyrskyy, Bohdan

    2013-11-22

    The article presents the assessment of disorder region predictions submitted to CASP10. The evaluation is based on the three measures tested in previous CASPs: (i) balanced accuracy, (ii) the Matthews correlation coefficient for the binary predictions, and (iii) the area under the curve in the receiver operating characteristic (ROC) analysis of predictions using probability annotation. We also performed new analyses such as comparison of the submitted predictions with those obtained with a Naïve disorder prediction method and with predictions from the disorder prediction databases D2P2 and MobiDB. On average, the methods participating in CASP10 demonstrated slightly better performance than those in CASP9.
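
    The three measures named in this abstract can be written out in a few lines. The sketch below uses the standard textbook definitions in plain Python; it is not the CASP10 assessors' code, the function names are my own, and the labels and scores in the usage lines are made-up toy values. It assumes both classes (1 = disordered, 0 = ordered) are present in the reference labels.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy per-residue example (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff
print(balanced_accuracy(y_true, y_pred), matthews_cc(y_true, y_pred), roc_auc(y_true, scores))
```

    Balanced accuracy and the Matthews coefficient depend on the chosen cutoff, whereas the ROC area is computed from the probability-like scores directly, which is why the abstract lists them as separate measures.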

  20. Convolutional neural network architectures for predicting DNA–protein binding

    Science.gov (United States)

    Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.

    2016-01-01

    Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608

  1. Prediction of winter wheat grain protein content by ASTER image

    Science.gov (United States)

    Huang, Wenjiang; Song, Xiaoyu; Wang, Jihua; Wang, Zhijie; Zhao, Chunjiang

    2008-10-01

    Advanced technology for space-borne determination of grain crude protein content (CP) by remote sensing can help buyers optimize their purchasing decisions, and help farmers maximize grain output by adjusting field nitrogen (N) fertilizer inputs. We performed field experiments to study the relationship between grain quality indicators and foliar nitrogen concentration (FNC). FNC at anthesis stage was significantly correlated with CP, while spectral vegetation index was significantly correlated to FNC. Based on the relationships among nitrogen reflectance index (NRI), FNC and CP, a model for CP prediction was developed. NRI was able to evaluate FNC with a higher coefficient of determination of R2=0.7302. The method developed in this study could contribute towards developing optimal procedures for evaluating wheat grain quality by ASTER image at anthesis stage. The RMSE was 0.893 % for ASTER image model, and the R2 was 0.7194. It is thus feasible to forecast grain quality by NRI derived from ASTER image.

  2. Understanding Viral Transmission Behavior via Protein Intrinsic Disorder Prediction: Coronaviruses

    Directory of Open Access Journals (Sweden)

    Gerard Kian-Meng Goh

    2012-01-01

    Besides being a common threat to farm animals and poultry, coronavirus (CoV) was responsible for the human severe acute respiratory syndrome (SARS) epidemic in 2002–4. However, many aspects of CoV behavior, including modes of its transmission, are yet to be fully understood. We show that the amount and the peculiarities of distribution of the protein intrinsic disorder in the viral shell can be used for the efficient analysis of the behavior and transmission modes of CoV. The proposed model allows categorization of the various CoVs by the peculiarities of disorder distribution in their membrane (M) and nucleocapsid (N). This categorization enables quick identification of viruses with similar behaviors in transmission, regardless of genetic proximity. Based on this analysis, an empirical model for predicting the viral transmission behavior is developed. This model is able to explain some behavioral aspects of important coronaviruses that previously were not fully understood. The new predictor can be a useful tool for better epidemiological, clinical, and structural understanding of behavior of both newly emerging viruses and viruses that have been known for a long time. A potentially new vaccine strategy could involve searches for viral strains that are characterized by the evolutionary misfit between the peculiarities of the disorder distribution in their shells and their behavior.

  3. Evolutionary optimization of kernel weights improves protein complex comembership prediction.

    Science.gov (United States)

    Hulsman, Marc; Reinders, Marcel J T; de Ridder, Dick

    2009-01-01

    In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
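
    The weighted kernel combination that this abstract optimizes can be sketched as follows. This is only the combination step, under the assumption that each data source has already been turned into a square kernel (Gram) matrix over the same samples; the evolutionary search over weights and the AUC-based criterion are not shown, and the function name and toy matrices are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of kernel matrices, with weights normalized to sum to 1.

    kernels: list of n-by-n matrices (nested lists), one per data source.
    weights: list of non-negative numbers, one per kernel. Non-negative weights
    keep the combined matrix a valid (positive semidefinite) kernel whenever
    every input kernel is valid.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    norm = [w / total for w in weights]
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(norm, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two toy kernels over two samples (illustrative values only).
k_expression = [[1.0, 0.0], [0.0, 1.0]]
k_sequence = [[1.0, 1.0], [1.0, 1.0]]
print(combine_kernels([k_expression, k_sequence], [1.0, 1.0]))  # equal weights
print(combine_kernels([k_expression, k_sequence], [1.0, 3.0]))  # sequence weighted higher
```

    Equal weights correspond to the default the abstract describes; a weight search would call a function like this repeatedly with candidate weight vectors and score the resulting classifier.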

  4. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools.

    Science.gov (United States)

    Figueroa, Maximiliano; Sleutel, Mike; Vandevenne, Marylene; Parvizi, Gregory; Attout, Sophie; Jacquin, Olivier; Vandenameele, Julie; Fischer, Axel W; Damblon, Christian; Goormaghtigh, Erik; Valerio-Lepiniec, Marie; Urvoas, Agathe; Durand, Dominique; Pardon, Els; Steyaert, Jan; Minard, Philippe; Maes, Dominique; Meiler, Jens; Matagne, André; Martial, Joseph A; Van de Weerdt, Cécile

    2016-07-01

    Despite impressive successes in protein design, designing a well-folded protein of more than 100 amino acids de novo remains a formidable challenge. Exploiting the promising biophysical features of the artificial protein Octarellin V, we improved this protein by directed evolution, thus creating a more stable and soluble protein: Octarellin V.1. Next, we obtained crystals of Octarellin V.1 in complex with crystallization chaperons and determined the tertiary structure. The experimental structure of Octarellin V.1 differs from its in silico design: the (αβα) sandwich architecture bears some resemblance to a Rossmann-like fold instead of the intended TIM-barrel fold. This surprising result gave us a unique and attractive opportunity to test the state of the art in protein structure prediction, using this artificial protein free of any natural selection. We tested 13 automated webservers for protein structure prediction and found none of them to predict the actual structure. More than 50% of them predicted a TIM-barrel fold, i.e. the structure we set out to design more than 10 years ago. In addition, local software runs that are human operated can sample a structure similar to the experimental one but fail in selecting it, suggesting that the scoring and ranking functions should be improved. We propose that artificial proteins could be used as tools to test the accuracy of protein structure prediction algorithms, because of their lack of evolutionary pressure and their unique sequence features.

  5. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models

    OpenAIRE

    2006-01-01

    Background: The accuracy of protein secondary structure prediction has been improving steadily towards the 88% estimated theoretical limit. There are two types of prediction algorithms: Single-sequence prediction algorithms imply that information about other (homologous) proteins is not available, while algorithms of the second type imply that information about homologous proteins is available, and use it intensively. The single-sequence algorithms could make an important contribution...

  6. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    Science.gov (United States)

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and secondary structures of amino acids sequences have made it possible to design a function for the folding process. By choosing variant structures and directs for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function utilizing meta-heuristic algorithms is the final step of finding the native structure of a given amino acid sequence. In this work, Imperialist Competitive algorithm was used in order to accelerate the process of minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on standard benchmark show the superiority of the new algorithm over the previous methods with similar potential function. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Towards A Multi Agent System Based Data Mining For Proteins Prediction And Classification

    Directory of Open Access Journals (Sweden)

    Mohammad Khaled Awwad Al-Maghasbeh

    2015-08-01

    To understand the structure-function paradigm, this paper proposes a new algorithm for protein classification and prediction. It uses a multi-agent system technique, which represents a new paradigm for conceptualizing, designing and implementing software systems, to predict and classify protein structures. For classifying the proteins, a support vector machine (SVM) has been developed to extract features from the protein sequences. This paper describes a method for predicting and classifying the secondary structure of proteins. SVM modules were developed using the multi-agent system principle for predicting proteins and their function, and achieved maximum accuracy, specificity and sensitivity of 92, 94.09 and 91.59, respectively. The proposed algorithm provides a good understanding of protein structure, which has a positive effect on biological science, especially on understanding the behavior of and relationships between proteins.

  8. Prediction and Classification of Human G-protein Coupled Receptors Based on Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    Yun-Fei Wang; Huan Chen; Yan-Hong Zhou

    2005-01-01

    A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the length distribution difference between GPCRs and non-GPCRs has also been exploited to improve the prediction performance. The testing results with annotated human protein sequences demonstrate that this system can get good performance for both prediction and classification of human GPCRs.
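
    The composition features described in this abstract can be illustrated with a short sketch. It computes single amino acid and dipeptide frequencies and appends the sequence length; the statistical feature selection and the SVM itself are not shown. The function names, the way length is encoded, and the example sequence are assumptions for illustration, not details from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def kmer_composition(seq: str, k: int) -> list:
    """Frequency of every possible k-mer over the 20 standard amino acids.

    k=1 gives amino acid composition (20 values), k=2 dipeptide composition
    (400 values), k=3 tripeptide composition (8000 values). Windows that
    contain non-standard letters are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq: str) -> list:
    """Concatenate amino acid composition, dipeptide composition and length."""
    return kmer_composition(seq, 1) + kmer_composition(seq, 2) + [float(len(seq))]


# Made-up 10-residue sequence, used only to show the vector shape.
vec = feature_vector("MKTAYIAKQR")
print(len(vec))  # 20 + 400 + 1 = 421
```

    Tripeptide composition follows the same pattern with k=3, which is where a selection step such as the one the abstract mentions becomes useful, since it adds 8000 mostly sparse features.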

  9. Genome-wide computational function prediction of Arabidopsis thaliana proteins by integration of multiple data sources

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Dijk, van A.D.J.; Ham, van R.C.H.J.; Braak, ter C.J.F.

    2011-01-01

    Although Arabidopsis thaliana is the best studied plant species, the biological role of one third of its proteins is still unknown. We developed a probabilistic protein function prediction method that integrates information from sequences, protein-protein interactions and gene expression. The method

  10. Predictions of hot spot residues at protein-protein interfaces using support vector machines.

    Directory of Open Access Journals (Sweden)

    Stefano Lise

    Protein-protein interactions are critically dependent on just a few 'hot spot' residues at the interface. Hot spots make a dominant contribution to the free energy of binding and they can disrupt the interaction if mutated to alanine. Here, we present HSPred, a support vector machine (SVM)-based method to predict hot spot residues, given the structure of a complex. HSPred represents an improvement over a previously described approach (Lise et al, BMC Bioinformatics 2009, 10:365). It achieves higher accuracy by treating separately predictions involving either an arginine or a glutamic acid residue. These are the amino acid types on which the original model did not perform well. We have therefore developed two additional SVM classifiers, specifically optimised for these cases. HSPred reaches an overall precision and recall respectively of 61% and 69%, which roughly corresponds to a 10% improvement. An implementation of the described method is available as a web server at http://bioinf.cs.ucl.ac.uk/hspred. It is free to non-commercial users.

  11. Prediction of HIV-1 virus-host protein interactions using virus and host sequence motifs

    Directory of Open Access Journals (Sweden)

    Tozeren Aydin

    2009-05-01

    Background: Host protein-protein interaction networks are altered by invading virus proteins, which create new interactions, and modify or destroy others. The resulting network topology favors excessive amounts of virus production in a stressed host cell network. Short linear peptide motifs common to both virus and host provide the basis for host network modification. Methods: We focused our host-pathogen study on the binding and competing interactions of HIV-1 and human proteins. We showed that peptide motifs conserved across 70% of HIV-1 subtype B and C samples occurred in similar positions on HIV-1 proteins, and we documented protein domains that interact with these conserved motifs. We predicted which human proteins may be targeted by HIV-1 by taking pairs of human proteins that may interact via a motif conserved in HIV-1 and the corresponding interacting protein domain. Results: Our predictions were enriched with host proteins known to interact with HIV-1 proteins ENV, NEF, and TAT (p-value ... Conclusion: A list of host proteins highly enriched with those targeted by HIV-1 proteins can be obtained by searching for host protein motifs along virus protein sequences. The resulting set of host proteins predicted to be targeted by virus proteins will become more accurate with better annotations of motifs and domains. Nevertheless, our study validates the role of linear binding motifs shared by virus and host proteins as an important part of the crosstalk between virus and host.

  12. Documentation of an Imperative To Improve Methods for Predicting Membrane Protein Stability.

    Science.gov (United States)

    Kroncke, Brett M; Duran, Amanda M; Mendenhall, Jeffrey L; Meiler, Jens; Blume, Jeffrey D; Sanders, Charles R

    2016-09-13

    There is a compelling and growing need to accurately predict the impact of amino acid mutations on protein stability for problems in personalized medicine and other applications. Here the ability of 10 computational tools to accurately predict mutation-induced perturbation of folding stability (ΔΔG) for membrane proteins of known structure was assessed. All methods for predicting ΔΔG values performed significantly worse when applied to membrane proteins than when applied to soluble proteins, yielding estimated concordance, Pearson, and Spearman correlation coefficients of thermodynamic folding stability in membrane proteins.

  13. A special purpose silicon compiler for designing supercomputing VLSI systems

    Science.gov (United States)

    Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.

    1991-01-01

    Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should get integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate get reduced over the silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.

  14. The TeraGyroid Experiment – Supercomputing 2003

    Directory of Open Access Journals (Sweden)

    R.J. Blake

    2005-01-01

    Full Text Available Amphiphiles are molecules with hydrophobic tails and hydrophilic heads. When dispersed in solvents, they self assemble into complex mesophases including the beautiful cubic gyroid phase. The goal of the TeraGyroid experiment was to study defect pathways and dynamics in these gyroids. The UK's supercomputing and USA's TeraGrid facilities were coupled together, through a dedicated high-speed network, into a single computational Grid for research work that peaked around the Supercomputing 2003 conference. The gyroids were modeled using lattice Boltzmann methods with parameter spaces explored using many 128^3 and 3grid point simulations, this data being used to inform the world's largest three-dimensional time dependent simulation with 1024^3 grid points. The experiment generated some 2 TBytes of useful data. In terms of Grid technology the project demonstrated the migration of simulations (using Globus middleware) to and fro across the Atlantic exploiting the availability of resources. Integration of the systems accelerated the time to insight. Distributed visualisation of the output datasets enabled the parameter space of the interactions within the complex fluid to be explored from a number of sites, informed by discourse over the Access Grid. The project was sponsored by EPSRC (UK) and NSF (USA) with trans-Atlantic optical bandwidth provided by British Telecommunications.

  15. Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents

    Energy Technology Data Exchange (ETDEWEB)

    Sanyal, Jibonananda [ORNL; New, Joshua Ryan [ORNL; Edwards, Richard [ORNL; Parker, Lynne Edwards [ORNL

    2014-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus can often extend in the order of a few thousand parameters which have to be calibrated manually by an expert for realistic energy modeling. This makes it challenging and expensive thereby making building energy modeling unfeasible for smaller projects. In this paper, we describe the Autotune research which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers which are subsequently used to train machine learning algorithms to generate agents. These agents, once created, can then run in a fraction of the time thereby allowing cost-effective calibration of building models.

  16. Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

    Institute of Scientific and Technical Information of China (English)

    Feng Wang; Can-Qun Yang; Yun-Fei Du; Juan Chen; Hui-Zhan Yi; Wei-Xia Xu

    2011-01-01

    In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system attempted so far. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to exploit the task, thread and data parallelism of Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using the two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of CPU-GPU communication, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability, 3.3 times faster than the result obtained using the vendor's library. On the full configuration of TianHe-1 our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November 2009.

  17. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    Science.gov (United States)

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-03-13

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
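
    As background for the HP model used in this record, the sketch below scores a conformation on a 2D square lattice: the energy is minus the number of hydrophobic (H) residue pairs that are lattice neighbours but not adjacent in the chain. This is the standard textbook form of the HP energy, given here as an illustrative assumption; the function name and the lattice choice are ours, and the code is not the authors' HE-L-PSO implementation.

```python
def hp_energy(sequence, coords):
    """Energy of an HP-model conformation on a 2D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) lattice positions, one per residue.
    Returns -(number of non-consecutive H-H lattice contacts), or None
    if the walk is not a valid self-avoiding chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    # A valid fold never places two residues on the same lattice site.
    if len(set(coords)) != len(coords):
        return None
    # Consecutive residues must sit on neighbouring lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return None
    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each neighbouring pair is counted once.
        for site in ((x + 1, y), (x, y + 1)):
            j = index_at.get(site)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a square brings its two ends together.
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))
```

    A search method such as PSO or hill climbing then looks for the self-avoiding walk that minimises this value, which tends to bury H residues in a compact core.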

  18. Optimizing weights of protein energy function to improve ab initio protein structure prediction

    CERN Document Server

    Wang, Chao; Liu, Juntao; Zhang, Haicang; Ling, Bin; Li, Shuai Cheng; Zheng, Wei-Mou; Bu, Dongbo

    2013-01-01

    Predicting protein 3D structure from amino acid sequence remains a challenge in the field of computational biology. If protein structure homologues are not found, one has to construct 3D structural conformations from the very beginning by the so-called ab initio approach, using some empirical energy functions. A successful algorithm in this category, Rosetta, creates an ensemble of decoy conformations by assembling selected best short fragments of known protein structures and then recognizes the native state as the highly populated one with a very low energy. Typically, an energy function is a combination of a variety of terms characterizing different structural features, such as hydrophobic interactions, van der Waals forces, hydrogen bonding, etc. It is critical for an energy function to be capable of distinguishing native-like conformations from non-native ones and of driving most initial conformations assembled from fragments to a native-like one in a conformation search process. In this paper we propose a linea...

  19. Open source tool for prediction of genome wide protein-protein interaction network based on ortholog information

    Directory of Open Access Journals (Sweden)

    Pedamallu Chandra Sekhar

    2010-08-01

    Full Text Available Abstract Background Protein-protein interactions are crucially important for cellular processes. Knowledge of these interactions improves the understanding of cell cycle, metabolism, signaling, transport, and secretion. Information about interactions can hint at molecular causes of diseases, and can provide clues for new therapeutic approaches. Several (usually expensive and time consuming) experimental methods can probe protein-protein interactions. Data sets derived from such experiments make the development of prediction methods feasible, and make the creation of protein-protein interaction network predicting tools possible. Methods Here we report the development of a simple open source program module (OpenPPI_predictor) that can generate a putative protein-protein interaction network for target genomes. This tool uses the orthologous interactome network data from a related, experimentally studied organism. Results Results from our predictions can be visualized using the Cytoscape visualization software, and can be piped to downstream processing algorithms. We have employed our program to predict a protein-protein interaction network for the human parasite roundworm Brugia malayi, using interactome data from the free living nematode Caenorhabditis elegans. Availability The OpenPPI_predictor source code is available from http://tools.neb.com/~posfai/.
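
    The general idea of transferring interactions through orthologs can be sketched as follows. This is a generic illustration under our own assumptions, not the OpenPPI_predictor code: the function name, the data layout and the protein identifiers are all hypothetical, and the abstract does not say how the real tool handles one-to-many orthologs or confidence scores.

```python
def transfer_interactions(source_interactions, orthologs):
    """Predict target-organism interactions from a source interactome.

    source_interactions: iterable of (protein_a, protein_b) pairs observed
                         in the experimentally studied organism.
    orthologs:           dict mapping a source protein to the set of its
                         orthologs in the target organism.
    Returns a set of sorted (target_a, target_b) tuples; a pair is
    predicted when both source partners have at least one ortholog.
    """
    predicted = set()
    for a, b in source_interactions:
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                # Sort so that (x, y) and (y, x) are stored only once.
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


# Hypothetical identifiers: "ce_*" for the source, "bm_*" for the target.
source = [("ce_A", "ce_B"), ("ce_B", "ce_C")]
ortholog_map = {"ce_A": {"bm_1"}, "ce_B": {"bm_2", "bm_3"}}
for pair in sorted(transfer_interactions(source, ortholog_map)):
    print(pair)
```

    In this toy input the second source interaction is dropped because "ce_C" has no ortholog in the target, which shows why coverage of such predictions depends on the quality of the ortholog table.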

  20. Scoring protein relationships in functional interaction networks predicted from sequence data.

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    Full Text Available The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. AVAILABILITY: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.

  1. Comprehensive predictions of target proteins based on protein-chemical interaction using virtual screening and experimental verifications

    Directory of Open Access Journals (Sweden)

    Kobayashi Hiroki

    2012-04-01

    Full Text Available Abstract Background Identification of the target proteins of bioactive compounds is critical for elucidating the mode of action; however, target identification has been difficult in general, mostly due to the low sensitivity of detection using affinity chromatography followed by CBB staining and MS/MS analysis. Results We applied our protocol of predicting target proteins combining in silico screening and experimental verification for incednine, which inhibits the anti-apoptotic function of Bcl-xL by an unknown mechanism. One hundred eighty-two target protein candidates were computationally predicted to bind to incednine by the statistical prediction method, and the predictions were verified by in vitro binding of incednine to seven proteins, whose expression can be confirmed in our cell system. As a result, 40% accuracy of the computational predictions was achieved successfully, and we newly found 3 incednine-binding proteins. Conclusions This study revealed that our proposed protocol of predicting target protein combining in silico screening and experimental verification is useful, and provides new insight into a strategy for identifying target proteins of small molecules.

  2. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  3. Computational Framework for Prediction of Peptide Sequences That May Mediate Multiple Protein Interactions in Cancer-Associated Hub Proteins.

    Directory of Open Access Journals (Sweden)

    Debasree Sarkar

    Full Text Available A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions in the eukaryotic proteins. These peptides have been found to interact with low affinity and are able to bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and de novo motif identification based method (MEME) were used to identify such peptides in three cancer-associated hub proteins: MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, the overlapping regions across these peptides being termed as overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI

  4. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction

    Directory of Open Access Journals (Sweden)

    Chira Camelia

    2011-07-01

    Full Text Available Abstract Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (hill-climbing mechanism and diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  5. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    Directory of Open Access Journals (Sweden)

    Chen Ke

    2008-05-01

    Full Text Available Abstract Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is

  6. Automatic discovery of the communication network topology for building a supercomputer model

    Science.gov (United States)

    Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim

    2016-10-01

    The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.

  7. Evaluation of methods for predicting the topology of β-barrel outer membrane proteins and a consensus prediction method

    Directory of Open Access Journals (Sweden)

    Hamodrakas Stavros J

    2005-01-01

    Full Text Available Abstract Background Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method. Results We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies. Conclusions The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at http://bioinformatics.biol.uoa.gr/ConBBPRED.

  8. Prediction of protein hydration sites from sequence by modular neural networks

    DEFF Research Database (Denmark)

    Ehrlich, L.; Reczko, M.; Bohr, Henrik

    1998-01-01

    The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites using protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with t...

  9. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a number...

  10. Predict potential drug targets from the ion channel proteins based on SVM.

    Science.gov (United States)

    Huang, Chen; Zhang, Ruijie; Chen, Zhiqiang; Jiang, Yongshuai; Shang, Zhenwei; Sun, Peng; Zhang, Xuehong; Li, Xia

    2010-02-21

    The identification of molecular targets is a critical step in the drug discovery and development process. Ion channel proteins represent highly attractive drug targets implicated in a diverse range of disorders, in particular in the cardiovascular and central nervous systems. Due to the limits of experimental technique and the low-throughput nature of patch-clamp electrophysiology, they remain a target class waiting to be exploited. In our study, we combined three types of protein features (primary sequence, secondary structure and subcellular localization) to predict potential drug targets from ion channel proteins using the classical support vector machine (SVM) method. Our prediction comprised two stages. In stage 1, we predicted ion channel target proteins based on whole-genome target protein characteristics. First, we performed feature selection by the Mann-Whitney U test, then made predictions to identify potential ion channel targets by SVM and designed a new evaluating indicator Q to prioritize results. In stage 2, we made a prediction based on known ion channel target protein characteristics. A genetic algorithm was used to select features and SVM was used to predict ion channel targets. Then, we integrated the results of the two stages, and found that five ion channel proteins appeared in both prediction results, including CGMP-gated cation channel beta subunit and Gamma-aminobutyric acid receptor subunit alpha-5, four of which were related to some nerve diseases. This suggests that these five proteins are potential targets for drug discovery and that our prediction strategies are effective.

  11. Update on protein structure prediction: results of the 1995 IRBM workshop

    DEFF Research Database (Denmark)

    Hubbard, Tim; Tramontano, Anna; Hansen, Jan

    1996-01-01

    Computational tools for protein structure prediction are of great interest to molecular, structural and theoretical biologists due to a rapidly increasing number of protein sequences with no known structure. In October 1995, a workshop was held at IRBM to predict as much as possible about a numbe...

  12. How proteins get in touch: interface prediction in the study of biomolecular complexes

    OpenAIRE

    de Vries, S.J.; Bonvin, A.M.J.J.

    2008-01-01

    Protein-protein interface prediction is a booming field, with a substantial growth in the number of new methods being published the last two years. The increasing number of available three-dimensional structures of protein-protein complexes has enabled large-scale statistical analyses of protein interfaces, considering evolutionary, physicochemical and structural properties. Successful combinations of these properties have led to more accurate interface predictors in recent years. In addition...

  13. Computational Framework for Prediction of Peptide Sequences That May Mediate Multiple Protein Interactions in Cancer-Associated Hub Proteins

    Science.gov (United States)

    Sarkar, Debasree; Patra, Piya; Ghosh, Abhirupa; Saha, Sudipto

    2016-01-01

    A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions in the eukaryotic proteins. These peptides have been found to interact with low affinity and are able to bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and de novo motif identification based method (MEME) were used to identify such peptides in three cancer-associated hub proteins: MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, the overlapping regions across these peptides being termed as overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI modulators.

  14. Multi-Instance Multilabel Learning with Weak-Label for Predicting Protein Function in Electricigens

    Directory of Open Access Journals (Sweden)

    Jian-Sheng Wu

    2015-01-01

    Full Text Available Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that the protein function prediction problem is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In contrast, in practice just a subset of the functions of a protein are known, and whether this protein has other functions is unknown. It is evident that protein function prediction tasks suffer from the weak-label problem; thus protein function prediction with incomplete annotation matches well with the MIML with weak-label learning framework. In this paper, we have applied the state-of-the-art MIML with weak-label learning algorithm MIMLwel for predicting protein functions in two typical real-world electricigen organisms which have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.

  15. Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements

    Directory of Open Access Journals (Sweden)

    Sze Sing-Hoi

    2008-07-01

    Full Text Available Abstract Background: Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins. Results: By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures in which each of them contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states. Conclusion: Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold) is available at http://faculty.cs.tamu.edu/shsze/ssfold.

  16. A probabilistic framework to predict protein function from interaction data integrated with semantic knowledge

    Directory of Open Access Journals (Sweden)

    Ramanathan Murali

    2008-09-01

    Full Text Available Abstract Background: The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insights into the functional analysis because the function of unknown proteins can be postulated on the basis of their interaction evidence with known proteins. The protein-protein interaction data sets have been enriched by high-throughput experimental methods. However, functional analysis using the interaction data is limited in accuracy because of experimentally generated false positives and interactions that lack functional linkage. Results: Protein-protein interaction data can be integrated with the functional knowledge existing in the Gene Ontology (GO) database. We apply similarity measures to assess the functional similarity between interacting proteins. We present a probabilistic framework for predicting functions of unknown proteins based on the functional similarity. We use leave-one-out cross validation to compare the performance. The experimental results demonstrate that our algorithm performs better than other competing methods in terms of prediction accuracy. In particular, it handles the high false positive rates of current interaction data well. Conclusion: The experimentally determined protein-protein interactions are error-prone for uncovering the functional associations among proteins. The performance of function prediction for uncharacterized proteins can be enhanced by the integration of multiple available data sources.

  17. Comparison of the 3D Protein Structure Prediction Algorithms

    OpenAIRE

    Fadhl M. Al-Akwaa; Husam Elhetari

    2014-01-01

    Determining a protein's 3D structure is important for understanding its function. Protein structure can be determined experimentally or computationally. Experimental methods are expensive and time consuming, whereas computational methods are the alternative solution. On the other hand, computational methods require strong computing power, assumed models and effective algorithms. In this paper we compare the performance of these algorithms. We find that Genetic Algorithm with impro...

  18. Structural and Function Prediction of Musa acuminata subsp. Malaccensis Protein

    Directory of Open Access Journals (Sweden)

    Anum Munir

    2016-03-01

    Full Text Available Hypothetical proteins (HPs) are proteins whose presence has been anticipated, yet whose in vivo function has not been established. Illustrating the structural and functional insights of these HPs might likewise prompt a better comprehension of the protein-protein associations or networks in diverse types of life. Bananas (Musa acuminata spp.), including sweet and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include oats. Bananas are crucial for food security in numerous tropical and subtropical nations and are the most popular fruit in industrialized nations. In the present study, the hypothetical protein of M. acuminata (banana) was chosen for analysis and modeling by different bioinformatics tools and databases. As indicated by primary and secondary structure analysis, XP_009393594.1 is a stable hydrophobic protein containing a noteworthy proportion of α-helices. Homology modeling was done using the SWISS-MODEL server, where the templates' identity with the XP_009393594.1 protein was low, which demonstrated the novelty of our protein. An ab initio strategy was used to produce its 3D structure. Several quality assessment and validation parameters determined the generated protein model to be stable with reasonably good quality. Functional analysis was completed with ProtFun 2.2 and KEGG (KAAS), which suggested that the hypothetical protein is a transcription factor with a cytoplasmic domain as zinc finger. The protein was observed to be vital for the translation process and involved in metabolism, signaling and cellular processes, genetic information processing and zinc ion binding. It is suggested that further experimental validation would help to anticipate the structures and functions of other uncharacterized proteins of different plants and living beings.

  19. Improving protein-protein interaction prediction based on phylogenetic information using a least-squares support vector machine.

    Science.gov (United States)

    Craig, Roger A; Liao, Li

    2007-12-01

    Predicting protein-protein interactions has become a key step of reverse-engineering biological networks to better understand cellular functions. The experimental methods in determining protein-protein interactions are time-consuming and costly, which has motivated vigorous development of computational approaches for predicting protein-protein interactions. A set of recently developed bioinformatics methods utilizes coevolutionary information of the interacting partners (e.g., as exhibited in the form of correlations between distance matrices, where, for each protein, a matrix stores the pairwise distances between the protein and its orthologs in a group of reference genomes). We proposed a novel method to account for the intra-matrix correlations in improving predictive accuracy. The distance matrices for a pair of proteins are transformed and concatenated into a phylogenetic vector. A least-squares support vector machine is trained and tested on pairs of proteins, represented as phylogenetic vectors, whose interactions are known. The intra-matrix correlations are accounted for by introducing a weighted linear kernel, which determines the dot product of two phylogenetic vectors. The performance, measured as receiver operator characteristic (ROC) score in cross-validation experiments, shows significant improvement of our method (ROC score 0.928) over that obtained by Pearson correlations (0.659).
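
    The phylogenetic-vector and weighted-kernel idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the distance matrices are transformed or how the weights are chosen, so the upper-triangle flattening and the `weights` argument below are assumptions made only for illustration, not the authors' method.

```python
from typing import List, Sequence


def phylogenetic_vector(dist_a: Sequence[Sequence[float]],
                        dist_b: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the flattened upper triangles of two distance matrices.

    Assumption: the unspecified "transform" is taken to be a plain
    upper-triangle flattening (the diagonal is skipped).
    """
    def upper(matrix: Sequence[Sequence[float]]) -> List[float]:
        n = len(matrix)
        return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

    return upper(dist_a) + upper(dist_b)


def weighted_linear_kernel(x: Sequence[float],
                           y: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Weighted dot product: sum_k weights[k] * x[k] * y[k].

    The weights are hypothetical placeholders; with all weights equal
    to 1 this reduces to the ordinary linear kernel.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w * a * b for w, a, b in zip(weights, x, y))


# Toy 3x3 distance matrices for a hypothetical protein pair.
matrix_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
matrix_b = [[0, 4, 5], [4, 0, 6], [5, 6, 0]]
vector = phylogenetic_vector(matrix_a, matrix_b)
unweighted = weighted_linear_kernel(vector, vector, [1.0] * len(vector))
halved = weighted_linear_kernel(vector, vector, [0.5] * len(vector))
```

    A kernel of this form could be supplied to any kernel-based classifier as a precomputed Gram matrix; the least-squares SVM training itself is not shown here.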

  20. Prediction of GPCR-G Protein Coupling Specificity Using Features of Sequences and Biological Functions

    Institute of Scientific and Technical Information of China (English)

    Toshihide Ono; Haretsugu Hishigaki

    2006-01-01

    Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling specificity between GPCRs and G proteins. This method uses not only GPCR sequences but also the functional knowledge generated by natural language processing, and can achieve 92.2% prediction accuracy by using the C4.5 algorithm. Furthermore, rules related to GPCR-G protein coupling are generated. The combination of sequence analysis and text mining improves the prediction accuracy for GPCR-G protein coupling specificity, and also provides clues for understanding GPCR signaling.

  1. Structure-templated predictions of novel protein interactions from sequence information.

    Directory of Open Access Journals (Sweden)

    Doron Betel

    2007-09-01

    Full Text Available The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain-motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.

  2. Structure-templated predictions of novel protein interactions from sequence information.

    Science.gov (United States)

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W V

    2007-09-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain-motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.

  3. Sequence- and interactome-based prediction of viral protein hotspots targeting host proteins: a case study for HIV Nef.

    Directory of Open Access Journals (Sweden)

    Mahdi Sarmady

    Full Text Available Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance, but the approach is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef, highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear to crowd the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk.

  4. How proteins get in touch: Interface prediction and docking of protein complexes

    NARCIS (Netherlands)

    de Vries, S.J.

    2009-01-01

    Proteins are the wheels and mill stones of the complex machinery that underlies human life. In carrying out their functions, proteins work in close association with other proteins, forming protein complexes. A huge network of protein-protein interactions enables the cell to respond quickly to

  5. Computational Approaches for Prediction of Pathogen-Host Protein-Protein Interactions

    Directory of Open Access Journals (Sweden)

    Esmaeil Nourani

    2015-02-01

    Full Text Available Infectious diseases are still among the major and prevalent health problems, mostly because of the drug resistance of novel variants of pathogens. Molecular interactions between pathogens and their hosts are the key part of the infection mechanisms. Novel antimicrobial therapeutics to fight drug resistance are only possible with a thorough understanding of pathogen-host interaction (PHI) systems. Existing databases, which contain experimentally verified PHI data, suffer from scarcity of reported interactions due to the technically challenging and time consuming process of experiments. This has motivated many researchers to address the problem by proposing computational approaches for analysis and prediction of PHIs. The computational methods primarily utilize sequence information, protein structure and known interactions. Classic machine learning techniques are used when there are sufficient known interactions to be used as training data. In the opposite case, transfer and multi-task learning methods are preferred. Here, we present an overview of these computational approaches for PHI prediction, discussing their weaknesses and strengths, with future directions.

  6. PredPlantPTS1: A Web Server for the Prediction of Plant Peroxisomal Proteins.

    Science.gov (United States)

    Reumann, Sigrun; Buchwald, Daniela; Lingner, Thomas

    2012-01-01

    Prediction of subcellular protein localization is essential to correctly assign unknown proteins to cell organelle-specific protein networks and to ultimately determine protein function. For metazoa, several computational approaches have been developed in the past decade to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). However, plant-specific PTS1 protein prediction methods have been lacking up to now, and pre-existing methods generally were incapable of correctly predicting low-abundance plant proteins possessing non-canonical PTS1 patterns. Recently, we presented a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy and which can correctly identify unknown targeting patterns, i.e., novel PTS1 tripeptides and tripeptide residues. Here we describe the first plant-specific web server PredPlantPTS1 for the prediction of plant PTS1 proteins using the above-mentioned underlying models. The server allows the submission of protein sequences from diverse spermatophytes and also performs well for mosses and algae. The easy-to-use web interface provides detailed output in terms of (i) the peroxisomal targeting probability of the given sequence, (ii) information whether a particular non-canonical PTS1 tripeptide has already been experimentally verified, and (iii) the prediction scores for the single C-terminal 14 amino acid residues. The latter allows identification of predicted residues that inhibit peroxisome targeting and which can be optimized using site-directed mutagenesis to raise the peroxisome targeting efficiency. The prediction server will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants. PredPlantPTS1 is freely accessible at ppp.gobics.de.
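
    As a small illustration of the sequence region this abstract refers to, the sketch below extracts the C-terminal 14 residues and the terminal tripeptide from a protein sequence. The function name and the returned dictionary keys are hypothetical; the server's actual models and feature encoding are not described in the abstract and are not reproduced here.

```python
from typing import Dict


def c_terminal_features(sequence: str, window: int = 14) -> Dict[str, str]:
    """Return the C-terminal window and its PTS1-style tripeptide.

    Assumption: the input is a one-letter amino acid string at least
    `window` residues long; no scoring or prediction is performed.
    """
    seq = sequence.strip().upper()
    if len(seq) < window:
        raise ValueError(f"sequence shorter than {window} residues")
    tail = seq[-window:]
    return {
        "window": tail,        # the C-terminal `window` residues
        "upstream": tail[:-3],  # residues preceding the tripeptide
        "tripeptide": tail[-3:],  # the last three residues
    }


# Made-up example sequence ending in the canonical tripeptide SKL.
example = c_terminal_features("M" + "A" * 20 + "GHIKLMNPQRSSKL")
```

    Splitting the window this way mirrors the abstract's distinction between the tripeptide and the remaining upstream residues that receive individual prediction scores.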

  7. PredPlantPTS1: a web server for the prediction of plant peroxisomal proteins

    Directory of Open Access Journals (Sweden)

    Sigrun Reumann

    2012-08-01

    Full Text Available Prediction of subcellular protein localization is essential to correctly assign unknown proteins to cell organelle-specific protein networks and to ultimately determine protein function. For metazoa, several computational approaches have been developed in the past decade to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). However, plant-specific PTS1 protein prediction methods have been lacking up to now, and pre-existing methods generally were incapable of correctly predicting low-abundance plant proteins possessing non-canonical PTS1 patterns. Recently, we presented a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy and which can correctly identify unknown targeting patterns, i.e. novel PTS1 tripeptides and tripeptide residues. Here we describe the first plant-specific web server PredPlantPTS1 for the prediction of plant PTS1 proteins using the above-mentioned underlying models. The server allows the submission of protein sequences from diverse spermatophytes and also performs well for mosses and algae. The easy-to-use web interface provides detailed output in terms of (i) the peroxisomal targeting probability of the given sequence, (ii) information whether a particular non-canonical PTS1 tripeptide has already been experimentally verified, and (iii) the prediction scores for the single C-terminal 14 amino acid residues. The latter allows identification of predicted residues that inhibit peroxisome targeting and which can be optimized using site-directed mutagenesis to raise the peroxisome targeting efficiency. The prediction server will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants. PredPlantPTS1 is freely accessible at ppp.gobics.de.

  8. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions.

    Science.gov (United States)

    Zhou, Hufeng; Gao, Shangzhi; Nguyen, Nam Ninh; Fan, Mengyuan; Jin, Jingjing; Liu, Bing; Zhao, Liang; Xiong, Geng; Tan, Min; Li, Shijun; Wong, Limsoon

    2014-04-08

    H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequence, tend to have more domains, tend to be more hydrophilic, etc. And the protein domains from both

  9. RaptorX-Property: a web server for protein structure property prediction

    OpenAIRE

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-01-01

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent acce...

  10. StaRProtein, A Web Server for Prediction of the Stability of Repeat Proteins

    Science.gov (United States)

    Xu, Yongtao; Zhou, Xu; Huang, Meilan

    2015-01-01

    Repeat proteins have become increasingly important due to their capability to bind to almost any protein and their potential as an alternative therapy to monoclonal antibodies. In the past decade repeat proteins have been designed to mediate specific protein-protein interactions. The tetratricopeptide and ankyrin repeat proteins are two classes of helical repeat proteins that form different binding pockets to accommodate various partners. It is important to understand the factors that define folding and stability of repeat proteins in order to prioritize the most stable designed repeat proteins to further explore their potential binding affinities. Here we developed distance-dependent statistical potentials using two classes of alpha-helical repeat proteins, tetratricopeptide and ankyrin repeat proteins respectively, and evaluated their efficiency in predicting the stability of repeat proteins. We demonstrated that the repeat-specific statistical potentials based on these two classes of repeat proteins showed superior accuracy compared with non-specific statistical potentials in 1) discriminating correct vs. incorrect models and 2) ranking the stability of designed repeat proteins. In particular, the statistical scores correlate closely with the equilibrium unfolding free energies of repeat proteins and therefore would serve as a novel tool in quickly prioritizing the designed repeat proteins with high stability. The StaRProtein web server was developed for predicting the stability of repeat proteins. PMID:25807112

  11. Integration of Titan supercomputer at OLCF with ATLAS Production System

    CERN Document Server

    Barreiro Megino, Fernando Harald; The ATLAS collaboration

    2017-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA-managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this talk we will describe a project aimed at integration of the ATLAS Production System with the Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA Pilot framework for ...

  12. Lectures in Supercomputational Neurosciences Dynamics in Complex Brain Networks

    CERN Document Server

    Graben, Peter beim; Thiel, Marco; Kurths, Jürgen

    2008-01-01

    Computational Neuroscience is a burgeoning field of research where only the combined effort of neuroscientists, biologists, psychologists, physicists, mathematicians, computer scientists, engineers and other specialists, e.g. from linguistics and medicine, seems to be able to expand the limits of our knowledge. The present volume is an introduction, largely from the physicists' perspective, to the subject matter with in-depth contributions by system neuroscientists. A conceptual model for complex networks of neurons is introduced that incorporates many important features of the real brain, such as various types of neurons, various brain areas, inhibitory and excitatory coupling and the plasticity of the network. The computational implementation on supercomputers, which is introduced and discussed in detail in this book, will enable the readers to modify and adapt the algorithm for their own research. Worked-out examples of applications are presented for networks of Morris-Lecar neurons to model the cortical co...

  13. Modeling the weather with a data flow supercomputer

    Science.gov (United States)

    Dennis, J. B.; Gao, G.-R.; Todd, K. W.

    1984-01-01

    A static concept of data flow architecture is considered for a supercomputer for weather modeling. The machine level instructions are loaded into specific memory locations before computation is initiated, with only one instruction active at a time. The machine would have processing element, functional unit, array memory, memory routing and distribution routing network elements all contained on microprocessors. A value-oriented algorithmic language (VAL) would be employed and would have, as basic operations, simple functions deriving results from operand values. Details of the machine language format, computations with an array and file processing procedures are outlined. A global weather model is discussed in terms of a static architecture and the potential computation rate is analyzed. The results indicate that detailed design studies are warranted to quantify costs and parts fabrication requirements.

  14. Toward the Graphics Turing Scale on a Blue Gene Supercomputer

    CERN Document Server

    McGuigan, Michael

    2008-01-01

    We investigate raytracing performance that can be achieved on a class of Blue Gene supercomputers. We measure an 822 times speedup over a Pentium IV on a 6144 processor Blue Gene/L. We measure the computational performance as a function of number of processors and problem size to determine the scaling performance of the raytracing calculation on the Blue Gene. We find nontrivial scaling behavior at large numbers of processors. We discuss applications of this technology to scientific visualization with advanced lighting and high resolution. We utilize three racks of a Blue Gene/L in our calculations, which is less than three percent of the capacity of the world's largest Blue Gene computer.

  15. Direct numerical simulation of turbulence using GPU accelerated supercomputers

    Science.gov (United States)

    Khajeh-Saeed, Ali; Blair Perot, J.

    2013-02-01

    Direct numerical simulations of turbulence are optimized for up to 192 graphics processors. The results from two large GPU clusters are compared to the performance of corresponding CPU clusters. A number of important algorithm changes are necessary to access the full computational power of graphics processors and these adaptations are discussed. It is shown that the handling of subdomain communication becomes even more critical when using GPU based supercomputers. The potential for overlap of MPI communication with GPU computation is analyzed and then optimized. Detailed timings reveal that the internal calculations are now so efficient that the operations related to MPI communication are the primary scaling bottleneck at all but the very largest problem sizes that can fit on the hardware. This work gives a glimpse of the CFD performance issues that will dominate many hardware platforms in the near future.

  16. Internal computational fluid mechanics on supercomputers for aerospace propulsion systems

    Science.gov (United States)

    Andersen, Bernhard H.; Benson, Thomas J.

    1987-01-01

    The accurate calculation of three-dimensional internal flowfields for application towards aerospace propulsion systems requires computational resources available only on supercomputers. A survey is presented of three-dimensional calculations of hypersonic, transonic, and subsonic internal flowfields conducted at the Lewis Research Center. A steady state Parabolized Navier-Stokes (PNS) solution of flow in a Mach 5.0, mixed compression inlet, a Navier-Stokes solution of flow in the vicinity of a terminal shock, and a PNS solution of flow in a diffusing S-bend with vortex generators are presented and discussed. All of these calculations were performed on either the NAS Cray-2 or the Lewis Research Center Cray XMP.

  17. Research on Several Prediction Methods of Membrane Protein Structure and Topology

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Since present prediction methods of membrane protein structure and topology made use of mixed data sets both from experiments and prediction as training and test sets, the reliability and accuracy of their prediction is still under debate. To benchmark the performance of these methods, this commentary uses a test set of membrane proteins created by European Bioinformatics Institute with either available 3-D structure or experimentally confirmed transmembrane regions. Then the prediction results are compared and the problems existing in these methods and important features for successful prediction are pointed out, which may help users to choose a more reliable prediction from different results. Based upon recent advances in membrane protein, possible means to improve topology prediction accuracy are discussed.

  18. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2006-06-01

    Full Text Available Abstract Background: The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is a need for further research to improve the accuracy of prediction. Results: A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, among them a Gram-negative bacteria dataset, data for discriminating outer membrane proteins, and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. Conclusion: This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies, as empirical results indicate. The code for the software is available upon request.
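
    The building block named in this abstract, a Bayesian classifier based on Markov chain models, can be sketched as follows. This is a single flat first-order classifier with add-one smoothing, written under assumptions of our own (class name, pseudocount, first-order chains, toy labels); it is not HensBC and does not implement the hierarchical, recursive ensemble the authors describe.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class MarkovChainBayes:
    """One first-order Markov chain per class, combined by Bayes' rule."""

    def __init__(self, pseudocount: float = 1.0) -> None:
        self.pseudocount = pseudocount
        # label -> (log prior, log initial probs, log transition probs)
        self.models: Dict[str, tuple] = {}

    def fit(self, sequences: Sequence[str], labels: Sequence[str]) -> "MarkovChainBayes":
        by_label: Dict[str, List[str]] = defaultdict(list)
        for seq, label in zip(sequences, labels):
            by_label[label].append(seq)
        for label, seqs in by_label.items():
            # Start every count at the pseudocount so no probability is zero.
            init = {a: self.pseudocount for a in AMINO_ACIDS}
            trans = {a: {b: self.pseudocount for b in AMINO_ACIDS}
                     for a in AMINO_ACIDS}
            for seq in seqs:
                init[seq[0]] += 1
                for prev, nxt in zip(seq, seq[1:]):
                    trans[prev][nxt] += 1
            init_total = sum(init.values())
            log_init = {a: math.log(c / init_total) for a, c in init.items()}
            log_trans = {}
            for a, row in trans.items():
                row_total = sum(row.values())
                log_trans[a] = {b: math.log(c / row_total) for b, c in row.items()}
            log_prior = math.log(len(seqs) / len(sequences))
            self.models[label] = (log_prior, log_init, log_trans)
        return self

    def log_score(self, seq: str, label: str) -> float:
        """Log prior plus log likelihood of the sequence under the class chain."""
        log_prior, log_init, log_trans = self.models[label]
        score = log_prior + log_init[seq[0]]
        for prev, nxt in zip(seq, seq[1:]):
            score += log_trans[prev][nxt]
        return score

    def predict(self, seq: str) -> str:
        return max(self.models, key=lambda label: self.log_score(seq, label))


# Toy training data with made-up sequences and labels.
clf = MarkovChainBayes().fit(
    ["AAAAAA", "AAAAKA", "LLLLLL", "LLLKLL"],
    ["cytoplasm", "cytoplasm", "membrane", "membrane"],
)
```

    The sketch assumes non-empty sequences over the 20 standard amino acid letters; real data would need handling for ambiguous residue codes.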

  19. Solving global shallow water equations on heterogeneous supercomputers.

    Science.gov (United States)

    Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen

    2017-01-01

    The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using a customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement on performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation on the programming paradigm of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture about both the potential performance benefits and the programming efforts involved.
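    As a quick check on the sustained figures quoted above, the implied per-node throughput can be worked out directly. The helper name is hypothetical; the only inputs are the totals and node counts given in the abstract, with 1 Pflops taken as 1000 Tflops.

```python
def per_node_gflops(total_tflops, nodes):
    """Average sustained Gflops per node from a system-wide Tflops figure."""
    return total_tflops * 1000.0 / nodes


# Figures from the abstract: 581 Tflops on 3750 nodes, 3.74 Pflops on 8644 nodes.
tianhe_1a = per_node_gflops(581.0, 3750)
tianhe_2 = per_node_gflops(3740.0, 8644)

print(f"Tianhe-1A: {tianhe_1a:.1f} Gflops per node")
print(f"Tianhe-2:  {tianhe_2:.1f} Gflops per node")
```

    These are averages over all nodes used in each run and say nothing about how the work was split between CPUs and accelerators.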

  20. Virtualizing Super-Computation On-Board UAS

    Science.gov (United States)

    Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.

    2015-04-01

    Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real-time. However, depending on the volume of data and the cost of the communications, this latter option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes the benchmarking for hardware selection, the software architecture and the communications aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real-time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and consumed watts.

  1. A combined approach for genome wide protein function annotation/prediction

    DEFF Research Database (Denmark)

    Benso, Alfredo; Di Carlo, Stefano; Ur Rehman, Hafeez

    2013-01-01

    proteins in functional genomics and biology in general motivates the use of computational techniques well orchestrated to accurately predict their functions. METHODS: We propose a computational flow for the functional annotation of a protein able to assign the most probable functions to a protein...

  2. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    Directory of Open Access Journals (Sweden)

    Vijaykumar Yogesh Muley

    Full Text Available BACKGROUND: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. METHODS: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. CONCLUSIONS: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling

  3. The topology of the bacterial co-conserved protein network and its implications for predicting protein function

    Directory of Open Access Journals (Sweden)

    Leach Sonia M

    2008-06-01

    Full Text Available Abstract Background Protein-protein interactions networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in the compact bacteria genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacteria co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins. Results Our results showed, like most biological networks, our bacteria co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacteria co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating of functional information from different annotation sources and using the

  4. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Full Text Available Abstract Background Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship. Results Using a supervised machine learning approach – probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCP) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  5. Struct2Net: a web service to predict protein-protein interactions using a structure-based approach.

    Science.gov (United States)

    Singh, Rohit; Park, Daniel; Xu, Jinbo; Hosur, Raghavendra; Berger, Bonnie

    2010-07-01

    Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach. Prediction of protein-protein interactions (PPIs) is a central area of interest and successful prediction would provide leads for experiments and drug design; however, the experimental coverage of the PPI interactome remains inadequate. We believe that Struct2Net is the first community-wide resource to provide structure-based PPI predictions that go beyond homology modeling. Also, most web-resources for predicting PPIs currently rely on functional genomic data (e.g. GO annotation, gene expression, cellular localization, etc.). Our structure-based approach is independent of such methods and only requires the sequence information of the proteins being queried. The web service allows multiple querying options, aimed at maximizing flexibility. For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously. For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed. The web service is freely available at http://struct2net.csail.mit.edu.

  6. Predicting Human Protein Subcellular Locations by the Ensemble of Multiple Predictors via Protein-Protein Interaction Network with Edge Clustering Coefficients

    Science.gov (United States)

    Du, Pufeng; Wang, Lusheng

    2014-01-01

    One of the fundamental tasks in biology is to identify the functions of all proteins to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins will provide key hints to reveal their functions and to understand the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been extensively studied in the past two decades. A lot of methods have been developed based on protein primary sequences as well as protein-protein interaction network. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence based predictors. When predicting the subcellular locations of a given protein, not only the protein itself, but also all its interacting partners were considered. Unlike existing methods, our method requires neither the comprehensive knowledge of the protein-protein interaction network nor the experimentally annotated subcellular locations of most proteins in the protein-protein interaction network. Besides, our method can be used as a framework to integrate multiple predictors. Our method achieved 56% on human proteome in absolute-true rate, which is higher than the state-of-the-art methods. PMID:24466278

  7. QSARs for Plasma Protein Binding: Source Data and Predictions

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset has all of the information used to create and evaluate 3 independent QSAR models for the fraction of a chemical unbound by plasma protein (Fub) for...

  8. Is protein structure prediction still an enigma? | Sobha | African ...

    African Journals Online (AJOL)

    They perform a wide array of functions including catalysis, structure formation, transport, body defense, etc. Understanding the functions of proteins is a fundamental problem in the discovery of ... This review comprehends the various recent

  9. Protein Function Prediction Based on Active Semi-supervised Learning

    Institute of Scientific and Technical Information of China (English)

    WANG Xuesong; CHENG Yuhu; LI Lijing

    2016-01-01

    In our study, the active learning and semi-supervised learning methods are comprehensively used for label delivery of proteins with known functions in Protein-protein interaction (PPI) network so as to predict the functions of unknown proteins. Because the real PPI network is generally observed with overlapping protein nodes with multiple functions, the mislabeling of overlapping protein may result in accumulation of prediction errors. For this reason, prior to executing the label delivery process of semi-supervised learning, the adjacency matrix is used to detect overlapping proteins. As the topological structure description of interactive relation between proteins, PPI network is observed with party hub protein nodes that play an important role, in co-expression with its neighborhood. Therefore, to reduce the manual labeling cost, party hub proteins most beneficial for improvement of prediction accuracy are selected for class labeling and the labeled party hub proteins are added into the labeled sample set for semi-supervised learning later. As the experimental results of real yeast PPI network show, the proposed algorithm can achieve high prediction accuracy with few labeled samples.

  10. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction.

    Science.gov (United States)

    Browne, Fiona; Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2009-01-26

    Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification is a useful resource for researchers, no public resource currently exists to perform this task. GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. GRIP is developed to retrieve information on protein complex, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  11. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction

    Directory of Open Access Journals (Sweden)

    Zheng Huiru

    2009-01-01

    Full Text Available Abstract Background Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification is a useful resource for researchers, no public resource currently exists to perform this task. Results GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Conclusion GRIP is developed to retrieve information on protein complex, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  12. Calculation of Free Energy Landscape in Multi-Dimensions with Hamiltonian-Exchange Umbrella Sampling on Petascale Supercomputer.

    Science.gov (United States)

    Jiang, Wei; Luo, Yun; Maragliano, Luca; Roux, Benoît

    2012-11-13

    An extremely scalable computational strategy is described for calculations of the potential of mean force (PMF) in multidimensions on massively distributed supercomputers. The approach involves coupling thousands of umbrella sampling (US) simulation windows distributed to cover the space of order parameters with a Hamiltonian molecular dynamics replica-exchange (H-REMD) algorithm to enhance the sampling of each simulation. In the present application, US/H-REMD is carried out in a two-dimensional (2D) space and exchanges are attempted alternatively along the two axes corresponding to the two order parameters. The US/H-REMD strategy is implemented on the basis of parallel/parallel multiple copy protocol at the MPI level, and therefore can fully exploit computing power of large-scale supercomputers. Here the novel technique is illustrated using the leadership supercomputer IBM Blue Gene/P with an application to a typical biomolecular calculation of general interest, namely the binding of calcium ions to the small protein Calbindin D9k. The free energy landscape associated with two order parameters, the distance between the ion and its binding pocket and the root-mean-square deviation (rmsd) of the binding pocket relative the crystal structure, was calculated using the US/H-REMD method. The results are then used to estimate the absolute binding free energy of calcium ion to Calbindin D9k. The tests demonstrate that the 2D US/H-REMD scheme greatly accelerates the configurational sampling of the binding pocket, thereby improving the convergence of the potential of mean force calculation.
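    The exchange step that couples neighbouring umbrella windows can be sketched with a standard Metropolis criterion. The sketch below assumes harmonic biasing potentials of equal force constant, a single order parameter (one axis of the 2D grid), and a common temperature for all windows, so only the bias energies enter the acceptance ratio; function names and parameters are illustrative and are not taken from the authors' implementation.

```python
import math
import random


def harmonic_bias(xi, center, k):
    """Umbrella bias energy 0.5 * k * (xi - center)^2 for order parameter xi."""
    return 0.5 * k * (xi - center) ** 2


def exchange_probability(xi_i, xi_j, center_i, center_j, k, beta):
    """Metropolis acceptance probability for swapping the configurations of
    windows i and j. beta is 1/(kB*T) in units consistent with k."""
    before = harmonic_bias(xi_i, center_i, k) + harmonic_bias(xi_j, center_j, k)
    after = harmonic_bias(xi_j, center_i, k) + harmonic_bias(xi_i, center_j, k)
    delta = beta * (after - before)
    return 1.0 if delta <= 0 else math.exp(-delta)


def attempt_exchange(xi_i, xi_j, center_i, center_j, k, beta, rng=random):
    """Return True if the swap is accepted."""
    return rng.random() < exchange_probability(xi_i, xi_j, center_i, center_j, k, beta)
```

    In a 2D scheme like the one described, such attempts would alternate between neighbours along the first and the second order parameter, each using the bias of the corresponding axis.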

  13. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. The complete physicochemical understanding of protein folding is essential for the accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing was performed using 120 proteins containing 22006 residues. For each residue, buried and exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracy for 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07% respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
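    The two-state labelling described above can be illustrated with a short sketch. It assumes the common convention that a residue counts as exposed when its relative solvent accessibility (observed accessible surface area as a percentage of a reference maximum for that residue type) is strictly greater than the threshold; the abstract does not state which convention or reference values RSARF uses, and the numbers in the usage note are made up.

```python
THRESHOLDS = (0, 5, 10, 25, 50)  # percent, as listed in the abstract


def relative_accessibility(asa, max_asa):
    """Relative solvent accessibility in percent (max_asa is an assumed reference value)."""
    return 100.0 * asa / max_asa


def burial_states(rsa_percent, thresholds=THRESHOLDS):
    """Label a residue 'exposed' or 'buried' at each threshold (assumed rule: RSA > threshold)."""
    return {t: ("exposed" if rsa_percent > t else "buried") for t in thresholds}
```

    For example, a hypothetical residue with 30 square angstroms exposed out of a reference 200 has an RSA of 15%, so it is labelled exposed at the 0%, 5% and 10% thresholds and buried at 25% and 50%.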

  14. Prediction of β-barrel membrane proteins by searching for restricted domains

    Directory of Open Access Journals (Sweden)

    Schleiff Enrico

    2005-10-01

    Full Text Available Abstract Background The identification of β-barrel membrane proteins out of a genomic/proteomic background is one of the rapidly developing fields in bioinformatics. Our main goal is the prediction of such proteins in genome/proteome wide analyses. Results For the prediction of β-barrel membrane proteins within prokaryotic proteomes a set of parameters was developed. We have focused on a procedure with a low false positive rate beside a procedure with lowest false prediction rate to obtain a high certainty for the predicted sequences. We demonstrate that the discrimination between β-barrel membrane proteins and other proteins is improved by analyzing a length limited region. The developed set of parameters is applied to the proteome of E. coli and the results are compared to four other described procedures. Conclusion Analyzing the β-barrel membrane proteins revealed the presence of a defined membrane inserted β-barrel region. This information can now be used to refine other prediction programs as well. So far, all tested programs fail to predict outer membrane proteins in the proteome of the prokaryote E. coli with high reliability. However, the reliability of the prediction is improved significantly by a combinatory approach of several programs. The consequences and usability of the developed scores are discussed.

  15. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine

    Directory of Open Access Journals (Sweden)

    Ravindra Kumar

    2017-09-01

    Full Text Available Background The endoplasmic reticulum plays an important role in many cellular processes, which includes protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and entry point of extracellular proteins to the secretory pathway. Hence at any given point of time, endoplasmic reticulum contains two different cohorts of proteins, (i) proteins involved in endoplasmic reticulum-specific function, which reside in the lumen of the endoplasmic reticulum, called as endoplasmic reticulum resident proteins and (ii) proteins which are in process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Approximately only 50% of the proteins used in this study as training data had endoplasmic reticulum retention signal, which shows that these signals are not essentially present in all endoplasmic reticulum resident proteins. This also strongly indicates the role of additional factors in retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum. Methods This is a support vector machine based method, where we had used different forms of protein features as inputs for support vector machine to develop the prediction models. During training leave-one-out approach of cross-validation was used. Maximum performance was obtained with a combination of amino acid compositions of different part of proteins. Results In this study, we have reported a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named as ERPred. During training we achieved a maximum accuracy of 81.42% with leave-one-out approach of cross-validation. When evaluated on independent dataset, ERPred did prediction with sensitivity of 72.31% and specificity of 83

  16. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners.

    Directory of Open Access Journals (Sweden)

    Carlo Baldassi

    Full Text Available In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.

  17. PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs

    Directory of Open Access Journals (Sweden)

    Greenblatt Jack

    2006-07-01

    Full Text Available Abstract Background Identification of protein interaction networks has received considerable attention in the post-genomic era. The currently available biochemical approaches used to detect protein-protein interactions are all time and labour intensive. Consequently there is a growing need for the development of computational tools that are capable of effectively identifying such interactions. Results Here we explain the development and implementation of a novel Protein-Protein Interaction Prediction Engine termed PIPE. This tool is capable of predicting protein-protein interactions for any target pair of the yeast Saccharomyces cerevisiae proteins from their primary structure and without the need for any additional information or predictions about the proteins. PIPE showed a sensitivity of 61% for detecting any yeast protein interaction with 89% specificity and an overall accuracy of 75%. This rate of success is comparable to those associated with the most commonly used biochemical techniques. Using PIPE, we identified a novel interaction between YGL227W (vid30) and YMR135C (gid8) yeast proteins. This led us to the identification of a novel yeast complex that here we term vid30 complex (vid30c). The observed interaction was confirmed by tandem affinity purification (TAP) tag, verifying the ability of PIPE to predict novel protein-protein interactions. We then used PIPE analysis to investigate the internal architecture of vid30c. It appeared from PIPE analysis that vid30c may consist of a core and a secondary component. Generation of yeast gene deletion strains combined with TAP tagging analysis indicated that the deletion of a member of the core component interfered with the formation of vid30c, however, deletion of a member of the secondary component had little effect (if any) on the formation of vid30c. Also, PIPE can be used to analyse yeast proteins for which TAP tagging fails, thereby allowing us to predict protein interactions that are not

  18. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    Directory of Open Access Journals (Sweden)

    Yasser El-Manzalawy

    Full Text Available A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data) as well as the accuracy of the machine learning classifier trained using these profiles. Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein

  19. A guideline to proteome-wide α-helical membrane protein topology predictions.

    Science.gov (United States)

    Tsirigos, Konstantinos D; Hennerdal, Aron; Käll, Lukas; Elofsson, Arne

    2012-08-01

    For current state-of-the-art methods, the prediction of correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods to distinguish membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the reported high accuracies in the smaller benchmark sets are not quite maintained in larger scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods questions earlier estimates about the global properties of the membrane proteome. Finally, we suggest a pipeline to estimate these properties using a combination of the best predictors that could be applied in large-scale proteomics studies of membrane proteins.

  20. Training set reduction methods for protein secondary structure prediction in single-sequence condition.

    Science.gov (United States)

    Aydin, Zafer; Altunbasak, Yucel; Pakatci, Isa Kemal; Erdogan, Hakan

    2007-01-01

    Orphan proteins are characterized by the lack of significant sequence similarity to database proteins. To infer the functional properties of the orphans, more elaborate techniques that utilize structural information are required. In this regard, the protein structure prediction gains considerable importance. Secondary structure prediction algorithms designed for orphan proteins (also known as single-sequence algorithms) cannot utilize multiple alignments or alignment profiles, which are derived from similar proteins. This is a limiting factor for the prediction accuracy. One way to improve the performance of a single-sequence algorithm is to perform re-training. In this approach, first, the models used by the algorithm are trained by a representative set of proteins and a secondary structure prediction is computed. Then, using a distance measure, the original training set is refined by removing proteins that are dissimilar to the given protein. This step is followed by the re-estimation of the model parameters and the prediction of the secondary structure. In this paper, we compare training set reduction methods that are used to re-train the hidden semi-Markov models employed by the IPSSP algorithm [1]. We found that the composition based reduction method has the highest performance compared to the alignment based and the Chou-Fasman based reduction methods. In addition, threshold-based reduction performed better than the reduction technique that selects the first 80% of the dataset proteins.
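
    The threshold-based, composition-based reduction step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Euclidean distance between amino acid frequency vectors, the threshold value, and the function names are hypothetical choices, not the distance measure or code used by the IPSSP authors.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]

def distance(u, v):
    """Euclidean distance between two composition vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def reduce_training_set(query, training, threshold):
    """Keep only training sequences whose composition is close to the query."""
    q = composition(query)
    return [s for s in training if distance(q, composition(s)) <= threshold]

# Hypothetical toy data: the dissimilar sequence "WWWW" is removed.
kept = reduce_training_set("AAAA", ["AAAA", "AAAC", "WWWW"], threshold=0.5)
print(kept)
```

    In a re-training workflow, the model parameters would then be re-estimated on the retained sequences before the secondary structure of the query is predicted again.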

  1. Improving the prediction of protein binding sites by combining heterogeneous data and Voronoi diagrams

    Directory of Open Access Journals (Sweden)

    Fernandez-Fuentes Narcis

    2011-08-01

    Full Text Available Abstract Background Protein binding site prediction by computational means can yield valuable information that complements and guides experimental approaches to determine the structure of protein complexes. Predictions become even more relevant and timely given the current resolution of protein interaction maps, where there is a very large and still expanding gap between the available information on: (i) which proteins interact and (ii) how proteins interact. Proteins interact through exposed residues that present differential physicochemical properties, and these can be exploited to identify protein interfaces. Results Here we present VORFFIP, a novel method for protein binding site prediction. The method makes use of a broad set of heterogeneous data and of residue environments defined by means of Voronoi Diagrams, which are integrated by a two-step Random Forest ensemble classifier. Four sets of residue features (structural, energy terms, sequence conservation, and crystallographic B-factors) used in different combinations together with three definitions of residue environment (Voronoi Diagrams, sequence sliding window, and Euclidean distance) have been analyzed in order to maximize the performance of the method. Conclusions The integration of different forms of information, such as structural features, energy terms, evolutionary conservation and crystallographic B-factors, improves the performance of binding site prediction. Including the information of neighbouring residues also improves the prediction of protein interfaces. Among the different approaches that can be used to define the environment of exposed residues, Voronoi Diagrams provide the most accurate description. Finally, VORFFIP compares favourably to other methods reported in the recent literature.

  2. Non-preconditioned conjugate gradient on cell and FPGA-based hybrid supercomputer nodes

    Energy Technology Data Exchange (ETDEWEB)

    Dubois, David H [Los Alamos National Laboratory; Dubois, Andrew J [Los Alamos National Laboratory; Boorman, Thomas M [Los Alamos National Laboratory; Connor, Carolyn M [Los Alamos National Laboratory

    2009-03-10

    This work presents a detailed implementation of a double precision, Non-Preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.

  3. Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes

    Energy Technology Data Exchange (ETDEWEB)

    Dubois, David H [Los Alamos National Laboratory; Dubois, Andrew J [Los Alamos National Laboratory; Boorman, Thomas M [Los Alamos National Laboratory; Connor, Carolyn M [Los Alamos National Laboratory

    2009-01-01

    This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
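
    For orientation, the textbook non-preconditioned Conjugate Gradient iteration for a symmetric positive-definite system Ax = b can be sketched as below. This is a minimal, single-threaded illustration in plain Python (whose floats are double precision), not the Cell or FPGA implementation the record describes; the function names, tolerance and the small test matrix are assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A without a preconditioner."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                        # makes the new p A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Small symmetric positive-definite example; the exact solution is [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

    The matrix-vector product dominates the cost of each iteration, which is why it is the usual candidate for offloading to an accelerator, with the data transfer counted in the wall clock time.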

  4. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    Directory of Open Access Journals (Sweden)

    A. Gunzinger

    1996-01-01

    Full Text Available At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the performance of the Electronics Laboratory supercomputer is on par with that of conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of 400, and price is reduced by a factor of 100. Software development is a key issue of such parallel systems. This article focuses on the programming environment of the MUSIC system and on its applications.

  5. Recent Progress in Predicting Posttranslational Modification Sites in Proteins.

    Science.gov (United States)

    Xu, Yan; Chou, Kuo-Chen

    2016-01-01

    Posttranslational modification (PTM) is a late but subtle step in protein biosynthesis that changes the properties of a protein by adding a modifying group to one or more of its amino acid residues. PTMs are responsible for many significant biological processes, and also for many major diseases, such as cancer. Facing the avalanche of biological sequences generated in the post-genomic age, it is important for both basic research and drug development to identify the PTM sites in proteins in a timely manner. This Review summarizes recent progress in this area, with a focus on those predictors that were developed based on the pseudo amino acid composition (PseAAC) approach and for which a publicly accessible web-server has been established. The future challenges in this area are also briefly addressed.

  6. A novel domain-based method for predicting the functional classes of proteins

    Institute of Scientific and Technical Information of China (English)

    YU Xiaojing; LIN Jiancheng; SHI Tieliu; LI Yixue

    2004-01-01

    Prediction of protein functions from known genomic sequences is an important mission of bioinformatics. One approach is to classify proteins into functional categories. We have therefore developed a method based on protein domain composition and the maximum likelihood estimation (MLE) algorithm to classify proteins according to functions. Using the Saccharomyces cerevisiae genome, we compared the effectiveness of the MLE approach with that of an intuitive and simple method. The MLE method outperformed the simple method, achieving an estimated specificity of 75.45% and an estimated sensitivity of 40.26%. These results indicate that domain is an important feature of proteins and is closely related to protein function.

  7. Predicting protein interactions via parsimonious network history inference.

    Science.gov (United States)

    Patro, Rob; Kingsford, Carl

    2013-07-01

    Reconstruction of the network-level evolutionary history of protein-protein interactions provides a principled way to relate interactions in several present-day networks. Here, we present a general framework for inferring such histories and demonstrate how it can be used to determine what interactions existed in the ancestral networks, which present-day interactions we might expect to exist based on evolutionary evidence and what information extant networks contain about the order of ancestral protein duplications. Our framework characterizes the space of likely parsimonious network histories. It results in a structure that can be used to find probabilities for a number of events associated with the histories. The framework is based on a directed hypergraph formulation of dynamic programming that we extend to enumerate many optimal and near-optimal solutions. The algorithm is applied to reconstructing ancestral interactions among bZIP transcription factors, imputing missing present-day interactions among the bZIPs and among proteins from five herpes viruses, and determining relative protein duplication order in the bZIP family. Our approach more accurately reconstructs ancestral interactions than existing approaches. In cross-validation tests, we find that our approach ranks the majority of the left-out present-day interactions among the top 2 and 17% of possible edges for the bZIP and herpes networks, respectively, making it a competitive approach for edge imputation. It also estimates relative bZIP protein duplication orders, using only interaction data and phylogenetic tree topology, which are significantly correlated with sequence-based estimates. The algorithm is implemented in C++, is open source and is available at http://www.cs.cmu.edu/ckingsf/software/parana2. Supplementary data are available at Bioinformatics online.

  8. Prediction of protein-protein interactions using chaos game representation and wavelet transform via the random forest algorithm.

    Science.gov (United States)

    Jia, J H; Liu, Z; Chen, X; Xiao, X; Liu, B X

    2015-10-02

    Studying the network of protein-protein interactions (PPIs) will provide valuable insights into the inner workings of cells. It is vitally important to develop an automated, high-throughput tool that efficiently predicts protein-protein interactions. This study proposes a new model for PPI prediction based on the concept of chaos game representation and the wavelet transform, which means that a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers. The advantage of using chaos game representation and the wavelet transform to formulate the protein sequence is that it can more effectively reflect its overall sequence-order characteristics than the conventional correlation factors. Using such a formulation frame to represent the protein sequences means that the random forest algorithm can be used to conduct the prediction. The results for a large-scale independent test dataset show that the proposed model can achieve an excellent performance with an accuracy value of about 0.86 and a geometric mean value of about 0.85. The model is therefore a useful supplementary tool for PPI predictions. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI.

  9. Prediction of protein composition of individual cow milk using mid-infrared spectroscopy

    OpenAIRE

    Paolo Carnier; Guido Di Martino; Alessio Cecchinato; Valentina Bonfatti; Massimo De Marchi

    2010-01-01

    This study investigated the application of mid-infrared spectroscopy for the prediction of protein composition in individual milk samples (n=1,336) of Simmental cows. Protein fractions were quantified by RP-HPLC and MIR data were recorded over the spectral range from 4,000 to 900 cm⁻¹. Models were developed by partial least squares regression using untreated spectra. The most successful predictions were for protein, casein, αS1-casein, whey protein, and β-lactoglobulin contents. Th...

  10. Disorder Prediction Methods, Their Applicability to Different Protein Targets and Their Usefulness for Guiding Experimental Studies

    Directory of Open Access Journals (Sweden)

    Jennifer D. Atkins

    2015-08-01

    Full Text Available The role and function of a given protein is dependent on its structure. In recent years, however, numerous studies have highlighted the importance of unstructured, or disordered regions in governing a protein’s function. Disordered proteins have been found to play important roles in pivotal cellular functions, such as DNA binding and signalling cascades. Studying proteins with extended disordered regions is often problematic as they can be challenging to express, purify and crystallise. This means that interpretable experimental data on protein disorder is hard to generate. As a result, predictive computational tools have been developed with the aim of predicting the level and location of disorder within a protein. Currently, over 60 prediction servers exist, utilizing different methods for classifying disorder and different training sets. Here we review several good performing, publicly available prediction methods, comparing their application and discussing how disorder prediction servers can be used to aid the experimental solution of protein structure. The use of disorder prediction methods allows us to adopt a more targeted approach to experimental studies by accurately identifying the boundaries of ordered protein domains so that they may be investigated separately, thereby increasing the likelihood of their successful experimental solution.

  11. Predicting DNA-binding sites of proteins based on sequential and 3D structural information.

    Science.gov (United States)

    Li, Bi-Qing; Feng, Kai-Yan; Ding, Juan; Cai, Yu-Dong

    2014-06-01

    Protein-DNA interactions play important roles in many biological processes. To understand the molecular mechanisms of protein-DNA interaction, it is necessary to identify the DNA-binding sites in DNA-binding proteins. In the last decade, computational approaches have been developed to predict protein-DNA-binding sites based solely on protein sequences. In this study, we developed a novel predictor based on support vector machine algorithm coupled with the maximum relevance minimum redundancy method followed by incremental feature selection. We incorporated not only features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, solvent accessibility, but also five three-dimensional (3D) structural features calculated from PDB data to predict the protein-DNA interaction sites. Feature analysis showed that 3D structural features indeed contributed to the prediction of DNA-binding site and it was demonstrated that the prediction performance was better with 3D structural features than without them. It was also shown via analysis of features from each site that the features of DNA-binding site itself contribute the most to the prediction. Our prediction method may become a useful tool for identifying the DNA-binding sites and the feature analysis described in this paper may provide useful insights for in-depth investigations into the mechanisms of protein-DNA interaction.

  12. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    Science.gov (United States)

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  13. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.

    Science.gov (United States)

    Roche, Daniel B; Buenavista, Maria T; Tetchner, Stuart J; McGuffin, Liam J

    2011-07-01

    The IntFOLD server is a novel independent server that integrates several cutting edge methods for the prediction of structure and function from sequence. Our guiding principles behind the server development were as follows: (i) to provide a simple unified resource that makes our prediction software accessible to all and (ii) to produce integrated output for predictions that can be easily interpreted. The output for predictions is presented as a simple table that summarizes all results graphically via plots and annotated 3D models. The raw machine readable data files for each set of predictions are also provided for developers, which comply with the Critical Assessment of Methods for Protein Structure Prediction (CASP) data standards. The server comprises an integrated suite of five novel methods: nFOLD4, for tertiary structure prediction; ModFOLD 3.0, for model quality assessment; DISOclust 2.0, for disorder prediction; DomFOLD 2.0 for domain prediction; and FunFOLD 1.0, for ligand binding site prediction. Predictions from the IntFOLD server were found to be competitive in several categories in the recent CASP9 experiment. The IntFOLD server is available at the following web site: http://www.reading.ac.uk/bioinf/IntFOLD/.

  14. Dynamic modularity in protein interaction networks predicts breast cancer outcome

    DEFF Research Database (Denmark)

    Taylor, Ian W; Linding, Rune; Warde-Farley, David

    2009-01-01

    Changes in the biochemical wiring of oncogenic cells drive phenotypic transformations that directly affect disease outcome. Here we examine the dynamic structure of the human protein interaction network (interactome) to determine whether changes in the organization of the interactome can be used...

  15. Protein location prediction using atomic composition and global features of the amino acid sequence

    Energy Technology Data Exchange (ETDEWEB)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com [Centre for Bioinformatics, University of Kerala, Kariyavattom Campus, Thiruvananthapuram, Kerala (India); Nair, Achuthsankar S. [Centre for Bioinformatics, University of Kerala, Kariyavattom Campus, Thiruvananthapuram, Kerala (India)

    2010-01-22

    The subcellular location of a protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physiochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from the N-terminal alone. Considering the features as a distribution within the entire sequence will bring out the underlying property distribution in greater detail to enhance the prediction accuracy.

  16. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps.

    Science.gov (United States)

    Nabieva, Elena; Jim, Kam; Agarwal, Amit; Chazelle, Bernard; Singh, Mona

    2005-06-01

    Determining protein function is one of the most important problems in the post-genomic era. For the typical proteome, there are no functional annotations for one-third or more of its proteins. Recent high-throughput experiments have determined proteome-scale protein physical interaction maps for several organisms. These physical interactions are complemented by an abundance of data about other types of functional relationships between proteins, including genetic interactions, knowledge about co-expression and shared evolutionary history. Taken together, these pairwise linkages can be used to build whole-proteome protein interaction maps. We develop a network-flow based algorithm, FunctionalFlow, that exploits the underlying structure of protein interaction maps in order to predict protein function. In cross-validation testing on the yeast proteome, we show that FunctionalFlow has improved performance over previous methods in predicting the function of proteins with few (or no) annotated protein neighbors. By comparing several methods that use protein interaction maps to predict protein function, we demonstrate that FunctionalFlow performs well because it takes advantage of both network topology and some measure of locality. Finally, we show that performance can be improved substantially as we consider multiple data sources and use them to create weighted interaction networks. http://compbio.cs.princeton.edu/function

  17. A new method for predicting essential proteins based on dynamic network topology and complex information.

    Science.gov (United States)

    Luo, Jiawei; Kuang, Ling

    2014-10-01

    Predicting essential proteins is highly significant because organisms cannot survive or develop if even one of these proteins is missing. Improvements in high-throughput technologies have resulted in a large number of available protein-protein interactions. By taking advantage of these interaction data, researchers have proposed many computational methods to identify essential proteins at the network level. Most of these approaches focus on the topology of a static protein interaction network. However, the protein interaction network changes with time and condition. These important inherent dynamics of the protein interaction network are overlooked by previous methods. In this paper, we introduce a new method named CDLC to predict essential proteins by integrating dynamic local average connectivity and in-degree of proteins in complexes. CDLC is applied to the protein interaction network of Saccharomyces cerevisiae. The results show that CDLC outperforms five other methods (Degree Centrality (DC), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), PeC and Co-Expression Weighted by Clustering coefficient (CoEWC)). In particular, CDLC could improve the prediction precision by more than 45% compared with DC methods. CDLC is also compared with the latest algorithm CEPPK, and a higher precision is achieved by CDLC. CDLC is available as Supplementary materials. The default settings of active threshold and alpha-parameter are 0.8 and 0.1, respectively.

  18. DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions.

    Science.gov (United States)

    Gao, Mu; Skolnick, Jeffrey

    2008-07-01

    The structures of DNA-protein complexes have illuminated the diversity of DNA-protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA-protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on approximately 4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to approximately 1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.

  19. Predicting the Coupling Specificity of G-protein Coupled Receptors to G-proteins by Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    Cui-Ping Guan; Zhen-Ran Jiang; Yan-Hong Zhou

    2005-01-01

    G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, which can provide new clues for pharmaceutical research and development. In this study, the features of amino acid compositions and physiochemical properties of the full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed to predict the coupling specificity of GPCRs to G-proteins using support vector machines. The testing results show that this method could obtain better prediction accuracy.

  20. Knowledge base and neural network approach for protein secondary structure prediction.

    Science.gov (United States)

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a sub-task in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which is a combination of a knowledge base and modeling of the exceptions in the knowledge base using neural networks for protein secondary structure prediction (PSSP), is proposed. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between the 5-residue words and corresponding secondary structure. The predicted results obtained using the knowledge base are refined with a backpropagation neural network algorithm. The neural net models the exceptions of the knowledge base. A Q3 accuracy of 90% and 82% is achieved on the RS126 and CB396 test sets respectively, which suggests improvement over existing state-of-the-art methods.
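
    The idea of a knowledge base of 5-residue words, together with the Q3 measure, can be illustrated with the sketch below. It rests on several assumptions that the abstract does not confirm: each word is associated with the secondary structure of its central residue, unseen words and chain termini fall back to coil ("C"), and the neural network refinement is omitted entirely. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_knowledge_base(pairs, k=5):
    """Count, for every k-residue word, the structure states of its central residue."""
    kb = defaultdict(Counter)
    half = k // 2
    for seq, ss in pairs:
        for i in range(half, len(seq) - half):
            kb[seq[i - half:i + half + 1]][ss[i]] += 1
    return kb

def predict(seq, kb, k=5, default="C"):
    """Assign each residue the most frequent state recorded for its word."""
    half = k // 2
    states = []
    for i in range(len(seq)):
        inside = half <= i < len(seq) - half
        counts = kb.get(seq[i - half:i + half + 1]) if inside else None
        states.append(counts.most_common(1)[0][0] if counts else default)
    return "".join(states)

def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E, C) is predicted correctly."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical toy example with one training protein.
kb = build_knowledge_base([("ACDEFGH", "CHHHEEC")])
pred = predict("ACDEFGH", kb)
print(pred, q3(pred, "CHHHEEC"))
```

    Residues for which the lookup is ambiguous or missing are the "exceptions" that a second-stage model, such as the neural network mentioned in the record, would be expected to handle.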

  1. Geary autocorrelation and DCCA coefficient: Application to predict apoptosis protein subcellular localization via PSSM

    Science.gov (United States)

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2017-02-01

    Apoptosis is a fundamental process controlling normal tissue homeostasis by regulating a balance between cell proliferation and death. Predicting the subcellular location of apoptosis proteins is very helpful for understanding the mechanism of programmed cell death. Prediction of apoptosis protein subcellular location is still a challenging and complicated task, and existing methods are mainly based on protein primary sequences. In this paper, we propose a new position-specific scoring matrix (PSSM)-based model by using the Geary autocorrelation function and the detrended cross-correlation coefficient (DCCA coefficient). Then a 270-dimensional (270D) feature vector is constructed on three widely used datasets: ZD98, ZW225 and CL317, and a support vector machine is adopted as classifier. The overall prediction accuracies are significantly improved by rigorous jackknife test. The results show that our model offers a reliable and effective PSSM-based tool for prediction of apoptosis protein subcellular localization.
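
    As an illustration of how Geary autocorrelation features can be derived from a PSSM, the sketch below applies the standard Geary formula to each PSSM column for a range of lags. The exact normalization, the lags used, and the way these values are combined with DCCA coefficients into the 270D vector are not given in the abstract, so the function names, the zero-variance convention and the toy matrix are assumptions.

```python
def geary_autocorrelation(column, lag):
    """Standard Geary autocorrelation of one PSSM column at the given lag."""
    n = len(column)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < sequence length")
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / (n - 1)
    if variance == 0:
        return 0.0  # assumed convention for a constant column
    squared_diffs = sum((column[i] - column[i + lag]) ** 2 for i in range(n - lag))
    return (squared_diffs / (2 * (n - lag))) / variance

def geary_features(pssm, max_lag):
    """Concatenate Geary values over all PSSM columns and lags 1..max_lag."""
    columns = list(zip(*pssm))  # rows are residues, columns are amino acid types
    return [geary_autocorrelation(col, lag)
            for col in columns
            for lag in range(1, max_lag + 1)]

# Hypothetical 4-residue "PSSM" with only two columns, for brevity.
toy_pssm = [[1, 0], [2, 1], [3, 0], [4, 1]]
print(geary_features(toy_pssm, max_lag=1))
```

    With a real PSSM of 20 columns, each lag contributes 20 values, so the feature length grows linearly with the number of lags considered.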

  2. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data.

    Science.gov (United States)

    Bywater, Robert P

    2016-01-01

    Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before.

  3. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi etc. also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is more challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches, ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5 fold cross-validations) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggests that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.

  4. Using maximum entropy model to predict protein secondary structure with single sequence.

    Science.gov (United States)

    Ding, Yong-Sheng; Zhang, Tong-Liang; Gu, Quan; Zhao, Pei-Ying; Chou, Kuo-Chen

    2009-01-01

    Prediction of protein secondary structure has been attempted by many previous investigators, yet it is still worth revisiting owing to its importance in protein science. Several studies indicate that knowledge of protein structural classes can provide useful information towards the determination of protein secondary structure. In particular, the performance of recently developed prediction algorithms has improved rapidly through the incorporation of homologous multiple sequence alignment information. Unfortunately, this kind of information is not available for a significant number of proteins. In view of this, it is necessary to develop a method based on the query protein sequence alone, the so-called single-sequence method. Here, we propose a novel single-sequence approach in which various kinds of contextual information are taken into account and a maximum entropy model classifier is used as the prediction engine. As a demonstration, cross-validation tests have been performed with the new method on datasets containing proteins from different structural classes, and the results thus obtained are quite promising, indicating that the new method may become a useful tool in protein science or at least play a complementary role to the existing protein secondary structure prediction methods.

  5. Investigation and prediction of protein precipitation by polyethylene glycol using quantitative structure-activity relationship models.

    Science.gov (United States)

    Hämmerling, Frank; Ladd Effio, Christopher; Andris, Sebastian; Kittelmann, Jörg; Hubbuch, Jürgen

    2017-01-10

    Precipitation of proteins is considered to be an effective purification method for proteins and has proven its potential to replace costly chromatography processes. Besides salts and polyelectrolytes, polymers, such as polyethylene glycol (PEG), are commonly used for precipitation applications under mild conditions. Process development, however, for protein precipitation steps still is based mainly on heuristic approaches and high-throughput experimentation due to a lack of understanding of the underlying mechanisms. In this work we apply quantitative structure-activity relationships (QSARs) to model two parameters, the discontinuity point m* and the β-value, that describe the complete precipitation curve of a protein under defined conditions. The generated QSAR models are sensitive to the protein type, pH, and ionic strength. It was found that the discontinuity point m* is mainly dependent on protein molecular structure properties and electrostatic surface properties, whereas the β-value is influenced by the variance in electrostatics and hydrophobicity on the protein surface. The models for m* and the β-value exhibit a good correlation between observed and predicted data with a coefficient of determination of R² ≥ 0.90 and, hence, are able to accurately predict precipitation curves for proteins. The predictive capabilities were demonstrated for a set of combinations of protein type, pH, and ionic strength not included in the generation of the models and good agreement between predicted and experimental data was achieved.

  6. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    Science.gov (United States)

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-01

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue.

  7. Orthology prediction methods: a quality assessment using curated protein families.

    Science.gov (United States)

    Trachana, Kalliopi; Larsson, Tomas A; Powell, Sean; Chen, Wei-Hua; Doerks, Tobias; Muller, Jean; Bork, Peer

    2011-10-01

    The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous group but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community. Copyright © 2011 WILEY Periodicals, Inc.

  8. Community-wide Evaluation of Methods for Predicting the Effect of Mutations on Protein-Protein Interactions

    Science.gov (United States)

    Moretti, Rocco; Fleishman, Sarel J.; Agius, Rudi; Torchala, Mieczyslaw; Bates, Paul A.; Kastritis, Panagiotis L.; Rodrigues, João P. G. L. M.; Trellet, Mikaël; Bonvin, Alexandre M. J. J.; Cui, Meng; Rooman, Marianne; Gillis, Dimitri; Dehouck, Yves; Moal, Iain; Romero-Durana, Miguel; Perez-Cano, Laura; Pallara, Chiara; Jimenez, Brian; Fernandez-Recio, Juan; Flores, Samuel; Pacella, Michael; Kilambi, Krishna Praneeth; Gray, Jeffrey J.; Popov, Petr; Grudinin, Sergei; Esquivel-Rodríguez, Juan; Kihara, Daisuke; Zhao, Nan; Korkin, Dmitry; Zhu, Xiaolei; Demerdash, Omar N. A.; Mitchell, Julie C.; Kanamori, Eiji; Tsuchiya, Yuko; Nakamura, Haruki; Lee, Hasup; Park, Hahnbeom; Seok, Chaok; Sarmiento, Jamica; Liang, Shide; Teraguchi, Shusuke; Standley, Daron M.; Shimoyama, Hiromitsu; Terashi, Genki; Takeda-Shitaka, Mayuko; Iwadate, Mitsuo; Umeyama, Hideaki; Beglov, Dmitri; Hall, David R.; Kozakov, Dima; Vajda, Sandor; Pierce, Brian G.; Hwang, Howook; Vreven, Thom; Weng, Zhiping; Huang, Yangyu; Li, Haotian; Yang, Xiufeng; Ji, Xiaofeng; Liu, Shiyong; Xiao, Yi; Zacharias, Martin; Qin, Sanbo; Zhou, Huan-Xiang; Huang, Sheng-You; Zou, Xiaoqin; Velankar, Sameer; Janin, Joël; Wodak, Shoshana J.; Baker, David

    2014-01-01

    Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side chain sampling and backbone relaxation, and evaluated packing, electrostatic and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of methodological improvement. PMID:23843247

  9. Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks

    DEFF Research Database (Denmark)

    Helles, Glennie; Fonseca, Rasmus

    2009-01-01

    Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent and helps reduce the search space significantly. However, random coil segments...... make up nearly 40% of proteins, and they do not have any apparent recurrent patterns, which complicates overall prediction accuracy of protein structure prediction methods. Luckily, previous work has indicated that coil segments are in fact not completely random in structure and flanking residues do...... seem to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While attempts to predict dihedral angles of coil segments have been...

  10. PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

    Science.gov (United States)

    2009-07-01

    ...Fold recognition/threading [34] [60] PSIPRED Jones Secondary structure prediction [21] [61] Rosetta Baker Ab initio folder [41] [62] SCOP/ASTRAL Chothia...the folds of the models that result from the Rosetta code, the structures of the top few models are compared against ASTRAL PDB-style coordinates of

  11. Consensus virtual screening approaches to predict protein ligands.

    Science.gov (United States)

    Kukol, Andreas

    2011-09-01

    In order to exploit the advantages of receptor-based virtual screening, namely time/cost saving and specificity, it is important to rely on algorithms that predict a high number of active ligands at the top ranks of a small molecule database. Towards that goal consensus methods combining the results of several docking algorithms were developed and compared against the individual algorithms. Furthermore, a recently proposed rescoring method based on drug efficiency indices was evaluated. Among AutoDock Vina 1.0, AutoDock 4.2 and GemDock, AutoDock Vina was the best performing single method in predicting high affinity ligands from a database of known ligands and decoys. The rescoring of predicted binding energies with the water/octanol partition coefficient did not lead to an improvement averaged over ten receptor targets. Various consensus algorithms were investigated and a simple combination of AutoDock and AutoDock Vina results gave the most consistent performance that showed early enrichment of known ligands for all receptor targets investigated. In case a number of ligands is known for a specific target, every method proposed in this study should be evaluated. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
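
    One simple way to combine the output of several docking programs is to average each ligand's rank across programs. The sketch below shows that idea only; the abstract does not state which combination rule the study used, so the rank-averaging rule, the function name, and the example energies are all assumptions made for illustration.

```python
def consensus_ranking(score_tables):
    """Rank ligands by their mean rank across several docking programs.

    score_tables: list of dicts mapping ligand name -> predicted binding
    energy, where a lower (more negative) energy means a better ligand.
    Only ligands scored by every program are ranked.
    """
    ligands = set(score_tables[0])
    for table in score_tables[1:]:
        ligands &= set(table)
    mean_rank = {ligand: 0.0 for ligand in ligands}
    for table in score_tables:
        ordered = sorted(ligands, key=lambda ligand: table[ligand])
        for rank, ligand in enumerate(ordered, start=1):
            mean_rank[ligand] += rank / len(score_tables)
    # Ties in mean rank are broken alphabetically for reproducibility.
    return sorted(ligands, key=lambda ligand: (mean_rank[ligand], ligand))


# Made-up energies (kcal/mol) from two hypothetical docking runs.
run_a = {"lig1": -9.0, "lig2": -7.0, "lig3": -8.0}
run_b = {"lig1": -8.5, "lig2": -6.0, "lig3": -9.0}
print(consensus_ranking([run_a, run_b]))
```

    Working with ranks rather than raw energies avoids having to put the scoring functions of different programs on a common scale.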

  12. PHD--an automatic mail server for protein secondary structure prediction.

    Science.gov (United States)

    Rost, B; Sander, C; Schneider, R

    1994-02-01

    By the middle of 1993, > 30,000 protein sequences had been listed. For 1000 of these, the three-dimensional (tertiary) structure has been experimentally solved. Another 7000 can be modelled by homology. For the remaining 21,000 sequences, secondary structure prediction provides a rough estimate of structural features. Predictions in three states range between 35% (random) and 88% (homology modelling) overall accuracy. Using information about evolutionary conservation as contained in multiple sequence alignments, the secondary structure of 4700 protein sequences was predicted by the automatic e-mail server PHD. For proteins with at least one known homologue, the method has an expected overall three-state accuracy of 71.4% (evaluated on 126 unique protein chains).
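
    The overall three-state accuracy quoted above (often written Q3) is the percentage of residues whose predicted state matches the observed state. A minimal sketch follows, assuming helix, strand, and coil are encoded as the letters H, E, and C; the function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted three-state label
    (H, E or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy strings: 7 of 8 residues agree.
print(q3_accuracy("HHHEECCC", "HHHECCCC"))
```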

  13. Can Computationally Designed Protein Sequences Improve Secondary Structure Prediction?

    Science.gov (United States)

    2011-01-01

    with the structural classification of proteins ( SCOP ) database of known structural domains (Kuhlman and Baker, 2000; Rohl et al., 2004). Secondary...reported in the literature. Methods In this work, the Astral SCOP 1.75 (Murzin et al., 1995; Hubbard et al., 1999) structural domain database filtered...entry matching the query test sequence can be left out. A total of 6511 SCOP 1.75 domains were used after some domains were discarded due to large

  14. Requirements for supercomputing in energy research: The transition to massively parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  15. Novel Supercomputing Approaches for High Performance Linear Algebra Using FPGAs Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Supercomputing plays a major role in many areas of science and engineering, and it has had tremendous impact for decades in areas such as aerospace, defense, energy,...

  16. Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding

    KAUST Repository

    Cannistraci, Carlo

    2013-06-21

    Motivation: Most functions within the cell emerge thanks to protein-protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI-network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable. Methods: Network embedding emphasizes the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE), and then adopted shortest path (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomized procedures for evaluating predictions. Results: We compared our method against several unsupervised and supervisedly tuned embedding approaches and node neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods in topological link prediction. Conclusion: Minimum curvilinearity is a valuable non-linear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the non-linear patterns hidden behind the protein network topology, and can be exploited for predicting new protein links. The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.

  17. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  18. A seqlet-based maximum entropy Markov approach for protein secondary structure prediction

    Institute of Scientific and Technical Information of China (English)

    DONG; Qiwen; WANG; Xiaolong; LIN; Lei; GUAN; Yi

    2005-01-01

    A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and together form the protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of the word segmentation, and the maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms and is compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining it with locally similar protein sequences obtained by BLAST can give better predictions. The method is also tested on the 50 CASP5 target proteins, with a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.
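
    The Viterbi algorithm mentioned above finds the most probable sequence of hidden states under a Markov model. The sketch below runs it on a plain hidden Markov model over single residues, which is a simplification: the paper applies it to a word lattice of seqlets scored by a maximum entropy model. The two-state model and all probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for the observations,
    working in log space to avoid numerical underflow."""
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this position.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical two-state model: helix (H) prefers A, coil (C) prefers G.
states = ["H", "C"]
log_start = {"H": math.log(0.5), "C": math.log(0.5)}
log_trans = logs({"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}})
log_emit = logs({"H": {"A": 0.8, "G": 0.2}, "C": {"A": 0.2, "G": 0.8}})
print("".join(viterbi("AAGG", states, log_start, log_trans, log_emit)))
```

    Because each position only needs the best score of every state at the previous position, the search is exact yet linear in sequence length, which is the efficiency property the abstract refers to.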

  19. Building a better fragment library for de novo protein structure prediction.

    Directory of Open Access Journals (Sweden)

    Saulo H P de Oliveira

    Full Text Available Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources.

  20. An Overview of the Prediction of Protein DNA-Binding Sites

    Directory of Open Access Journals (Sweden)

    Jingna Si

    2015-03-01

    Full Text Available Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.

  1. Prediction of Local Quality of Protein Structure Models Considering Spatial Neighbors in Graphical Models

    Science.gov (United States)

    Shin, Woong-Hee; Kang, Xuejiao; Zhang, Jian; Kihara, Daisuke

    2017-01-01

    Protein tertiary structure prediction methods have matured in recent years. However, some proteins defy accurate prediction due to factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e. assessment of the error at each residue position in a model. Local quality is very important information for practical applications of structure models such as interpreting/designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm, and performs a binary prediction of the quality of each residue in a model, indicating whether a residue position is likely to be within an error cutoff or not. The accuracy of GMQ was improved by considering larger graphs to include quality information of more surrounding residues. Moreover, we found that using different edge weights in graphs reflecting different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP). PMID:28074879

  2. Determining confidence of predicted interactions between HIV-1 and human proteins using conformal method.

    Science.gov (United States)

    Nouretdinov, Ilia; Gammerman, Alex; Qi, Yanjun; Klein-Seetharaman, Judith

    2012-01-01

    Identifying protein-protein interactions (PPIs) is critical for understanding virtually all cellular molecular mechanisms. Previously, predicting PPIs was treated as a binary classification task and has commonly been solved in a supervised setting, which requires a positive labeled set of known PPIs and a negative labeled set of non-interacting protein pairs. In those methods, the learner provides the likelihood of the predicted interaction, but without a confidence level associated with each prediction. Here, we apply a conformal prediction framework to make predictions and estimate the confidence of the predictions. The conformal predictor uses a function measuring the relative 'strangeness' of interacting pairs to check whether the prediction of a new example added to the sequence of already known PPIs would conform to the 'exchangeability' assumption: the distribution of interacting pairs is invariant under any permutation of the pairs. In fact, this is the only assumption we make about the data. Another advantage is that the user can control the number of errors by providing a desired confidence level. This feature of conformal prediction (CP) is very useful for producing a ranked list of possible interacting pairs. In this paper, the conformal method has been developed to deal with just one class, the class of interacting proteins, since there is no clearly defined class of 'non-interactive' pairs. The confidence level helps the biologist in the interpretation of the results, and better assists the choice of pairs for experimental validation. We apply the proposed conformal framework to improve the identification of interacting pairs between HIV-1 and human proteins.
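
    The core computation in a conformal predictor is a p-value that compares the 'strangeness' (nonconformity score) of a new example with the scores of known examples. The sketch below shows the standard p-value formula for a one-class setting; the scores, the function names, and the acceptance rule are illustrative assumptions, and the paper's actual strangeness function is not reproduced here.

```python
def conformal_p_value(known_scores, new_score):
    """Fraction of examples (known ones plus the new one) that are at
    least as strange as the new example."""
    at_least_as_strange = sum(1 for a in known_scores if a >= new_score)
    return (at_least_as_strange + 1) / (len(known_scores) + 1)


def is_plausible_interaction(known_scores, new_score, confidence=0.9):
    """Keep a candidate pair unless its p-value falls to or below the
    significance level 1 - confidence."""
    return conformal_p_value(known_scores, new_score) > 1 - confidence


# Made-up strangeness scores for four known interacting pairs.
known = [0.1, 0.2, 0.3, 0.4]
print(conformal_p_value(known, 0.35))
print(is_plausible_interaction(known, 0.35, confidence=0.9))
```

    Under the exchangeability assumption, a true interacting pair is rejected with probability at most 1 - confidence, which is how the user-chosen confidence level bounds the error rate.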

  3. Hidden markov model for the prediction of transmembrane proteins using MATLAB.

    Science.gov (United States)

    Chaturvedi, Navaneet; Shanker, Sudhanshu; Singh, Vinay Kumar; Sinha, Dhiraj; Pandey, Paras Nath

    2011-01-01

    Because membrane proteins play a key role in drug targeting, transmembrane protein prediction is an active and challenging area of the biological sciences. Location-based prediction of transmembrane proteins is significant for the functional annotation of protein sequences. Hidden Markov model based methods have been widely applied to transmembrane topology prediction. Here we present a revised and more easily understood model than an existing one for transmembrane protein prediction. MATLAB scripts were written and compiled to estimate the model parameters, and the model was applied to amino acid sequences to identify transmembrane regions and their adjacent locations. The estimated model of transmembrane topology was based on the TMHMM model architecture. Only 7 super states are defined in the given dataset, which were converted to 96 states on the basis of their length in sequence. The prediction accuracy of the model was about 74%, which is good enough for transmembrane topology prediction. We therefore conclude that the hidden Markov model plays a crucial role in transmembrane helix prediction on the MATLAB platform and that it could also be useful for drug discovery strategies. The database is available for free at bioinfonavneet@gmail.com, vinaysingh@bhu.ac.in.

  4. SUPERCOMPUTERS FOR AIDING ECONOMIC PROCESSES WITH REFERENCE TO THE FINANCIAL SECTOR

    Directory of Open Access Journals (Sweden)

    Jerzy Balicki

    2014-12-01

    Full Text Available The article discusses the use of supercomputers to support business processes with particular emphasis on the financial sector. Reference is made to selected projects that support economic development. In particular, we propose the use of supercomputers to run artificial intelligence methods in banking. The proposed methods combined with modern technology enable a significant increase in the competitiveness of enterprises and banks by adding new functionality.

  5. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    Directory of Open Access Journals (Sweden)

    J Ramiro Lorenzo

    Full Text Available Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  6. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases.

    Science.gov (United States)

    Parasuram, Ramya; Mills, Caitlyn L; Wang, Zhouxi; Somasundaram, Saroja; Beuning, Penny J; Ondrechen, Mary Jo

    2016-01-15

    Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural

  7. Protein structure prediction: combining de novo modeling with sparse experimental data.

    Science.gov (United States)

    Latek, Dorota; Ekonomiuk, Dariusz; Kolinski, Andrzej

    2007-07-30

    Routine structure prediction of new folds is still a challenging task for computational biology. The challenge is not only in the proper determination of overall fold but also in building models of acceptable resolution, useful for modeling the drug interactions and protein-protein complexes. In this work we propose and test a comprehensive approach to protein structure modeling supported by sparse, and relatively easy to obtain, experimental data. We focus on chemical shift-based restraints from NMR, although other sparse restraints could be easily included. In particular, we demonstrate that combining the typical NMR software with artificial intelligence-based prediction of secondary structure significantly enhances the accuracy of the restraints for molecular modeling. The computational procedure is based on the reduced representation approach implemented in the CABS modeling software, which proved to be a versatile tool for protein structure prediction during the CASP (CASP stands for critical assessment of techniques for protein structure prediction) experiments (see http://predictioncenter/CASP6/org). The method is successfully tested on a small set of representative globular proteins of different size and topology, including the two CASP6 targets, for which the required NMR data already exist. The method is implemented in a semi-automated pipeline applicable to a large scale structural annotation of genomic data. Here, we limit the computations to a relatively small set. This enabled, without a loss of generality, a detailed discussion of various factors determining the accuracy of the proposed approach to protein structure prediction.

  8. Predicting disordered regions in proteins using the profiles of amino acid indices

    Science.gov (United States)

    Han, Pengfei; Zhang, Xiuzhen; Feng, Zhi-Ping

    2009-01-01

    Background: Intrinsically unstructured or disordered proteins are common and functionally important. Prediction of disordered regions in proteins can provide useful information for understanding protein function and for high-throughput determination of protein structures. Results: In this paper, algorithms are presented to predict long and short disordered regions in proteins, namely the long disordered region prediction algorithm DRaai-L and the short disordered region prediction algorithm DRaai-S. These algorithms are developed based on the Random Forest machine learning model and the profiles of amino acid indices representing various physicochemical and biochemical properties of the 20 amino acids. Conclusion: Experiments on DisProt3.6 and CASP7 demonstrate that some sets of the amino acid indices have strong association with the ordered and disordered status of residues. Our algorithms, which use the profiles of these amino acid indices as input features to predict disordered regions in proteins, outperform those based on amino acid composition and reduced amino acid composition, and also outperform many existing algorithms. Our studies suggest that the profiles of amino acid indices combined with the Random Forest learning model are an important complementary method for pinpointing disordered regions in proteins. PMID:19208144
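
    To make the idea of an amino acid index profile concrete, the sketch below averages one index over a sliding window centred on each residue. The choice of the Kyte-Doolittle hydropathy scale, the window length of 5, and the function name `index_profile` are assumptions made for illustration; the abstract does not say which indices or window sizes DRaai-L and DRaai-S use, and the Random Forest step is not shown.

```python
# Kyte-Doolittle hydropathy values, used here as one example of an
# amino acid index (an assumption; the paper's index sets are not given).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_profile(sequence, index=KYTE_DOOLITTLE, window=5):
    """Average an amino acid index over a window centred on each residue.

    Windows are truncated at the sequence ends, so the result has one
    value per residue.
    """
    half = window // 2
    values = [index[aa] for aa in sequence]
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile
```

    One profile per index could then be stacked column-wise to give a per-residue feature vector for a classifier.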

  9. Evaluation of multiple protein docking structures using correctly predicted pairwise subunits

    Directory of Open Access Journals (Sweden)

    Esquivel-Rodríguez Juan

    2012-03-01

    Full Text Available Background: Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, since there are only few works done in developing methods for multiple protein docking, there is no study that investigates how accurate structural models of multiple protein complexes should be to allow scientists to gain biological insights. Methods: We generated a series of predicted models (decoys) of various accuracies by our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations in the decoys. Results and conclusion: We found that pairs of chains with the correct mutual orientation exist even in the decoys with a large overall root mean square deviation (RMSD) to the native. Therefore, in addition to a global structure similarity measure, such as the global RMSD, the quality of models for multiple chain complexes can be better evaluated by using the local measurement, the number of chain pairs with correct mutual orientation. We termed the fraction of correctly predicted pairs (RMSD at the interface of less than 4.0 Å) as fpair and propose to use it for evaluation of the accuracy of multiple protein docking.
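
    The fpair measure can be sketched in a few lines. The example assumes that the interface RMSD of each chain pair has already been computed elsewhere and is supplied as a dictionary; the function name, the input layout, and the choice of denominator (all pairs supplied) are assumptions for illustration. Only the 4.0 Å cutoff comes from the abstract.

```python
def fpair(interface_rmsd_by_pair, cutoff=4.0):
    """Fraction of chain pairs whose interface RMSD is below the cutoff.

    interface_rmsd_by_pair maps a chain pair, e.g. ("A", "B"), to its
    interface RMSD in angstroms (assumed to be precomputed).
    """
    if not interface_rmsd_by_pair:
        raise ValueError("at least one chain pair is required")
    correct = sum(1 for rmsd in interface_rmsd_by_pair.values()
                  if rmsd < cutoff)
    return correct / len(interface_rmsd_by_pair)
```

    For a hypothetical decoy with pairs at 2.1, 7.5, 3.9, and 4.0 Å, two of the four pairs fall strictly below the cutoff, so fpair is 0.5 even if the global RMSD of the decoy is large.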

  10. A novel VLSI processor architecture for supercomputing arrays

    Science.gov (United States)

    Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

    1993-01-01

    Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different classes of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP, IP, OP, CM, R, SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such as to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.

  11. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    Science.gov (United States)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Ryabinkin, E.; Wenaus, T.

    2016-02-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers, the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  12. Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware

    CERN Document Server

    Brunner, Robert J; Myers, Adam D

    2007-01-01

    The objective of our research is to demonstrate the practical usage and orders of magnitude speedup of real-world applications by using alternative technologies to support high performance computing. Currently, the main barrier to the widespread adoption of this technology is the lack of development tools and case studies that typically impede non-specialists that might otherwise develop applications that could leverage these technologies. By partnering with the Innovative Systems Laboratory at the National Center for Supercomputing, we have obtained access to several novel technologies, including several Field-Programmable Gate Array (FPGA) systems, NVidia Graphics Processing Units (GPUs), and the STI Cell BE platform. Our goal is to not only demonstrate the capabilities of these systems, but to also serve as guides for others to follow in our path. To date, we have explored the efficacy of the SRC-6 MAP-C and MAP-E and SGI RASC Athena and RC100 reconfigurable computing platforms in supporting a two-point co...

  13. Numerical infinities and infinitesimals in a new supercomputing framework

    Science.gov (United States)

    Sergeyev, Yaroslav D.

    2016-06-01

    Traditional computers are able to work numerically with finite numbers only. The Infinity Computer, patented recently in the USA and EU, gets over this limitation. In fact, it is a computational device of a new kind able to work numerically not only with finite quantities but with infinities and infinitesimals as well. The new supercomputing methodology is not related to non-standard analysis and does not use either Cantor's infinite cardinals or ordinals. It is founded on Euclid's Common Notion 5, 'The whole is greater than the part'. This postulate is applied to all numbers (finite, infinite, and infinitesimal) and to all sets and processes (finite and infinite). It is shown that it becomes possible to write down finite, infinite, and infinitesimal numbers by a finite number of symbols as numerals belonging to a positional numeral system with an infinite radix described by a specific ad hoc introduced axiom. Numerous examples of the usage of the introduced computational tools are given during the lecture. In particular, algorithms for solving optimization problems and ODEs are considered among the computational applications of the Infinity Computer. Numerical experiments executed on a software prototype of the Infinity Computer are discussed.

  14. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    Full Text Available In this research a computer program, Trubal version 1.51, based on the Discrete Element Method was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with the Single Instruction Multiple Data (SIMD) architecture. This was due to the communication overhead involving global array reductions, global array broadcast and random data movement. Therefore, the dynamic memory algorithm in the Trubal program was converted to a static memory arrangement and the Trubal program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM)." The TPM program was validated by simulating two physical triaxial experiments and comparing the simulation results with Trubal simulations. With a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  15. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    Science.gov (United States)

    Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.; Hassan, Amr H.

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.

  16. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    CERN Document Server

    Fluke, Christopher J; Barsdell, Benjamin R; Hassan, Amr H

    2010-01-01

    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, s...

  17. Using the multistage cube network topology in parallel supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Siegel, H.J.; Nation, W.G. (Purdue Univ., Lafayette, IN (USA). School of Electrical Engineering); Kruskal, C.P. (Maryland Univ., College Park, MD (USA). Dept. of Computer Science); Napolitano, L.M. Jr. (Sandia National Labs., Livermore, CA (USA))

    1989-12-01

    A variety of approaches to designing the interconnection network to support communications among the processors and memories of supercomputers employing large-scale parallel processing have been proposed and/or implemented. These approaches are often based on the multistage cube topology. This topology is the subject of much ongoing research and study because of the ways in which the multistage cube can be used. The attributes of the topology that make it useful are described. These include O(N log_2 N) cost for an N input/output network, decentralized control, a variety of implementation options, good data permuting capability to support single instruction stream/multiple data stream (SIMD) parallelism, good throughput to support multiple instruction stream/multiple data stream (MIMD) parallelism, and ability to be partitioned into independent subnetworks to support reconfigurable systems. Examples of existing systems that use multistage cube networks are overviewed. The multistage cube topology can be converted into a single-stage network by associating with each switch in the network a processor (and a memory). Properties of systems that use the multistage cube network in this way are also examined.
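
    The O(N log_2 N) cost quoted above can be checked with a small calculation. The sketch assumes the common construction of a multistage cube network from 2x2 interchange boxes, with log_2 N stages of N/2 boxes each; the abstract itself does not spell out this construction, and the function names are hypothetical.

```python
def cube_network_size(n_ports):
    """Return (stages, boxes_per_stage, total_boxes) for an N-port network.

    Assumes a network built from 2x2 interchange boxes, with N a power
    of two: log2(N) stages, each containing N/2 boxes.
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, at least 2")
    stages = n_ports.bit_length() - 1  # log2(N) for a power of two
    boxes_per_stage = n_ports // 2
    return stages, boxes_per_stage, stages * boxes_per_stage


def cube_partner(line, stage):
    """Line paired with `line` by the cube function for bit `stage`.

    The two lines entering one interchange box differ only in that bit.
    """
    return line ^ (1 << stage)
```

    For example, `cube_network_size(1024)` gives 10 stages of 512 boxes, 5120 boxes in total, which grows as (N/2) log_2 N. A full crossbar for the same N would need on the order of N^2 crosspoints, about a million here.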

  18. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    Directory of Open Access Journals (Sweden)

    De K.

    2016-01-01

    Full Text Available The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers, the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF’s Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  19. Supercomputers ready for use as discovery machines for neuroscience

    Directory of Open Access Journals (Sweden)

    Moritz Helias

    2012-11-01

    Full Text Available NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10^8 neurons and 10^12 synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum-filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi-interactive working style and render simulations on this scale a practical tool for computational neuroscience.

  20. Supercomputers ready for use as discovery machines for neuroscience.

    Science.gov (United States)

    Helias, Moritz; Kunkel, Susanne; Masumoto, Gen; Igarashi, Jun; Eppler, Jochen Martin; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus

    2012-01-01

    NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10^8 neurons and 10^12 synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum-filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi-interactive working style and render simulations on this scale a practical tool for computational neuroscience.

  1. Protein distance constraints predicted by neural networks and probability density functions

    DEFF Research Database (Denmark)

    Lund, Ole; Frimand, Kenneth; Gorodkin, Jan

    1997-01-01

    We predict interatomic C-α distances by two independent data driven methods. The first method uses statistically derived probability distributions of the pairwise distance between two amino acids, whilst the latter method consists of a neural network prediction approach equipped with windows taking....... The predictions are based on a data set derived using a new threshold similarity. We show that distances in proteins are predicted more accurately by neural networks than by probability density functions. We show that the accuracy of the predictions can be further increased by using sequence profiles. A threading...

  2. Kinase-specific prediction of protein phosphorylation sites

    DEFF Research Database (Denmark)

    Miller, Martin Lee; Blom, Nikolaj

    2009-01-01

    -substrate specificity. Here, we briefly describe the available resources for predicting kinase-specific phosphorylation from sequence properties. We address the strengths and weaknesses of these resources, which are based on methods ranging from simple consensus patterns to more advanced machine-learning algorithms....... Furthermore, a protocol for the use of the artificial neural network based predictors, NetPhos and NetPhosK, is provided. Finally, we point to possible developments with the intention of providing the community with improved and additional phosphorylation predictors for large-scale modeling of cellular...... signaling networks....

  3. An update of the DEF database of protein fold class predictions

    DEFF Research Database (Denmark)

    Reczko, Martin; Karras, Dimitris; Bohr, Henrik

    1997-01-01

    An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure...

  4. Predict drug-protein interaction in cellular networking.

    Science.gov (United States)

    Xiao, Xuan; Min, Jian-Liang; Wang, Pu; Chou, Kuo-Chen

    2013-01-01

    Involved with many diseases such as cancer, diabetes, neurodegenerative, inflammatory and respiratory disorders, GPCRs (G-protein-coupled receptors) are the most frequent targets for drug development: over 50% of all prescription drugs currently on the market act by targeting GPCRs directly or indirectly. Found in every living thing and nearly all cells, ion channels play crucial roles for many vital functions in life, such as heartbeat, sensory transduction, and central nervous system response. Their dysfunction may have significant impact on human health, and hence ion channels are deemed "the next GPCRs". To develop GPCR-targeting or ion-channel-targeting drugs, the first important step is to identify the interactions between potential drug compounds and the two kinds of protein receptors in the cellular networking. In this minireview, we introduce two predictors. One is called iGPCR-Drug, accessible at http://www.jci-bioinfo.cn/iGPCR-Drug/; the other is called iCDI-PseFpt, at http://www.jci-bioinfo.cn/iCDI-PseFpt. The former is for identifying the interactions of drug compounds with GPCRs, while the latter is for those with ion channels. In both predictors, the drug compound was formulated by the two-dimensional molecular fingerprint, and the protein receptor by the pseudo amino acid composition generated with the grey model theory, while the operation engine was the fuzzy K-nearest neighbor algorithm. For the convenience of most experimental pharmaceutical and medical scientists, a step-by-step guide is provided on how to use each of the two web-servers to get the desired results without the need to follow the complicated mathematics involved originally for their establishment.

  5. Glial and neuronal proteins in serum predict outcome after severe traumatic brain injury.

    NARCIS (Netherlands)

    Vos, P.E.; Lamers, K.J.B.; Hendriks, J.C.M.; Haaren, M. van; Beems, T.; Zimmerman, C.; Geel, W.J.A. van; Reus, H.P.M. de; Biert, J.; Verbeek, M.M.

    2004-01-01

    OBJECTIVE: To study the ability of glial (glial fibrillary acidic protein [GFAP] and S100b) and neuronal (neuron specific enolase [NSE]) protein levels in peripheral blood to predict outcome after severe traumatic brain injury. METHODS: Eighty-five patients with severe traumatic brain injury (admiss

  6. Prediction of human protein function from post-translational modifications and localization features

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Blom, Nikolaj;

    2002-01-01

    a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects...

  7. NetPhosYeast: prediction of protein phosphorylation sites in yeast

    DEFF Research Database (Denmark)

    Ingrell, C.R.; Miller, Martin Lee; Jensen, O.N.

    2007-01-01

    We here present a neural network-based method for the prediction of protein phosphorylation sites in yeast, an important model organism for basic research. Existing protein phosphorylation site predictors are primarily based on mammalian data and show reduced sensitivity on yeast phosphorylation s...

  8. Predicting apoptosis protein subcellular location with PseAAC by incorporating tripeptide composition.

    Science.gov (United States)

    Liao, Bo; Jiang, Jun-Bao; Zeng, Qing-Guang; Zhu, Wen

    2011-11-01

    The function of the protein is closely correlated with its subcellular localization. Probing into the mechanism of protein sorting and predicting protein subcellular location can provide important clues or insights for understanding the function of proteins. In this paper, we introduce a new PseAAC approach to encode the protein sequence based on the physicochemical properties of amino acid residues. Each of the protein samples was defined as a 146D (dimensional) vector including the 20 amino acid composition components and 126 adjacent triune residues contents. To evaluate the effectiveness of this encoding scheme, we did jackknife tests on three datasets using the support vector machine algorithm. The total prediction accuracies are 84.9%, 91.2%, and 92.6%, respectively. The satisfactory results indicate that our method could be a useful tool in the area of bioinformatics and proteomics.

  9. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm.

    Science.gov (United States)

    Kumar, Prateek; Henikoff, Steven; Ng, Pauline C

    2009-01-01

    The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5-20 min, depending on the input. SIFT is available as an online tool (http://sift.jcvi.org).

  10. DOCK/PIERR: web server for structure prediction of protein-protein complexes.

    Science.gov (United States)

    Viswanath, Shruthi; Ravikant, D V S; Elber, Ron

    2014-01-01

    In protein docking, we aim to find the structure of the complex formed when two proteins interact. Protein-protein interactions are crucial for cell function. Here we discuss the usage of DOCK/PIERR. In DOCK/PIERR, a uniform discrete sampling of orientations of one protein with respect to the other is scored, followed by clustering, refinement, and reranking of structures. The novelty of this method lies in the scoring functions used. These are obtained by examining hundreds of millions of correctly and incorrectly docked structures, using an algorithm based on mathematical programming with provable convergence properties.

  11. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize.

    Directory of Open Access Journals (Sweden)

    Matt Geisler

    2015-06-01

    Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6,004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize.
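
    The interolog rule stated above (predict a maize interaction when both partners of a reference interaction have maize orthologs) can be sketched as follows. The function name, the data layout, and the protein identifiers are hypothetical; this is a minimal illustration rather than the PiZeaM pipeline.

```python
def predict_interologs(reference_interactions, orthologs):
    """Transfer interactions from a reference species to a target species.

    reference_interactions: iterable of (protein_a, protein_b) pairs with
        experimental support in the reference species.
    orthologs: dict mapping a reference protein to the set of its target-species
        orthologs; sets with several members model many-to-many orthology.
    Returns a set of unordered target-species pairs (frozensets).
    """
    predicted = set()
    for a, b in reference_interactions:
        # Both partners need at least one ortholog, otherwise nothing is transferred.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical identifiers, for illustration only.
reference = [("REF1", "REF2"), ("REF2", "REF3")]
ortholog_map = {"REF1": {"Zm1", "Zm2"}, "REF2": {"Zm3"}}

for pair in sorted(tuple(sorted(p)) for p in predict_interologs(reference, ortholog_map)):
    print(pair)
```

    Storing each pair as a frozenset keeps interactions unordered, so the same pair reached through different reference species is counted once, in line with the "unique interactions" reported in the abstract.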

  12. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins

    Energy Technology Data Exchange (ETDEWEB)

    Karp, Jerome M.; Erylimaz, Ertan; Cowburn, David [Albert Einstein College of Medicine of Yeshiva University, Department of Biochemistry (United States)]

    2015-01-15

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.
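
    The comparison described above, between shifts predicted from a simulation ensemble and experimental shifts, can be sketched roughly as below. The abstract does not say how per-frame predictions were combined or which error measure was used, so the simple frame average and the root-mean-square deviation are assumptions, as are the function names and the toy numbers.

```python
import math


def ensemble_average(per_frame_shifts):
    """Average predicted shifts over frames.

    per_frame_shifts: list of frames, each a list of predicted shifts (ppm)
    for the same nuclei in the same order.
    """
    n_frames = len(per_frame_shifts)
    return [sum(values) / n_frames for values in zip(*per_frame_shifts)]


def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental shifts."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Toy amide proton shifts (ppm) for two nuclei over two frames.
frames = [[8.0, 7.5], [8.4, 7.7]]
experimental = [8.2, 7.4]

averaged = ensemble_average(frames)
print(round(rmsd(averaged, experimental), 4))
print(round(rmsd(frames[0], experimental), 4))
```

    Comparing the error of the averaged prediction with that of a single structure is the kind of check the abstract reports, where better sampling of the ensemble is what improves agreement.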

  13. A Study on Protein Residue Contacts Prediction by Recurrent Neural Network

    Institute of Scientific and Technical Information of China (English)

    Liu Gui-xia; Zhu Yuan-xian; Zhou Wen-gang; Huang Yan-xin; Zhou Chun-guang; Wang Rong-xing

    2005-01-01

    A new method was described for using a recurrent neural network with bias units to predict contact maps in proteins. The main inputs to the neural network include residues pairwise, residue classification according to hydrophobicity, polar, acidic, basic and secondary structure information and residue separation between two residues. In our work, a dataset was used which was composed of 53 globulin proteins of known 3D structure. An average predictive accuracy of 0.29 was obtained. Our results demonstrate the viability of the approach for predicting contact maps.
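
    For readers unfamiliar with contact maps, the sketch below shows how a reference map can be derived from 3D coordinates and how a prediction might be scored against it. The abstract does not give the distance cutoff, the minimum sequence separation, or the definition of accuracy, so the 8 Å cutoff, the separation parameter, and accuracy as the fraction of predicted contacts that are real are all assumptions.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the set of residue index pairs (i, j), i < j, closer than cutoff.

    coords: list of (x, y, z) positions, one per residue (for example C-alpha atoms).
    min_separation: smallest sequence distance j - i that is considered.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def prediction_accuracy(predicted, true):
    """Fraction of predicted contacts that are present in the true map."""
    if not predicted:
        return 0.0
    return len(predicted & true) / len(predicted)


# Toy chain: four residues on a line, 3.8 units apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
true_contacts = contact_map(chain)
print(sorted(true_contacts))
print(prediction_accuracy({(0, 2), (0, 3)}, true_contacts))
```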

  14. Intrinsic Disorder in Transmembrane Proteins: Roles in Signaling and Topology Prediction.

    Directory of Open Access Journals (Sweden)

    Jérôme Bürgi

    Intrinsically disordered regions (IDRs) are peculiar stretches of amino acids that lack stable conformations in solution. Intrinsic Disorder containing Proteins (IDPs) are defined by the presence of at least one large IDR and have been linked to multiple cellular processes including cell signaling, DNA binding and cancer. Here we used computational analyses and publicly available databases to deepen insight into the prevalence and function of IDRs specifically in transmembrane proteins, which are somewhat neglected in most studies. We found that 50% of transmembrane proteins have at least one IDR of 30 amino acids or more. Interestingly, these domains preferentially localize to the cytoplasmic side, especially in multi-pass transmembrane proteins, suggesting that disorder prediction could increase the confidence of topology prediction algorithms. This was supported by the successful prediction of the topology of the uncharacterized multi-pass transmembrane protein TMEM117, as confirmed experimentally. Pathway analysis indicated that IDPs are enriched in cell projection and axons and appear to play an important role in cell adhesion, signaling and ion binding. In addition, we found that, compared to fully ordered proteins, IDPs are enriched in phosphorylation sites, a crucial post-translational modification in signal transduction, and are implicated in more protein-protein interaction events. Accordingly, IDPs were highly enriched in short protein binding regions called Molecular Recognition Features (MoRFs). Altogether, our analyses strongly support the notion that the transmembrane IDPs act as hubs in cellular signaling events.
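
    The "at least one IDR of 30 amino acids or more" criterion can be illustrated with a short sketch that scans per-residue disorder scores for long disordered runs. The source of the scores and the 0.5 threshold are assumptions (the abstract names neither the predictor nor its cutoff); only the minimum length of 30 comes from the abstract.

```python
def find_idrs(scores, threshold=0.5, min_length=30):
    """Return (start, end) half-open index ranges of long disordered stretches.

    scores: per-residue disorder scores, higher meaning more disordered.
    A region is a maximal run of residues scoring at or above threshold
    that spans at least min_length residues.
    """
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new disordered run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy profile: a 35-residue disordered stretch and a 12-residue one.
profile = [0.1] * 10 + [0.9] * 35 + [0.2] * 5 + [0.8] * 12
print(find_idrs(profile))
```

    A protein would count toward the reported 50% if this function returned at least one region for it.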

  15. Simplified Method for Predicting a Functional Class of Proteins in Transcription Factor Complexes

    KAUST Repository

    Piatek, Marek J.

    2013-07-12

    Background: Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key such regulatory proteins are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important since they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available, since not all proteins that act as TFs or TcoFs are yet annotated as such, due to generally partial functional annotation of proteins. In this study we have developed a method to predict, using only sequence composition of the interacting proteins, the functional class of human TF binding partners to be (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows for complementing the annotation of the currently known pool of nuclear proteins. Since only the knowledge of protein sequences is required in addition to protein interaction, the method should be easily applicable to many species. Results: Based on experimentally validated interactions between human TFs with different TFs, TcoFs and other nuclear proteins, our two classification systems (implemented as a web-based application) achieve high accuracies in distinguishing TFs and TcoFs from other nuclear proteins, and TFs from TcoFs respectively. Conclusion: As demonstrated, given the fact that two proteins are capable of forming direct physical interactions and using only information about their sequence composition, we have developed a completely new method for predicting a functional class of TF interacting protein partners

  16. Prediction of heat-induced polymerization of different globular food proteins in mixtures with wheat gluten.

    Science.gov (United States)

    Lambrecht, Marlies A; Rombouts, Ine; De Ketelaere, Bart; Delcour, Jan A

    2017-04-15

    Egg, soy or whey protein co-exists with wheat gluten in different food products. Different protein types impact each other during heat treatment. A positive co-protein effect occurs when heat-induced polymerization of a mixture of proteins is more intense than that of the isolated proteins. The intrinsic protein characteristics of globular proteins which enhance polymerization in mixtures with gluten are unknown. In this report, a model was developed to predict potential co-protein effects in mixtures of gluten and globular proteins during heating at 100°C. A negative co-protein effect with addition of lysozyme, no co-protein effect with soy glycinin or egg yolk and positive co-protein effects with bovine serum albumin, (S-)ovalbumin, egg white, whole egg, defatted egg yolk, wheat albumins and wheat globulins were detected. The level of accessible free sulfhydryl groups and the surface hydrophobicity of unfolded globular proteins were the main characteristics in determining the co-protein effects in gluten mixtures.

  17. [Prediction of protein subcellular locations by ensemble of improved K-nearest neighbor].

    Science.gov (United States)

    Xue, Wei; Wang, Xiongfei; Zhao, Nan; Yang, Rongli; Hong, Xiaoyu

    2017-04-25

    An AdaBoost algorithm with improved K-nearest neighbor classifiers is proposed to predict protein subcellular locations. The improved K-nearest neighbor classifier uses three sequence feature vectors: amino acid composition, dipeptide composition and pseudo amino acid composition of the protein sequence. The K-nearest neighbor classifier uses BLAST in the classification stage. The overall success rates by the jackknife test on the two data sets CH317 and Gram1253 are 92.4% and 93.1%, respectively. The AdaBoost algorithm with the novel K-nearest neighbor classifier improved by BLAST is an effective method for predicting subcellular locations of proteins.
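
    The jackknife test used for the reported success rates can be sketched as leave-one-out evaluation: each sample is held out in turn and predicted from all the others. The classifier below is a plain K-nearest neighbor vote on Euclidean distance, standing in for the BLAST-improved, AdaBoost-combined classifiers of the paper, which the abstract does not specify in enough detail to reproduce. The feature vectors and labels are toy values.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, k=3):
    """Leave-one-out success rate: hold out each sample and predict it from the rest."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(samples)


# Toy two-class data set with well-separated clusters.
data = [
    ((0.0, 0.0), "cytoplasm"), ((0.0, 1.0), "cytoplasm"), ((1.0, 0.0), "cytoplasm"),
    ((5.0, 5.0), "nucleus"), ((5.0, 6.0), "nucleus"), ((6.0, 5.0), "nucleus"),
]
print(jackknife_accuracy(data, k=1))
```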

  18. An Improved Method of Predicting Extinction Coefficients for the Determination of Protein Concentration.

    Science.gov (United States)

    Hilario, Eric C; Stern, Alan; Wang, Charlie H; Vargas, Yenny W; Morgan, Charles J; Swartz, Trevor E; Patapoff, Thomas W

    2017-01-01

    Concentration determination is an important method of protein characterization required in the development of protein therapeutics. There are many known methods for determining the concentration of a protein solution, but the easiest to implement in a manufacturing setting is absorption spectroscopy in the ultraviolet region. For typical proteins composed of the standard amino acids, absorption at wavelengths near 280 nm is due to the three amino acid chromophores tryptophan, tyrosine, and phenylalanine in addition to a contribution from disulfide bonds. According to the Beer-Lambert law, absorbance is proportional to concentration and path length, with the proportionality constant being the extinction coefficient. Typically the extinction coefficient of proteins is experimentally determined by measuring a solution absorbance then experimentally determining the concentration, a measurement with some inherent variability depending on the method used. In this study, extinction coefficients were calculated based on the measured absorbance of model compounds of the four amino acid chromophores. These calculated values for an unfolded protein were then compared with an experimental concentration determination based on enzymatic digestion of proteins. The experimentally determined extinction coefficient for the native proteins was consistently found to be 1.05 times the calculated value for the unfolded proteins for a wide range of proteins with good accuracy and precision under well-controlled experimental conditions. The value of 1.05 times the calculated value was termed the predicted extinction coefficient. Statistical analysis shows that the differences between predicted and experimentally determined coefficients are scattered randomly, indicating no systematic bias between the values among the proteins measured. The predicted extinction coefficient was found to be accurate and not subject to the inherent variability of experimental methods. 
We propose the use of a
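
    The relationships in this abstract reduce to two small formulas: the Beer-Lambert law, A = ε · c · l, and the reported scaling of the calculated unfolded-protein coefficient by 1.05. The sketch below applies both. The function names and the example numbers are illustrative assumptions, and the calculated coefficient is taken as an input because the abstract does not list the model-compound values used to obtain it.

```python
NATIVE_TO_UNFOLDED_RATIO = 1.05  # factor reported in the abstract


def predicted_extinction_coefficient(calculated_unfolded):
    """Scale the calculated unfolded-protein coefficient to the native protein."""
    return NATIVE_TO_UNFOLDED_RATIO * calculated_unfolded


def concentration(absorbance, extinction_coefficient, path_length_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    return absorbance / (extinction_coefficient * path_length_cm)


# Illustrative values: coefficient in M^-1 cm^-1, absorbance near 280 nm, 1 cm path.
epsilon = predicted_extinction_coefficient(40000.0)
print(epsilon)
print(concentration(0.84, epsilon))
```

    With the coefficient in M^-1 cm^-1 and the path length in cm, the result is a molar concentration.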

  19. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Full Text Available Abstract Background Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study w