WorldWideScience

Sample records for non-redundant protein database

  1. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2005-01-01

    The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

  2. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

    Science.gov (United States)

    Pruitt, Kim D; Tatusova, Tatiana; Maglott, Donna R

    2007-01-01

    NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2,879,860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

  3. Ion pairs in non-redundant protein structures

    Indian Academy of Sciences (India)

    B A Gowri Shankar; R Sarani; Daliah Michael; P Mridula; C Vasuki; G Sowmiya; B Vasundhar; P Sudha; J Jeyakanthan; D Velmurugan; K Sekar

    2007-06-01

    Ion pairs contribute to several functions including the activity of catalytic triads, fusion of viral membranes, stability in thermophilic proteins and solvent–protein interactions. Furthermore, they have the ability to affect the stability of protein structures and are also a part of the forces that act to hold monomers together. This paper deals with the possible ion pair combinations and networks in 25% and 90% non-redundant protein chains. Different types of ion pairs present in various secondary structural elements are analysed. The ion pairs existing between different subunits of multisubunit protein structures are also computed and the results of various analyses are presented in detail. The protein structures used in the analysis are solved using X-ray crystallography, whose resolution is better than or equal to 1.5 Å and R-factor better than or equal to 20%. This study can, therefore, be useful for analyses of many protein functions. It also provides insights into the better understanding of the architecture of protein structure.

  4. Non-redundant patent sequence databases with value-added annotations at two levels.

    Science.gov (United States)

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.
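    The level-1 clustering described in this abstract (grouping sequences that are 100% identical over their whole length by MD5 checksum) can be sketched roughly as follows. This is a minimal illustration, not the EMBL-EBI pipeline: the entry identifiers, the example sequences, the function name `level1_clusters`, and the normalization step (removing whitespace and upper-casing before hashing) are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def level1_clusters(records):
    """Group (entry_id, sequence) pairs by the MD5 checksum of the sequence.

    Normalization (strip whitespace, upper-case) is an assumption of this
    sketch; the abstract does not say how sequences are prepared.
    """
    clusters = defaultdict(list)
    for entry_id, sequence in records:
        normalized = "".join(sequence.split()).upper()
        digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
        clusters[digest].append(entry_id)
    return dict(clusters)


# Hypothetical patent entries: the first two carry the same sequence.
records = [
    ("EP0001", "MKTAYIAK"),
    ("US0002", "mktayiak"),
    ("JP0003", "MKTAYIAR"),
]
clusters = level1_clusters(records)
for digest, entry_ids in clusters.items():
    print(digest, entry_ids)
```

    Keying on a checksum avoids pairwise sequence comparison. A level-2 step, as the abstract describes it, would then sub-group each cluster by patent family information, which this sketch does not attempt.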

  5. Non-redundant patent sequence databases with value-added annotations at two levels

    Science.gov (United States)

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134

  6. Overlap and diversity in antimicrobial peptide databases: compiling a non-redundant set of sequences.

    Science.gov (United States)

    Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Tellez-Ibarra, Roberto; Llorente-Quesada, Monica T; Salgado, Jesús; Barigye, Stephen J; Liu, Jun

    2015-08-01

    The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are included in CAMP_Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs.

  7. Non-redundant unique interface structures as templates for modeling protein interactions

    OpenAIRE

    Engin Cukuroglu; Attila Gursoy; Ruth Nussinov; Ozlem Keskin

    2014-01-01

    Non-Redundant Unique Interface Structures as Templates for Modeling Protein Interactions Engin Cukuroglu1, Attila Gursoy1*, Ruth Nussinov2,3, Ozlem Keskin1* 1 Center for Computational Biology and Bioinformatics and College of Engineering, Koc University, Istanbul, Turkey, 2 National Cancer Institute, Cancer and Inflammation Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., National Cancer Institute, Frederick, Maryland, United States of ...

  8. Gene Expression Responses to FUS, EWS, and TAF15 Reduction and Stress Granule Sequestration Analyses Identifies FET-Protein Non-Redundant Functions

    DEFF Research Database (Denmark)

    Blechingberg, Jenny; Luo, Yonglun; Bolund, Lars;

    2012-01-01

    in amyotrophic lateral sclerosis, frontotemporal lobar degeneration, and trinucleotide repeat expansion diseases. We here describe a comparative characterization of FET-protein localization and gene regulatory functions. We show that FUS and TAF15 locate to cellular stress granules to a larger extent than EWS...

  9. The critical role of protein arginine methyltransferase prmt8 in zebrafish embryonic and neural development is non-redundant with its paralogue prmt1.

    Directory of Open Access Journals (Sweden)

    Yu-ling Lin

    Protein arginine methyltransferase (PRMT) 1 is the most conserved and widely distributed PRMT in eukaryotes. PRMT8 is a vertebrate-restricted paralogue of PRMT1 with an extra N-terminal sequence and brain-specific expression. We use zebrafish (Danio rerio) as a vertebrate model to study PRMT8 function and putative redundancy with PRMT1. The transcripts of zebrafish prmt8 were specifically expressed in adult zebrafish brain and ubiquitously expressed from zygotic to early segmentation stage before the neuronal development. Whole-mount in situ hybridization revealed ubiquitous prmt8 expression pattern during early embryonic stages, similar to that of prmt1. Knockdown of prmt8 with antisense morpholino oligonucleotide phenocopied prmt1-knockdown, with convergence/extension defects at gastrulation. Other abnormalities observed later include short body axis, curled tails, small and malformed brain and eyes. Catalytically inactive prmt8 failed to complement the morphants, indicating the importance of methyltransferase activity. Full-length prmt8 but not prmt1 cRNA can rescue the phenotypic changes. Nevertheless, cRNA encoding Prmt1 fused with the N-terminus of Prmt8 can rescue the prmt8 morphants. In contrast, N-terminus-deleted but not full-length prmt8 cRNA can rescue the prmt1 morphants as efficiently as prmt1 cRNA. Abnormal brain morphologies illustrated with brain markers and loss of fluorescent neurons in a transgenic fish upon prmt8 knockdown confirm the critical roles of prmt8 in neural development. In summary, our study is the first report showing the expression and function of prmt8 in early zebrafish embryogenesis. Our results indicate that prmt8 may play important roles non-overlapping with prmt1 in embryonic and neural development depending on its specific N-terminus.

  10. Micromorphic continua: non-redundant formulations

    Science.gov (United States)

    Romano, Giovanni; Barretta, Raffaele; Diaco, Marina

    2016-11-01

    The kinematics of generalized continua is investigated and key points concerning the definition of overall tangent strain measure are put into evidence. It is shown that classical measures adopted in the literature for micromorphic continua do not obey a constraint qualification requirement, to be fulfilled for well-posedness in optimization theory, and are therefore termed redundant. Redundancy of continua with latent microstructure and of constrained Cosserat continua is also assessed. A simplest, non-redundant, kinematic model of micromorphic continua, is proposed by dropping the microcurvature field. The equilibrium conditions and the related variational linear elastostatic problem are formulated and briefly discussed. The simplest model involves a reduced number of state variables and of elastic constitutive coefficients, when compared with other models of micromorphic continua, being still capable of enriching the Cauchy continuum model in a significant way.

  11. Protein-Protein Interaction Databases

    DEFF Research Database (Denmark)

    Szklarczyk, Damian; Jensen, Lars Juhl

    2015-01-01

    Years of meticulous curation of scientific literature and increasingly reliable computational predictions have resulted in creation of vast databases of protein interaction data. Over the years, these repositories have become a basic framework in which experiments are analyzed and new directions of research are explored. Here we present an overview of the most widely used protein-protein interaction databases and the methods they employ to gather, combine, and predict interactions. We also point out the trade-off between comprehensiveness and accuracy and the main pitfall scientists have to be aware...

  12. Protein Model Database

    Energy Technology Data Exchange (ETDEWEB)

    Fidelis, K; Adzhubej, A; Kryshtafovych, A; Daniluk, P

    2005-02-23

    The phenomenal success of the genome sequencing projects reveals the power of completeness in revolutionizing biological science. Currently it is possible to sequence entire organisms at a time, allowing for a systemic rather than fractional view of their organization and the various genome-encoded functions. There is an international plan to move towards a similar goal in the area of protein structure. This will not be achieved by experiment alone, but rather by a combination of efforts in crystallography, NMR spectroscopy, and computational modeling. Only a small fraction of structures are expected to be identified experimentally, the remainder to be modeled. Presently there is no organized infrastructure to critically evaluate and present these data to the biological community. The goal of the Protein Model Database project is to create such infrastructure, including (1) a public database of theoretically derived protein structures; (2) reliable annotation of protein model quality; (3) novel structure analysis tools; and (4) access to the highest quality modeling techniques available.

  13. SynProt: A Database for Proteins of Detergent-Resistant Synaptic Protein Preparations

    Science.gov (United States)

    Pielot, Rainer; Smalla, Karl-Heinz; Müller, Anke; Landgraf, Peter; Lehmann, Anne-Christin; Eisenschmidt, Elke; Haus, Utz-Uwe; Weismantel, Robert; Gundelfinger, Eckart D.; Dieterich, Daniela C.

    2012-01-01

    Chemical synapses are highly specialized cell–cell contacts for communication between neurons in the CNS characterized by complex and dynamic protein networks at both synaptic membranes. The cytomatrix at the active zone (CAZ) organizes the apparatus for the regulated release of transmitters from the presynapse. At the postsynaptic side, the postsynaptic density constitutes the machinery for detection, integration, and transduction of the transmitter signal. Both pre- and postsynaptic protein networks represent the molecular substrates for synaptic plasticity. Their function can be altered both by regulating their composition and by post-translational modification of their components. For a comprehensive understanding of synaptic networks the entire ensemble of synaptic proteins has to be considered. To support this, we established a comprehensive database for synaptic junction proteins (SynProt database) primarily based on proteomics data obtained from biochemical preparations of detergent-resistant synaptic junctions. The database currently contains 2,788 non-redundant entries of rat, mouse, and some human proteins, which mainly have been manually extracted from 12 proteomic studies and annotated for synaptic subcellular localization. Each dataset is completed with manually added information including protein classifiers as well as automatically retrieved and updated information from public databases (UniProt and PubMed). We intend that the database will be used to support modeling of synaptic protein networks and rational experimental design. PMID:22737123

  14. Imaging protoplanets: observing transition disks with non-redundant masking

    CERN Document Server

    Sallum, Steph; Close, Laird M; Hinz, Philip M; Follette, Katherine B; Kratter, Kaitlin; Skemer, Andrew J; Bailey, Vanessa P; Briguglio, Runa; Defrere, Denis; Macintosh, Bruce A; Males, Jared R; Morzinski, Katie M; Puglisi, Alfio T; Rodigas, Timothy J; Spalding, Eckhart; Tuthill, Peter G; Vaz, Amali; Weinberger, Alycia; Xomperio, Marco

    2016-01-01

    Transition disks, protoplanetary disks with inner clearings, are promising objects in which to directly image forming planets. The high contrast imaging technique of non-redundant masking is well posed to detect planetary mass companions at several to tens of AU in nearby transition disks. We present non-redundant masking observations of the T Cha and LkCa 15 transition disks, both of which host posited sub-stellar mass companions. However, due to a loss of information intrinsic to the technique, observations of extended sources (e.g. scattered light from disks) can be misinterpreted as moving companions. We discuss tests to distinguish between these two scenarios, with applications to the T Cha and LkCa 15 observations. We argue that a static, forward-scattering disk can explain the T Cha data, while LkCa 15 is best explained by multiple orbiting companions.

  15. Mining Non-Redundant High Order Correlations in Binary Data.

    Science.gov (United States)

    Zhang, Xiang; Pan, Feng; Wang, Wei; Nobel, Andrew

    2008-08-01

    Many approaches have been proposed to find correlations in binary data. Usually, these methods focus on pair-wise correlations. In biology applications, it is important to find correlations that involve more than just two features. Moreover, a set of strongly correlated features should be non-redundant in the sense that the correlation is strong only when all the interacting features are considered together. Removing any feature will greatly reduce the correlation. In this paper, we explore the problem of finding non-redundant high order correlations in binary data. The high order correlations are formalized using multi-information, a generalization of pairwise mutual information. To reduce the redundancy, we require any subset of a strongly correlated feature subset to be weakly correlated. Such feature subsets are referred to as Non-redundant Interacting Feature Subsets (NIFS). Finding all NIFSs is computationally challenging, because in addition to enumerating feature combinations, we also need to check all their subsets for redundancy. We study several properties of NIFSs and show that these properties are useful in developing efficient algorithms. We further develop two sets of upper and lower bounds on the correlations, which can be incorporated in the algorithm to prune the search space. A simple and effective pruning strategy based on pair-wise mutual information is also developed to further prune the search space. The efficiency and effectiveness of our approach are demonstrated through extensive experiments on synthetic and real-life datasets.
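    To make the notion of a non-redundant high order correlation concrete, the sketch below computes multi-information in its standard form (the sum of the marginal entropies minus the joint entropy) on a toy binary dataset in which the third feature is the XOR of the first two. The dataset and the function names are assumptions for illustration only; the abstract does not give the authors' thresholds or search algorithm, and none are reproduced here.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(rows):
    """Shannon entropy in bits of the empirical distribution over rows."""
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in Counter(rows).values())


def multi_information(data, features):
    """Sum of marginal entropies minus the joint entropy of the features."""
    marginals = sum(entropy([(row[f],) for row in data]) for f in features)
    joint = entropy([tuple(row[f] for f in features) for row in data])
    return marginals - joint


# Toy binary data (an assumption): feature 2 is the XOR of features 0 and 1.
data = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

full = multi_information(data, (0, 1, 2))
pairs = {p: multi_information(data, p) for p in combinations(range(3), 2)}
print("all three features:", full)
print("pairs:", pairs)
```

    In this toy case every pair of features is uncorrelated while the full triple is strongly correlated, which is the pattern the abstract calls a non-redundant interacting feature subset: the correlation appears only when all features are considered together.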

  16. Database of Interacting Proteins (DIP)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent...

  17. Database Description - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Yeast Interacting Proteins Database Database Description General information of database Database name Yeast... Interacting Proteins Database Alternative name - Creator Creator Name: Takashi Ito* Creator Affiliation: Di...-4-7136-3989 FAX: +81-4-7136-3979 E-mail : Database classification Metabolic and Signaling Pathways - Protei...n-protein interactions Organism Taxonomy Name: Saccharomyces cerevisiae Taxonomy ID: 4932 Database descripti...ive yeast two-hybrid analysis of budding yeast proteins. Features and manner of utilization of database Prot

  18. SynProt: A Comprehensive Database for Proteins of the Detergent-Resistant Synaptic Junctions Fraction

    Directory of Open Access Journals (Sweden)

    Rainer Pielot

    2012-06-01

    Chemical synapses are highly specialized cell-cell contacts for communication between neurons in the CNS characterized by complex and dynamic protein networks at both synaptic membranes. The cytomatrix at the active zone (CAZ) organizes the apparatus for the regulated release of transmitters from the presynapse. At the postsynaptic side, the postsynaptic density constitutes the machinery for detection, integration and transduction of the transmitter signal. Both pre- and postsynaptic protein networks represent the molecular substrates for synaptic plasticity. Their function can be altered both by regulating their composition and by post-translational modification of their components. For a comprehensive understanding of synaptic networks the entire ensemble of synaptic proteins has to be considered. To support this, we established a comprehensive database for synaptic junction proteins (SynProt database) primarily based on proteomics data obtained from biochemical preparations of detergent-resistant synaptic junctions. The database currently contains 2,788 non-redundant entries of rat, mouse and some human proteins, which mainly have been manually extracted from twelve proteomic studies and annotated for synaptic subcellular localization. Each dataset is completed with manually added information including protein classifiers as well as automatically retrieved and updated information from public databases (UniProt and PubMed). We intend that the database will be used to support modeling of synaptic protein networks and rational experimental design.

  19. Extra Solar Planet Science With a Non Redundant Mask

    Science.gov (United States)

    Minto, Stefenie Nicolet; Sivaramakrishnan, Anand; Greenbaum, Alexandra; St. Laurent, Kathryn; Thatte, Deeparshi

    2017-01-01

    To detect faint planetary companions near a much brighter star, at the resolution limit of the James Webb Space Telescope (JWST), the Near-Infrared Imager and Slitless Spectrograph (NIRISS) will use a non-redundant aperture mask (NRM) for high contrast imaging. I simulated NIRISS data of stars with and without planets, and ran these through the code that measures interferometric image properties to determine how sensitive planetary detection is to our knowledge of instrumental parameters, starting with the pixel scale. I measured the position angle, distance, and contrast ratio of the planet (with respect to the star) to characterize the binary pair. To organize these data I am creating programs that will automatically and systematically explore multi-dimensional instrument parameter spaces and binary characteristics. In the future my code will also be applied to explore any other parameters we can simulate.

  20. ATtRACT-a database of RNA-binding proteins and associated motifs.

    Science.gov (United States)

    Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

    2016-01-01

    RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT also provides efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es.

  1. A Frequent Closed Itemsets Lattice-based Approach for Mining Minimal Non-Redundant Association Rules

    CERN Document Server

    Vo, Bay

    2011-01-01

    Many algorithms have been developed to reduce the time of mining frequent itemsets (FI) or frequent closed itemsets (FCI). However, algorithms that address the time of generating association rules have not been studied in depth. In practice, for a database containing many FI/FCI (from tens of thousands up to millions), the time of generating association rules is much larger than that of mining FI/FCI. Therefore, this paper presents an application of the frequent closed itemsets lattice (FCIL) for mining minimal non-redundant association rules (MNAR) to substantially reduce the time for generating rules. First, we use CHARM-L to build the FCIL. Then, based on the FCIL, an algorithm for fast generation of MNAR is proposed. Experimental results show that the proposed algorithm is much faster in mining time than the frequent itemsets lattice-based algorithm.

  2. Medicago PhosphoProtein Database: a repository for Medicago truncatula phosphoprotein data

    Directory of Open Access Journals (Sweden)

    Christopher M. Rose

    2012-06-01

    The ability of legume crops to fix atmospheric nitrogen via a symbiotic association with soil rhizobia makes them an essential component of many agricultural systems. Initiation of this symbiosis requires protein phosphorylation-mediated signaling in response to rhizobial signals named Nod factors. Medicago truncatula (Medicago) is the model system for studying legume biology, making the study of its phosphoproteome essential. Here, we describe the Medicago Phosphoprotein Database (http://phospho.medicago.wisc.edu), a repository built to house phosphoprotein, phosphopeptide, and phosphosite data specific to Medicago. Currently, the Medicago Phosphoprotein Database holds 3,457 unique phosphopeptides that contain 3,404 non-redundant sites of phosphorylation on 829 proteins. Through the web-based interface, users are allowed to browse identified proteins or search for proteins of interest. Furthermore, we allow users to conduct BLAST searches of the database using both peptide sequences and phosphorylation motifs as queries. The data contained within the database are available for download to be investigated at the user’s discretion. The Medicago Phosphoprotein Database will be updated continually with novel phosphoprotein and phosphopeptide identifications, with the intent of constructing an unparalleled compendium of large-scale Medicago phosphorylation data.

  3. The Pfam protein families database.

    Science.gov (United States)

    Finn, Robert D; Tate, John; Mistry, Jaina; Coggill, Penny C; Sammut, Stephen John; Hotz, Hans-Rudolf; Ceric, Goran; Forslund, Kristoffer; Eddy, Sean R; Sonnhammer, Erik L L; Bateman, Alex

    2008-01-01

    Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).

  4. Update History of This Database - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ...Yeast Interacting Proteins Database Update History of This Database Date Update contents 2010/03/29 Yeast In...t This Database Database Description Download License Update History of This Database Site Policy | Contact Us Update History

  5. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Science.gov (United States)

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee.

  6. Discovering and Mining Links for Protein Databases

    Directory of Open Access Journals (Sweden)

    A. Immaculate Mercy

    2014-01-01

    This work introduces a link analysis procedure for discovering relationships in a protein database or a relational database generalizing simple correspondence analysis. It is based on extracting the links to the related protein database and malfunctioned protein database. The datasets are trained in order to find out missing interactions and the sequences related to them. Further the analysis of links proceeds by performing a random walk defining a Markov chain. The elements of interest are analysed through stochastic complementation which gives a reduced Markov chain. This reduced map is then analysed by projecting the elements of interest through Principal component analysis. Several Protein datasets are analysed using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
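    The stochastic complementation step mentioned in this abstract can be illustrated on a very small chain. The sketch below removes a single state from a random-walk transition matrix using the standard stochastic-complement formula, which for one eliminated state k reduces to S[i][j] = P[i][j] + P[i][k] * P[k][j] / (1 - P[k][k]). The three-state matrix and the function name `censor_state` are assumptions for illustration; this is not the authors' implementation, and the later principal component projection is not shown.

```python
def censor_state(P, k):
    """Return the reduced transition matrix of the chain observed only
    while it is outside state k (stochastic complement for one state)."""
    keep = [i for i in range(len(P)) if i != k]
    leave = 1.0 - P[k][k]  # probability of leaving k on any given step
    if leave <= 0.0:
        raise ValueError("state k is absorbing and cannot be eliminated")
    return [
        [P[i][j] + P[i][k] * P[k][j] / leave for j in keep]
        for i in keep
    ]


# Hypothetical random walk over three linked elements; each row sums to 1.
P = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.25, 0.25, 0.5],
]

# Keep the two elements of interest (states 0 and 1), eliminate state 2.
reduced = censor_state(P, 2)
for row in reduced:
    print(row)
```

    Each row of the reduced matrix still sums to 1, so the result is again a Markov chain restricted to the elements of interest, with paths through the removed state folded into direct transitions.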

  7. Discovering and Mining Links for Protein Databases

    Directory of Open Access Journals (Sweden)

    A. Immaculate Mercy

    2015-10-01

    Full Text Available protein database or a relational database generalizing simple correspondence analysis. It is based on extracting the links to the related protein database and malfunctioned protein database. The datasets are trained in order to find out missing interactions and the sequences related to them. Further the analysis of links proceeds by performing a random walk defining a Markov chain. The elements of interest are analysed through stochastic complementation which gives a reduced Markov chain. This reduced map is then analysed by projecting the elements of interest through Principal component analysis. Several Protein datasets are analysed using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.

  8. SENTRA, a database of signal transduction proteins.

    Energy Technology Data Exchange (ETDEWEB)

    D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL

    2000-01-01

    SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.

  9. Improving decoy databases for protein folding algorithms

    KAUST Repository

    Lindsey, Aaron

    2014-01-01

    Copyright © 2014 ACM. Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant structures. We test our approach on 17 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on a popular modern scoring function and show that they contain a greater number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.

  10. Annotation and retrieval in protein interaction databases

    Science.gov (United States)

    Cannataro, Mario; Hiram Guzzi, Pietro; Veltri, Pierangelo

    2014-06-01

    Biological databases have been developed with a special focus on the efficient retrieval of single records or the efficient computation of specialized bioinformatics algorithms against the overall database, such as in sequence alignment. The continuous production of biological knowledge spread over several biological databases and ontologies, such as Gene Ontology, and the availability of efficient techniques to handle such knowledge, such as annotation and semantic similarity measures, enable the development of novel bioinformatics applications that explicitly use and integrate such knowledge. After introducing the annotation process and the main semantic similarity measures, this paper shows how annotations and semantic similarity can be exploited to improve the extraction and analysis of biologically relevant data from protein interaction databases. As case studies, the paper presents two novel software tools, OntoPIN and CytoSeVis, both based on the use of Gene Ontology annotations, for the advanced querying of protein interaction databases and for the enhanced visualization of protein interaction networks.

  11. HCVpro: Hepatitis C virus protein interaction database

    KAUST Repository

    Kwofie, Samuel K.

    2011-12-01

    It is essential to catalog characterized hepatitis C virus (HCV) protein-protein interaction (PPI) data and the associated plethora of vital functional information to augment the search for therapies, vaccines and diagnostic biomarkers. In furtherance of these goals, we have developed the hepatitis C virus protein interaction database (HCVpro) by integrating manually verified hepatitis C virus-virus and virus-human protein interactions curated from literature and databases. HCVpro is a comprehensive and integrated HCV-specific knowledgebase housing consolidated information on PPIs, functional genomics and molecular data obtained from a variety of virus databases (VirHostNet, VirusMint, HCVdb and euHCVdb), and from BIND and other relevant biology repositories. HCVpro is further populated with information on hepatocellular carcinoma (HCC) related genes that are mapped onto their encoded cellular proteins. Incorporated proteins have been mapped onto Gene Ontologies, canonical pathways, Online Mendelian Inheritance in Man (OMIM) and extensively cross-referenced to other essential annotations. The database is enriched with exhaustive reviews on structure and functions of HCV proteins, current state of drug and vaccine development and links to recommended journal articles. Users can query the database using specific protein identifiers (IDs), chromosomal locations of a gene, interaction detection methods, indexed PubMed sources as well as HCVpro, BIND and VirusMint IDs. The use of HCVpro is free and the resource can be accessed via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. © 2011 Elsevier B.V.

  12. Proteomics: Protein Identification Using Online Databases

    Science.gov (United States)

    Eurich, Chris; Fields, Peter A.; Rice, Elizabeth

    2012-01-01

    Proteomics is an emerging area of systems biology that allows simultaneous study of thousands of proteins expressed in cells, tissues, or whole organisms. We have developed this activity to enable high school or college students to explore proteomic databases using mass spectrometry data files generated from yeast proteins in a college laboratory…

  14. RPG: the Ribosomal Protein Gene database

    OpenAIRE

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and informa...

  15. Sentra, a database of signal transduction proteins.

    Energy Technology Data Exchange (ETDEWEB)

    Maltsev, N.; Marland, E.; Yu, G. X.; Bhatnagar, S.; Lusk, R.; Mathematics and Computer Science

    2002-01-01

    Sentra (http://www-wit.mcs.anl.gov/sentra) is a database of signal transduction proteins with the emphasis on microbial signal transduction. The database was updated to include classes of signal transduction systems modulated by either phosphorylation or methylation reactions such as PAS proteins and serine/threonine kinases, as well as the classical two-component histidine kinases and methyl-accepting chemotaxis proteins. Currently, Sentra contains signal transduction proteins from 43 completely sequenced prokaryotic genomes as well as sequences from SWISS-PROT and TrEMBL. Signal transduction proteins are annotated with information describing conserved domains, paralogous and orthologous sequences, and conserved chromosomal gene clusters. The newly developed user interface supports flexible search capabilities and extensive visualization of the data.

  16. Gene and protein nomenclature in public databases

    Directory of Open Access Journals (Sweden)

    Zimmer Ralf

    2006-08-01

    Full Text Available Abstract. Background: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion: In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of

  17. Dynamics of social contagions with memory of non-redundant information

    CERN Document Server

    Wang, Wei; Zhang, Hai-Feng; Lai, Ying-Cheng

    2015-01-01

    A key ingredient in social contagion dynamics is reinforcement, as adopting a certain social behavior requires verification of its credibility and legitimacy. Memory of non-redundant information plays an important role in reinforcement, which so far has eluded theoretical analysis. We first propose a general social contagion model with reinforcement derived from non-redundant information memory. Then, we develop a unified edge-based compartmental theory to analyze this model, and a remarkable agreement with numerics is obtained on some specific models. To understand the memory effect, we use a spreading threshold model as a specific example, in which each individual adopts a social behavior only when the cumulative pieces of information that the individual received from his/her neighbors exceed an adoption threshold. Through analysis and numerical simulations, we find that the memory characteristic markedly affects the dynamics as quantified by the final adoption size. Strikingly, we uncover a transition pheno...
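
    A threshold model with memory of non-redundant information can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the model analysed in the paper: the function name, the toy graph, the rule that each directed edge gets exactly one transmission attempt, and the use of "received >= threshold" as the adoption rule are all assumptions made for the example.

```python
import random


def simulate_threshold_contagion(neighbors, seeds, threshold, transmit_prob,
                                 rng=None):
    """Return the set of nodes that end up adopting the behavior.

    neighbors     -- dict mapping each node to a list of adjacent nodes
    seeds         -- nodes that have adopted at the start
    threshold     -- pieces of information a node must accumulate to adopt
    transmit_prob -- chance that a single transmission attempt succeeds
    """
    rng = rng or random.Random(0)
    adopted = set(seeds)
    received = {node: 0 for node in neighbors}  # memory of pieces received
    used_edges = set()  # directed edges that already had their one attempt
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for source in frontier:
            for target in neighbors[source]:
                if target in adopted or (source, target) in used_edges:
                    continue
                # Non-redundancy (assumed form): one attempt per directed edge.
                used_edges.add((source, target))
                if rng.random() < transmit_prob:
                    received[target] += 1
                    if received[target] >= threshold:
                        adopted.add(target)
                        next_frontier.append(target)
        frontier = next_frontier
    return adopted


# Hypothetical toy graph: nodes 0 and 1 both point at node 2, which links to 3.
graph = {0: [2], 1: [2], 2: [0, 1, 3], 3: [2]}
final = simulate_threshold_contagion(graph, seeds=[0, 1], threshold=2,
                                     transmit_prob=1.0)
# Node 2 hears from two distinct neighbors and adopts; node 3 hears only once.
```

    With a threshold of 1 this reduces to a simple cascade; raising the threshold makes adoption depend on how many distinct neighbors have already adopted, which is the reinforcement effect the abstract refers to.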

  18. Discovering Non-Redundant Association Rules using MinMax Approximation Rules

    OpenAIRE

    R. Vijaya Prakash; Dr. A. Govardhan; Prof. SSVN. Sarma

    2012-01-01

    Frequent pattern mining is an important area of data mining used to generate association rules. The quality of the extracted frequent patterns is a major concern, because mining generates huge sets of rules, many of which are redundant. In this paper we propose a method that eliminates redundant frequent patterns using the MinMax rule approach in order to generate high-quality association rules.

  19. Database of osmoregulated proteins in mammalian cells.

    Science.gov (United States)

    Grady, Cameron R; Knepper, Mark A; Burg, Maurice B; Ferraris, Joan D

    2014-10-28

    Biological information, even in highly specialized fields, is increasing at a volume that no single investigator can assimilate. The existence of this vast knowledge base creates the need for specialized computer databases to store and selectively sort the information. We have developed a manually curated database of the effects of hypertonicity on target proteins. Effects include changes in mRNA abundance and protein abundance, activity, phosphorylation state, binding, and cellular compartment. The biological information used in this database was derived from three research approaches: transcriptomic, proteomic, and reductionist (hypothesis-driven). The data are presented in the form of grammatical triplets consisting of subject, verb phrase, and object. The purpose of this format is to allow the data to be read from left to right as an English sentence. It is readable either by humans or by computers using natural language processing algorithms. An example of a data entry reads "Hypertonicity increases activity of ABL1 in HEK293." This database was created to provide access to a wealth of information on the effects of hypertonicity in a format that can be selectively sorted. Published 2014. This article is a U.S. Government work and is in the public domain in the USA. Physiological Reports published by Wiley Periodicals, Inc. on behalf of The Physiological Society and the American Physiological Society.

  20. Database for protein adsorption: update on developments

    Science.gov (United States)

    Paszek, Ewa; Vasina, Elena N.; Nicolau, Dan V.

    2008-12-01

    Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research in the last half a century, protein adsorption cannot be predicted with an engineering level, design-orientated accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives the published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, protein hydrophobicity and spread of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable - the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided into two separate subsets representing protein adsorption on hydrophilic and hydrophobic surfaces. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the prediction of the thickness and the surface tension of the protein-covered layers are of particular relevance to the design of microfluidics devices.

  1. Planetary system, star formation, and black hole science with non-redundant masking on space telescopes

    CERN Document Server

    Sivaramakrishnan, Anand; Ireland, Michael; Lloyd, James; Perrin, Marshall; Soummer, Remi; McKernan, Barry; Ford, Saavik

    2009-01-01

    Non-redundant masking (NRM) is a high contrast, high resolution technique relevant to future space missions concerned with extrasolar planetary system and star formation, as well as general high angular resolution galactic and extragalactic astronomy. NRM enables the highest angular resolution science possible given the telescope's diameter and operating wavelength. It also provides precise information on a telescope's optical state. We must assess NRM contrast limits realistically to understand the science yield of NRM in space, and, simultaneously, develop NRM science for planet and star formation and extragalactic science in the UV-NIR, to help steer high resolution space-based astronomy in the coming decade.

  2. RPG: the Ribosomal Protein Gene database.

    Science.gov (United States)

    Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

    2004-01-01

    RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.

  3. Full Data of Yeast Interacting Proteins Database (Annotation Updated Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available st proteins and their interactions are required. Several sources including YPD (Yeast Proteome Database, Cos...ome database (WormPD): comprehensive resources for the organization and comparison of model organism protein

  4. Analysis of knockout mutants reveals non-redundant functions of poly(ADP-ribose)polymerase isoforms in Arabidopsis.

    Science.gov (United States)

    Pham, Phuong Anh; Wahl, Vanessa; Tohge, Takayuki; de Souza, Laise Rosado; Zhang, Youjun; Do, Phuc Thi; Olas, Justyna J; Stitt, Mark; Araújo, Wagner L; Fernie, Alisdair R

    2015-11-01

    The enzyme poly(ADP-ribose)polymerase (PARP) has a dual function being involved both in the poly(ADP-ribosyl)ation and being a constituent of the NAD(+) salvage pathway. To date most studies, both in plant and non-plant systems, have focused on the signaling role of PARP in poly(ADP-ribosyl)ation rather than any role that can be ascribed to its metabolic function. In order to address this question we here used a combination of expression, transcript and protein localization studies of all three PARP isoforms of Arabidopsis alongside physiological analysis of the corresponding mutants. Our analyses indicated that whilst all isoforms of PARP were localized to the nucleus they are also present in non-nuclear locations with parp1 and parp3 also localised in the cytosol, and parp2 also present in the mitochondria. We next isolated and characterized insertional knockout mutants of all three isoforms confirming a complete knockout in the full length transcript levels of the target genes as well as a reduced total leaf NAD hydrolase activity in the two isoforms (PARP1, PARP2) that are highly expressed in leaves. Physiological evaluation of the mutant lines revealed that they displayed distinctive metabolic and root growth characteristics albeit unaltered leaf morphology under optimal growth conditions. We therefore conclude that the PARP isoforms play non-redundant non-nuclear metabolic roles and that their function is highly important in rapidly growing tissues such as the shoot apical meristem, roots and seeds.

  5. A Brief Review of RNA-Protein Interaction Database Resources

    Directory of Open Access Journals (Sweden)

    Ying Yi

    2017-01-01

    Full Text Available RNA-protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA-protein interactions and binding sites from experiments and predictions, RNA-protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA-protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of the databases are presented. The brief description of each database helps users to quickly choose the database containing the information they are interested in. In short, these RNA-protein interaction database resources are continually updated, but the current state shows the efforts to identify and analyze the large amount of RNA-protein interactions.

  6. Artificial Intelligence in Prediction of Secondary Protein Structure Using CB513 Database

    Science.gov (United States)

    Avdagic, Zikrija; Purisevic, Elvir; Omanovic, Samir; Coralic, Zlatan

    2009-01-01

    In this paper we describe CB513, a non-redundant dataset suitable for the development of algorithms for prediction of secondary protein structure. A program was written in Borland Delphi to transform data from our dataset into a form suitable for training a neural network for prediction of secondary protein structure, implemented in the MATLAB Neural-Network Toolbox. Learning (training and testing) of the neural network is investigated with different window sizes, different numbers of neurons in the hidden layer and different numbers of training epochs, using the CB513 dataset. PMID:21347158
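
    The "window size" mentioned in this abstract usually refers to a sliding window of residues centred on the position whose secondary structure is being predicted. The sketch below shows that idea only; it is not the authors' Delphi program. The function name, the padding character and the example sequence are hypothetical, and a centred, odd-sized window is assumed.

```python
def sliding_windows(sequence, window_size, pad="-"):
    """Return one window of residues centred on each position of `sequence`.

    Positions near the ends are padded with `pad` so that every residue
    gets a window of exactly `window_size` characters.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd number")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


# Hypothetical four-residue fragment with a window of three residues.
windows = sliding_windows("ACDE", 3)
# windows is ['-AC', 'ACD', 'CDE', 'DE-']
```

    Each window would then be encoded numerically (for example one-hot per residue) before being passed to a network; larger windows give the network more sequence context per prediction at the cost of more input units.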

  7. In-focus wavefront sensing using non-redundant mask-induced pupil diversity

    CERN Document Server

    Greenbaum, Alexandra

    2016-01-01

    Wavefront estimation using in-focus image data is critical to many applications. This data is invariant to a sign flip with complex conjugation of the complex amplitude in the pupil, making for a non-unique solution. Information from an in-focus image taken through a non-redundant pupil mask (NRM) can break this ambiguity, enabling the true aberration to be determined. We demonstrate this by priming a full pupil Gerchberg-Saxton phase retrieval with NRM fringe phase information. We apply our method to measure simulated aberrations on the segmented James Webb Space Telescope (JWST) mirror using full pupil and NRM data from its Near Infrared Imager and Slitless Spectrograph (NIRISS).

  8. Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments

    OpenAIRE

    Lies Baeten; Joke Reumers; Vicente Tur; François Stricher; Tom Lenaerts; Luis Serrano; Frederic Rousseau; Joost Schymkowitz

    2008-01-01

    As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more ...

  9. In-focus phase retrieval using JWST-NIRISS's non-redundant mask

    Science.gov (United States)

    Greenbaum, Alexandra Z.; Gamper, Noah; Sivaramakrishnan, Anand

    2016-07-01

    The James Webb Space Telescope's Near InfraRed Imager and Slitless Spectrograph (NIRISS) contains a 7-hole non-redundant mask (NRM) in its pupil. NIRISS's Aperture Masking Interferometry (AMI) mode is useful both for science as well as wavefront sensing. In-focus science detector NRM and full pupil images of unresolved stars can be used to measure the wavefront without any dedicated wavefront sensing hardware or any moving mirrors. Using routine science operational sequences, these images can be taken before or after any science visit. NRM fringe phases constrain Gerchberg-Saxton phase retrieval to disambiguate the algorithm's two-fold degeneracy. We summarize how consecutive masked and unmasked exposures provide enough information to reconstruct a wavefront with up to ~1-2 rms radians of error. We present our latest progress on using this approach on laboratory experiments, and discuss those results in the context of contingency for JWST segment phasing. We discuss extending our method to ground-based AO systems and future space telescopes.

  10. An image-plane algorithm for JWST's non-redundant aperture mask data

    CERN Document Server

    Greenbaum, Alexandra Z; Sivaramakrishnan, Anand; Lacour, Sylvestre

    2014-01-01

    The high angular resolution technique of non-redundant masking (NRM) or aperture masking interferometry (AMI) has yielded images of faint protoplanetary companions of nearby stars from the ground. AMI on James Webb Space Telescope (JWST)'s Near Infrared Imager and Slitless Spectrograph (NIRISS) has a lower thermal background than ground-based facilities and does not suffer from atmospheric instability. NIRISS AMI images are likely to have 90 - 95% Strehl ratio between 2.77 and 4.8 micron. In this paper we quantify factors that limit the raw point source contrast of JWST NRM. We develop an analytic model of the NRM point spread function which includes different optical path delays (pistons) between mask holes and fit the model parameters with image plane data. It enables a straightforward way to exclude bad pixels, is suited to limited fields of view, and can incorporate effects such as intra-pixel sensitivity variations. We simulate various sources of noise to estimate their effect on the standard deviation of...

  11. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  12. PSSARD: protein sequence-structure analysis relational database.

    Science.gov (United States)

    Guruprasad, Kunchur; Srikanth, K; Babu, A V N

    2005-09-15

    We have implemented a relational database comprising a representative dataset of amino acid sequences and their associated secondary structure. The representative amino acid sequences were selected according to the PDB_SELECT program by choosing proteins corresponding to protein crystal structure data deposited in the protein data bank that share less than 25% overall pair-wise sequence identity. The secondary structure was extracted from the protein data bank website. The information content in the database includes the protein description, PDB code, crystal structure resolution, total number of amino acid residues in the protein chain, amino acid sequence, secondary structure conformation and its summary. The database is freely accessible from the website mentioned below and can be queried on any of the above fields. The database is particularly useful for quickly retrieving amino acid sequences that are compatible with any super-secondary structure conformation from several proteins simultaneously.

  13. Database of ligand-induced domain movements in enzymes

    Directory of Open Access Journals (Sweden)

    Hayward Steven

    2009-03-01

    Full Text Available Abstract. Background: Conformational change induced by the binding of a substrate or coenzyme is a poorly understood stage in the process of enzyme catalysed reactions. For enzymes that exhibit a domain movement, the conformational change can be clearly characterized and therefore the opportunity exists to gain an understanding of the mechanisms involved. The non-redundant database of protein domain movements contains examples of ligand-induced domain movements in enzymes, but this valuable data has remained unexploited. Description: The domain movements in the non-redundant database of protein domain movements are those found by applying the DynDom program to pairs of crystallographic structures contained in Protein Data Bank files. By cross-checking, for each pair of structures, the ligands in their Protein Data Bank files with the KEGG-LIGAND database and using methods that search for ligands that contact the enzyme in one conformation but not the other, the non-redundant database of protein domain movements was refined down to a set of 203 enzymes where a domain movement is apparently triggered by the binding of a functional ligand. For these cases, ligand binding information, including hydrogen bonds and salt-bridges between the ligand and specific residues on the enzyme, is presented in the context of dynamical information such as the regions that form the dynamic domains, the hinge bending residues, and the hinge axes. Conclusion: The presentation at a single website of data on interactions between a ligand and specific residues on the enzyme alongside data on the movement that these interactions induce should lead to new insights into the mechanisms of these enzymes in particular, and help in trying to understand the general process of ligand-induced domain closure in enzymes. The website can be found at: http://www.cmp.uea.ac.uk/dyndom/enzymeList.do

  14. AN IMAGE-PLANE ALGORITHM FOR JWST'S NON-REDUNDANT APERTURE MASK DATA

    Energy Technology Data Exchange (ETDEWEB)

    Greenbaum, Alexandra Z. [Johns Hopkins University Department of Physics and Astronomy 3400 North Charles, Baltimore, MD 21218 (United States); Pueyo, Laurent; Sivaramakrishnan, Anand [Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218 (United States); Lacour, Sylvestre [LESIA, CNRS/UMR-8109, Observatoire de Paris, UPMC, Université Paris Diderot 5 place Jules Janssen, 92195 Meudon (France)

    2015-01-10

    The high angular resolution technique of non-redundant masking (NRM) or aperture masking interferometry (AMI) has yielded images of faint protoplanetary companions of nearby stars from the ground. AMI on James Webb Space Telescope (JWST)'s Near Infrared Imager and Slitless Spectrograph (NIRISS) has a lower thermal background than ground-based facilities and does not suffer from atmospheric instability. NIRISS AMI images are likely to have 90%-95% Strehl ratio between 2.77 and 4.8 μm. In this paper we quantify factors that limit the raw point source contrast of JWST NRM. We develop an analytic model of the NRM point spread function which includes different optical path delays (pistons) between mask holes and fit the model parameters with image plane data. It enables a straightforward way to exclude bad pixels, is suited to limited fields of view, and can incorporate effects such as intra-pixel sensitivity variations. We simulate various sources of noise to estimate their effect on the standard deviation of closure phase, σ_CP (a proxy for binary point source contrast). If σ_CP < 10^-4 radians (a contrast ratio of 10 mag), young accreting gas giant planets (e.g., in the nearby Taurus star-forming region) could be imaged with JWST NIRISS. We show the feasibility of using NIRISS' NRM with the sub-Nyquist sampled F277W, which would enable some exoplanet chemistry characterization. In the presence of small piston errors, the dominant sources of closure phase error (depending on pixel sampling, and filter bandwidth) are flat field errors and unmodeled variations in intra-pixel sensitivity. The in-flight stability of NIRISS will determine how well these errors can be calibrated by observing a point source. Our results help develop efficient observing strategies for space-based NRM.
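
    The abstract above relies on closure phase as its contrast proxy. The following is a minimal, idealized sketch (not the authors' image-plane fitting code) of why closure phase is a convenient observable: hole-based piston terms cancel when the measured phases are summed around a triangle of holes. The function name, the sign convention for the pistons, and the numerical values are all assumptions chosen for illustration.

```python
import cmath

def closure_phase(source_phases, pistons, triangle):
    """Closure phase (radians) around one triangle of mask holes.

    source_phases: dict mapping an ordered hole pair (a, b) to the
                   source's fringe phase on that baseline (hypothetical values).
    pistons:       per-hole optical path delays expressed as phases (radians).
    triangle:      three hole indices (i, j, k).
    """
    i, j, k = triangle
    bispectrum = 1 + 0j
    for a, b in ((i, j), (j, k), (k, i)):
        # Assumed convention: each baseline phase picks up piston[a] - piston[b].
        measured = source_phases[(a, b)] + pistons[a] - pistons[b]
        bispectrum *= cmath.exp(1j * measured)
    # The argument of the triple product is the closure phase.
    return cmath.phase(bispectrum)

# Hypothetical source phases on the three baselines of one triangle.
source_phases = {(0, 1): 0.10, (1, 2): 0.05, (2, 0): -0.12}

no_piston = closure_phase(source_phases, [0.0, 0.0, 0.0], (0, 1, 2))
with_piston = closure_phase(source_phases, [0.4, -0.7, 1.1], (0, 1, 2))
```

    In this idealized model both values agree, because each piston enters once with a plus sign and once with a minus sign. The abstract's point is that real data depart from this ideal through effects such as flat field errors and intra-pixel sensitivity variations, which this sketch does not model.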

  15. Autonomous control system reconfiguration for spacecraft with non-redundant actuators

    Science.gov (United States)

    Grossman, Walter

    1995-01-01

    The Small Satellite Technology Initiative (SSTI) 'CLARK' spacecraft is required to be single-failure tolerant, i.e., no failure of any single component or subsystem shall result in complete mission loss. Fault tolerance is usually achieved by implementing redundant subsystems. Fault-tolerant systems are therefore heavier and cost more to build and launch than non-redundant, non-fault-tolerant spacecraft. The SSTI CLARK satellite Attitude Determination and Control System (ADACS) achieves single-fault tolerance without redundancy. The attitude determination system uses a Kalman Filter which is inherently robust to loss of any single attitude sensor. The attitude control system uses three orthogonal reaction wheels for attitude control and three magnetic dipoles for momentum control. The nominal six-actuator control system functions by projecting the attitude correction torque onto the reaction wheels while a slower momentum management outer loop removes the excess momentum in the direction normal to the local B field. The actuators are not redundant, so the nominal control law cannot be implemented in the event of a loss of a single actuator (dipole or reaction wheel). The spacecraft dynamical state (attitude, angular rate, and momentum) is controllable from any five-element subset of the six actuators. With loss of an actuator the instantaneous control authority may not span $\mathbb{R}^3$, but the controllability gramian $\int_0^t \Phi(t,\tau)\,B(\tau)\,B'(\tau)\,\Phi'(t,\tau)\,d\tau$ retains full rank. Upon detection of an actuator failure the control torque is decomposed onto the remaining active axes. The attitude control torque is effected and the over-orbit momentum is controlled. The resulting control system performance approaches that of the nominal system.
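
    The rank argument in the abstract above can be illustrated with a toy calculation. The sketch below is not the CLARK control law: it assumes a pure integrator (state transition matrix Phi equal to the identity) and a hypothetical input matrix B(t) that projects onto the plane normal to a rotating unit vector b(t), standing in for a control authority that is rank 2 at every instant. All function names and the choice of b(t) are assumptions made for illustration.

```python
import math

def control_matrix(t):
    """Hypothetical 3x3 input matrix: projector onto the plane normal to b(t)."""
    b = (math.cos(t), math.sin(t), 0.0)
    return [[(1.0 if i == j else 0.0) - b[i] * b[j] for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def gramian(t_end, steps=2000):
    """Trapezoidal estimate of the integral of B(tau) B(tau)^T over [0, t_end].

    Assumes Phi = I, so the gramian reduces to this integral.
    """
    h = t_end / steps
    w = [[0.0] * 3 for _ in range(3)]
    for k in range(steps + 1):
        weight = h * (0.5 if k in (0, steps) else 1.0)
        b = control_matrix(k * h)
        for i in range(3):
            for j in range(3):
                w[i][j] += weight * sum(b[i][n] * b[j][n] for n in range(3))
    return w

# At any single instant the input matrix is singular (rank 2) ...
instant_det = det3(control_matrix(0.3))
# ... but the gramian accumulated over a quarter turn of b(t) is not.
gram_det = det3(gramian(math.pi / 2))
```

    Here `instant_det` is zero to within rounding, while `gram_det` is clearly positive, so the accumulated gramian has full rank even though no instantaneous B(t) does. This is the same qualitative behaviour the abstract describes for a failed actuator, under the simplifying assumptions stated above.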

  16. The PIR integrated protein databases and data retrieval system

    Directory of Open Access Journals (Sweden)

    H Huang

    2006-01-01

    Full Text Available The Protein Information Resource (PIR) provides many databases and tools to support genomic and proteomic research. PIR is a member of UniProt (Universal Protein Resource), the central repository of protein sequence and function, which maintains the UniProt Knowledgebase with extensively curated annotation, the UniProt Reference databases to speed sequence searches, and the UniProt Archive to reflect sequence history. PIR also provides the PIRSF family classification system, based on evolutionary relationships of full-length proteins, and the iProClass integrated database of protein family, function, and structure. These databases are easily accessible from the PIR web site using a centralized data retrieval system for information retrieval and knowledge discovery.

  17. DBMLoc: a Database of proteins with multiple subcellular localizations

    Directory of Open Access Journals (Sweden)

    Zhou Yun

    2008-02-01

    Full Text Available Abstract Background Subcellular localization information is one of the key features for protein function research. Locating to a specific subcellular compartment is essential for a protein to function efficiently. Proteins which have multiple localizations will provide more clues. Such proteins may account for a high proportion of all proteins, possibly more than 35%. Description We have developed a database of proteins with multiple subcellular localizations, designated DBMLoc. The initial release contains 10470 multiple subcellular localization-annotated entries. Annotations are collected from primary protein databases, specific subcellular localization databases and literature texts. All the protein entries are cross-referenced to GO annotations and SwissProt. Protein-protein interactions are also annotated. The entries are classified into 12 large subcellular localization categories based on GO hierarchical architecture and original annotations. Download, search and sequence BLAST tools are also available on the website. Conclusion DBMLoc is a protein database which collects proteins with more than one subcellular localization annotation. It is freely accessible at http://www.bioinfo.tsinghua.edu.cn/DBMLoc/index.htm.

  18. Non-redundant Aperture Masking Interferometry (AMI) and segment phasing with JWST-NIRISS

    Science.gov (United States)

    Sivaramakrishnan, Anand; Lafrenière, David; Ford, K. E. Saavik; McKernan, Barry; Cheetham, Anthony; Greenbaum, Alexandra Z.; Tuthill, Peter G.; Lloyd, James P.; Ireland, Michael J.; Doyon, René; Beaulieu, Mathilde; Martel, André; Koekemoer, Anton; Martinache, Frantz; Teuben, Peter

    2012-09-01

    The Aperture Masked Interferometry (AMI) mode on JWST-NIRISS is implemented as a 7-hole, 15% throughput, non-redundant mask (NRM) that operates with 5-8% bandwidth filters at 3.8, 4.3, and 4.8 microns. We present refined estimates of AMI's expected point-source contrast, using realizations of noise matched to JWST pointing requirements, NIRISS detector noise, and Rev-V JWST wavefront error models for the telescope and instrument. We describe our point-source binary data reduction algorithm, which we use as a standardized method to compare different observational strategies. For a 7.5 magnitude star we report a 10-σ detection at 8.7 and 9.2 magnitudes of contrast at 100 mas and 400 mas, respectively, using closure phases and squared visibilities in the absence of bad pixels, but with various other noise sources. With 3% of the pixels unusable, the expected contrast drops by about 0.5 magnitudes. AMI should be able to reach targets as bright as M=5. There will be significant overlap between Gemini-GPI and ESO-SPHERE targets and AMI's search space, and a complementarity with NIRCam's coronagraph. We also illustrate synthesis imaging with AMI, demonstrating an imaging dynamic range of 25 at 100 mas scales. We tailor existing radio interferometric methods to retrieve a faint bar across a bright nucleus, and explain the similarities to synthesis imaging at radio wavelengths. Modest contrast observations of dusty accretion flows around AGNs will be feasible for NIRISS AMI. We show our early results of image-plane deconvolution as well. Finally, we report progress on an NRM-inspired approach to mitigate mission-level risk associated with JWST's specialized wavefront sensing hardware. By combining narrow band and medium band Nyquist-sampled images taken with a science camera we can sense JWST primary mirror segment tip-tilt to 10 mas, and piston to a few nm. We can sense inter-segment piston errors of up to 5 coherence lengths of the broadest bandpass filter used
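
    As a small worked example of what a 7-hole non-redundant mask provides, the sketch below simply counts the observables that follow from the hole count: every pair of holes gives one baseline, every triple gives one closure triangle, and only a subset of the closure phases is linearly independent. The counts are standard combinatorics, not figures quoted from the abstract, and the variable names are illustrative.

```python
from itertools import combinations
from math import comb

n_holes = 7  # hole count stated in the abstract
holes = range(n_holes)

# One baseline (and one fringe) per pair of holes in a non-redundant mask.
baselines = list(combinations(holes, 2))

# One closure triangle per triple of holes.
triangles = list(combinations(holes, 3))

# Number of linearly independent closure phases: (N-1)(N-2)/2.
independent_closure_phases = comb(n_holes - 1, 2)

print(len(baselines), len(triangles), independent_closure_phases)
```

    For seven holes this gives 21 baselines, 35 closure triangles and 15 independent closure phases.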

  19. How well are protein structures annotated in secondary databases?

    Science.gov (United States)

    Rother, Kristian; Michalsky, Elke; Leser, Ulf

    2005-09-01

    We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.

  20. Yeast Interacting Proteins Database: YJL199C, YJL199C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available d in closely related Saccharomyces species; protein detected in large-scale protein-protein interaction studies...cies; protein detected in large-scale protein-protein interaction studies Rows with this prey as prey (4) Ro...n; not conserved in closely related Saccharomyces species; protein detected in large-scale protein-protein interaction studies... species; protein detected in large-scale protein-protein interaction studies Rows with this prey as prey Ro

  1. LocSigDB: a database of protein localization signals

    OpenAIRE

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations; primarily in a eukaryotic cell with brief coverage of bacterial proteins. Proteins must be localized at their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor for many human diseases; therefore, collection of known sorting signals will hel...

  2. ARAMEMNON, a novel database for Arabidopsis integral membrane proteins

    DEFF Research Database (Denmark)

    Schwacke, Rainer; Schneider, Anja; van der Graaff, Eric

    2003-01-01

    A specialized database (DB) for Arabidopsis membrane proteins, ARAMEMNON, was designed that facilitates the interpretation of gene and protein sequence data by integrating features that are presently only available from individual sources. Using several publicly available prediction programs, put...... is accessible at the URL http://aramemnon.botanik.uni-koeln.de....

  3. cuticleDB: a relational database of Arthropod cuticular proteins

    Directory of Open Access Journals (Sweden)

    Willis Judith H

    2004-09-01

    Full Text Available Abstract Background The insect exoskeleton or cuticle is a bi-partite composite of proteins and chitin that provides protective, skeletal and structural functions. Little information is available about the molecular structure of this important complex that exhibits a helicoidal architecture. Scores of sequences of cuticular proteins have been obtained from direct protein sequencing, from cDNAs, and from genomic analyses. Most of these cuticular protein sequences contain motifs found only in arthropod proteins. Description cuticleDB is a relational database containing all structural proteins of Arthropod cuticle identified to date. Many come from direct sequencing of proteins isolated from cuticle and from sequences from cDNAs that share common features with these authentic cuticular proteins. It also includes proteins from the Drosophila melanogaster and the Anopheles gambiae genomes that have been predicted to be cuticular proteins, based on a Pfam motif (PF00379) responsible for chitin binding in Arthropod cuticle. The total number of the database entries is 445: 370 derive from insects, 60 from Crustacea and 15 from Chelicerata. The database can be accessed from our web server at http://bioinformatics.biol.uoa.gr/cuticleDB. Conclusions CuticleDB was primarily designed to contain correct and full annotation of cuticular protein data. The database will be of help to future genome annotators. Users will be able to test hypotheses for the existence of known and also of yet unknown motifs in cuticular proteins. An analysis of motifs may contribute to understanding how proteins contribute to the physical properties of cuticle as well as to the precise nature of their interaction with chitin.

  4. Human protein reference database as a discovery resource for proteomics

    Science.gov (United States)

    Peri, Suraj; Navarro, J. Daniel; Kristiansen, Troels Z.; Amanchy, Ramars; Surendranath, Vineeth; Muthusamy, Babylakshmi; Gandhi, T. K. B.; Chandrika, K. N.; Deshpande, Nandan; Suresh, Shubha; Rashmi, B. P.; Shanker, K.; Padma, N.; Niranjan, Vidya; Harsha, H. C.; Talreja, Naveen; Vrushabendra, B. M.; Ramya, M. A.; Yatish, A. J.; Joy, Mary; Shivashankar, H. N.; Kavitha, M. P.; Menezes, Minal; Choudhury, Dipanwita Roy; Ghosh, Neelanjana; Saravana, R.; Chandran, Sreenath; Mohan, Sujatha; Jonnalagadda, Chandra Kiran; Prasad, C. K.; Kumar-Sinha, Chandan; Deshpande, Krishna S.; Pandey, Akhilesh

    2004-01-01

    The rapid pace at which genomic and proteomic data is being generated necessitates the development of tools and resources for managing data that allow integration of information from disparate sources. The Human Protein Reference Database (http://www.hprd.org) is a web-based resource based on open source technologies for protein information about several aspects of human proteins including protein–protein interactions, post-translational modifications, enzyme–substrate relationships and disease associations. This information was derived manually by a critical reading of the published literature by expert biologists and through bioinformatics analyses of the protein sequence. This database will assist in biomedical discoveries by serving as a resource of genomic and proteomic information and providing an integrated view of sequence, structure, function and protein networks in health and disease. PMID:14681466

  5. AMYPdb: A database dedicated to amyloid precursor proteins

    Directory of Open Access Journals (Sweden)

    Delamarche Christian

    2008-06-01

    Full Text Available Abstract Background Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases. Results We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation. Conclusion AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible [1].

  6. LocSigDB: a database of protein localization signals.

    Science.gov (United States)

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations; primarily in a eukaryotic cell with brief coverage of bacterial proteins. Proteins must be localized at their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor for many human diseases; therefore, collection of known sorting signals will help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in the LocSigDB is annotated with its localization, source, PubMed references and is linked to the proteins in UniProt database along with the organism information that contain the same amino acid pattern as the given signal. From LocSigDB webserver, users can download the whole database or browse/search for data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/

  7. Manually Curated Database of Rice Proteins (MCDRP), a database of digitized experimental data on rice

    Directory of Open Access Journals (Sweden)

    Saurabh Raghuvanshi

    2016-11-01

    Full Text Available MCDRP or ‘Manually Curated Database of Rice Proteins’ is a database of digitized experimental datasets on rice proteins. Every aspect of the experimental data published in peer-reviewed research articles on rice biology has been digitized with the help of novel data curation models. These models use a semantic and structured arrangement of alpha-numeric notation, including several well known ontologies, to represent various aspects of the data. As a result, data from more than 15,000 different experiments pertaining to about 2400 rice proteins have been digitized from over 540 published and peer-reviewed research articles. The database portal provides access to the digitized experimental data via search or browse functions. In essence, one can instantly access data from even a single data-point from a collection of thousands of the experimental datasets. On the other hand, one can easily access the digitized experimental data from multiple research articles on a rice protein. Based on the analysis and integration of the digitized experimental data, more than 800 different traits (molecular, biochemical or phenotypic) have been precisely mapped onto the rice proteins along with the underlying experimental evidence. Similarly, over 4370 associations, based on experimental evidence, have been established between the rice proteins and various gene ontology terms. The database is being continuously updated and is freely available at www.genomeindia.org.in/biocuration.

  8. Yeast Interacting Proteins Database: YGL198W, YDR084C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available les; computational analysis of large-scale protein-protein interaction data suggests a possible role in vesi... GTPases, localized to late Golgi vesicles; computational analysis of large-scale protein-protein interactio

  9. Yeast Interacting Proteins Database: YDR425W, YGL161C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available icles; computational analysis of large-scale protein-protein interaction data sug...olgi vesicles; computational analysis of large-scale protein-protein interaction data suggests a possible ro

  10. Yeast Interacting Proteins Database: YPL070W, YOR155C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available utational analysis of large-scale protein-protein interaction data suggests a possible role in transcription...9 domain; computational analysis of large-scale protein-protein interaction data suggests a possible role in

  11. Yeast Interacting Proteins Database: YNL189W, YOR284W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ait as prey (0) YOR284W HUA2 Cytoplasmic protein of unknown function; computational analysis of large-scal...protein of unknown function; computational analysis of large-scale protein-protein interaction data suggests

  12. Yeast Interacting Proteins Database: YPL070W, YLR245C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available utational analysis of large-scale protein-protein interaction data suggests a possible role in transcription...Vps9 domain; computational analysis of large-scale protein-protein interaction data suggests a possible role

  13. Yeast Interacting Proteins Database: YPL070W, YPR193C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available utational analysis of large-scale protein-protein interaction data suggests a possible role in transcription...in; computational analysis of large-scale protein-protein interaction data suggests a possible role in trans

  14. Yeast Interacting Proteins Database: YGL161C, YDR084C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available les; computational analysis of large-scale protein-protein interaction data suggests a possible role in vesi...GTPases, localized to late Golgi vesicles; computational analysis of large-scale protein-protein interaction

  15. Yeast Interacting Proteins Database: YHR111W, YIL008W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YHR111W UBA4 Protein that activates Urm1p before its conjugation to proteins (urmyl...description Protein that activates Urm1p before its conjugation to proteins (urmylation); one target is the

  16. Yeast Interacting Proteins Database: YDL226C, YGL198W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available omputational analysis of large-scale protein-protein interaction data suggests a ... computational analysis of large-scale protein-protein interaction data suggests a possible role in vesicle-

  17. Yeast Interacting Proteins Database: YNL189W, YJL199C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available tein; not conserved in closely related Saccharomyces species; protein detected in large-scale protein-protein interaction studies...myces species; protein detected in large-scale protein-protein interaction studies Rows with this prey as pr

  18. Yeast Interacting Proteins Database: YGR268C, YER125W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available larity to that of Type I J-proteins; computational analysis of large-scale protein-protein interaction data ...equence similarity to that of Type I J-proteins; computational analysis of large-scale protein-protein inter

  19. Yeast Interacting Proteins Database: YOR124C, YGR268C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available that of Type I J-proteins; computational analysis of large-scale protein-protein interaction data suggests a...tational analysis of large-scale protein-protein interaction data suggests a possible role in actin patch as

  20. Yeast Interacting Proteins Database: YLR291C, YJL199C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ved in closely related Saccharomyces species; protein detected in large-scale protein-protein interaction studies...in large-scale protein-protein interaction studies Rows with this prey as prey Rows with this prey as prey (

  1. Yeast Interacting Proteins Database: YML064C, YJL199C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available y related Saccharomyces species; protein detected in large-scale protein-protein interaction studies Rows wi...in-protein interaction studies Rows with this prey as prey (4) Rows with this prey as bait (1) 28 6 3 4 0 0 ...d in closely related Saccharomyces species; protein detected in large-scale prote

  2. Yeast Interacting Proteins Database: YLR291C, YPL070W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPL070W MUK1 Cytoplasmic protein of unknown function containing a Vps9 domain; computation...rotein of unknown function containing a Vps9 domain; computational analysis of large-scale protein-protein i

  3. Yeast Interacting Proteins Database: YPL095C, YGL198W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available d to late Golgi vesicles; computational analysis of large-scale protein-protein interaction data suggests a ...gene name YIP4 Prey description Protein that interacts with Rab GTPases, localized to late Golgi vesicles; computation

  4. Yeast Interacting Proteins Database: YML109W, YGL190C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available sential regulatory subunit B of protein phosphatase 2A, which has multiple roles ...-essential regulatory subunit B of protein phosphatase 2A, which has multiple roles in mitosis and protein b

  5. Yeast Interacting Proteins Database: YML064C, YOR284W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available th this bait as prey (0) YOR284W HUA2 Cytoplasmic protein of unknown function; computational analysis of large-scale... unknown function; computational analysis of large-scale protein-protein interact

  6. Yeast Interacting Proteins Database: YIL008W, YHR111W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ait as prey (1) YHR111W UBA4 Protein that activates Urm1p before its conjugation ...4 Prey description Protein that activates Urm1p before its conjugation to proteins (urmylation); one target

  7. Yeast Interacting Proteins Database: YJR091C, YKL076C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...NA-binding proteins, interacts with mRNAs encoding membrane-associated proteins; involved in localizing the

  8. Yeast Interacting Proteins Database: YJR091C, YNR048W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...y of RNA-binding proteins, interacts with mRNAs encoding membrane-associated proteins; involved in localizing

  9. Yeast Interacting Proteins Database: YJR091C, YML015C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...y of RNA-binding proteins, interacts with mRNAs encoding membrane-associated proteins; involved in localizing

  10. Yeast Interacting Proteins Database: YDL239C, YPL070W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available Vps9 domain; computational analysis of large-scale protein-protein interaction data suggests a possible role...ey description Cytoplasmic protein of unknown function containing a Vps9 domain; computational analysis of large-scale

  11. Yeast Interacting Proteins Database: YDL226C, YJL151C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available s bait as prey (0) YJL151C SNA3 Integral membrane protein localized to vacuolar intralumenal vesicles, computation...intralumenal vesicles, computational analysis of large-scale protein-protein interaction data suggests a pos... gene name SNA3 Prey description Integral membrane protein localized to vacuolar

  12. Yeast Interacting Proteins Database: YEL043W, YOR164C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available on quantitative analysis of protein-protein interaction maps; may interact with ribosomes, based on co-purification studies...ing based on quantitative analysis of protein-protein interaction maps; may interact with ribosomes, based on co-purification studies

  13. Yeast Interacting Proteins Database: YHL002W, YNR006W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ycling of Golgi proteins and formation of lumenal membranes Rows with this bait as bait (1) Rows with this b...required for recycling Golgi proteins, forming lumenal membranes and sorting ubiquitinated proteins destined...on, as well as for recycling of Golgi proteins and formation of lumenal membranes...ith Hse1p; required for recycling Golgi proteins, forming lumenal membranes and sorting ubiquitinated protei

  14. Yeast Interacting Proteins Database: YOR047C, YKL038W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available racts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a...Bait description Protein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose senso...rs Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regulator of the tra

  15. Yeast Interacting Proteins Database: YOR284W, YOR284W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YOR284W HUA2 Cytoplasmic protein of unknown function; computational analysis of lar...it as bait (1) Rows with this bait as prey (4) YOR284W HUA2 Cytoplasmic protein of unknown function; computa...tein of unknown function; computational analysis of large-scale protein-protein i... HUA2 Prey description Cytoplasmic protein of unknown function; computational ana

  16. Yeast Interacting Proteins Database: YEL017W, YEL017W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available 17W GTT3 Protein of unknown function with a possible role in glutathione metabolism, as suggested by compu...Bait description Protein of unknown function with a possible role in glutathione metabolism, as suggested by comput...ion Protein of unknown function with a possible role in glutathione metabolism, as suggested by computationa...YEL017W GTT3 Protein of unknown function with a possible role in glutathione metabolism, as suggested by com...putational analysis of large-scale protein-protein interaction data; GFP-fusion pro

  17. Yeast Interacting Proteins Database: YDL121C, YDL100C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL121C - Putative protein of unknown function; green fluorescent protein (GFP)-fus...ion protein localizes to the endoplasmic reticulum; YDL121C is not an essential protein Rows with this bait... as bait (1) Rows with this bait as prey (0) YDL100C GET3 Guanine nucleotide exchange factor for Gpa1p; ampl...his prey as prey (10) Rows with this prey as bait (2) 3 5 2 2 0 0 0 0 0 - - - - - 0 0 8 - Show YDL121C Bait ORF YDL...n; green fluorescent protein (GFP)-fusion protein localizes to the endoplasmic reticulum; YDL121C is not an

  18. Yeast Interacting Proteins Database: YKR092C, YKL023W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available W - Putative protein of unknown function, predicted by computational methods to b...ait as prey (0) Prey ORF YKL023W Prey gene name - Prey description Putative protein of unknown function, predicted by computation

  19. Yeast Interacting Proteins Database: YLR291C, YOR284W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YOR284W HUA2 Cytoplasmic protein of unknown function; computational analysis of l...prey (0) Prey ORF YOR284W Prey gene name HUA2 Prey description Cytoplasmic protein of unknown function; computation

  20. Yeast Interacting Proteins Database: YLR373C, YGL190C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ase 2A, which has multiple roles in mitosis and protein biosynthesis; involved in regulation of mitotic exit...phosphatase 2A, which has multiple roles in mitosis and protein biosynthesis; involved in regulation of mito

  1. Yeast Interacting Proteins Database: YPL204W, YHR185C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available Sporulation protein required for prospore membrane formation at selected spindle poles...n Sporulation protein required for prospore membrane formation at selected spindle poles, ensures functional

  2. Yeast Interacting Proteins Database: YJR091C, YEL013W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes...ed proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes increased sens

  3. Yeast Interacting Proteins Database: YPL070W, YBR176W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available utational analysis of large-scale protein-protein interaction data suggests a possible role in transcription...otein of unknown function containing a Vps9 domain; computational analysis of large-scale

  4. Yeast Interacting Proteins Database: YDR084C, YGL198W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available with Rab GTPases, localized to late Golgi vesicles; computational analysis of large-scale...omputational analysis of large-scale protein-protein interaction data suggests a possible role in vesicle-me

  5. Yeast Interacting Proteins Database: YMR077C, YML015C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available lumen; cytoplasmic protein recruited to endosomal membranes Rows with this bait as bait (3) Rows with this b...o the multivesicular body pathway to the lysosomal/vacuolar lumen; cytoplasmic protein recruited to endosomal membranes

  6. Yeast Interacting Proteins Database: YMR316W, YER125W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YMR316W DIA1 Protein of unknown function, involved in invasive and pseudohyphal gro... of unknown function, involved in invasive and pseudohyphal growth; green fluorescent protein (GFP)-fusion p

  7. Yeast Interacting Proteins Database: YPR148C, YDL237W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPR148C - Protein of unknown function that may interact with ribosomes, based on co-purification experiments... with ribosomes, based on co-purification experiments; green fluorescent protein

  8. Yeast Interacting Proteins Database: YER081W, YOR318C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YOR318C - Dubious open reading frame unlikely to encode a protein, based on available experimental and comparative...likely to encode a protein, based on available experimental and comparative seque

  9. Yeast Interacting Proteins Database: YBL033C, YNL105W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available reading frame unlikely to encode a protein, based on available experimental and comparative sequence data; p...a protein, based on available experimental and comparative sequence data; partial

  10. Yeast Interacting Proteins Database: YER081W, YPR126C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPR126C - Dubious open reading frame unlikely to encode a functional protein, based on available experimental and comparative...ubious open reading frame unlikely to encode a functional protein, based on available experimental and comparative

  11. Yeast Interacting Proteins Database: YLR263W, YDR510W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YLR263W RED1 Protein component of the axial elements of the synaptonemal complex, i...ait gene name RED1 Bait description Protein component of the axial elements of the synaptonemal complex, inv

  12. Yeast Interacting Proteins Database: YNL152W, YMR032W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YNL152W INN1 Essential protein that associates with the contractile actomyosin ring... Bait description Essential protein that associates with the contractile actomyosin ring, required for ingre

  13. Yeast Interacting Proteins Database: YKL002W, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthes...ng of integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly s

  14. Yeast Interacting Proteins Database: YJR091C, YKL002W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available g of integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly sy... integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthe

  15. Yeast Interacting Proteins Database: YLR347C, YLR377C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ucleoporins to mediate nuclear import of NLS-containing cargo proteins via the nuclear pore complex; regulat...0p; interacts with nucleoporins to mediate nuclear import of NLS-containing cargo proteins via the nuclear p

  16. Yeast Interacting Proteins Database: YLR347C, YBR176W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ucleoporins to mediate nuclear import of NLS-containing cargo proteins via the nuclear pore complex; regulat...p; interacts with nucleoporins to mediate nuclear import of NLS-containing cargo proteins via the nuclear po

  17. Yeast Interacting Proteins Database: YML064C, YKL103C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available he peptidase family M18; often used as a marker protein in studies of autophagy a... to the peptidase family M18; often used as a marker protein in studies of autophagy and cytosol to vacuole

  18. Yeast Interacting Proteins Database: YHR114W, YDL134C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available y (0) YDL134C PPH21 Catalytic subunit of protein phosphatase 2A, functionally red...gene name PPH21 Prey description Catalytic subunit of protein phosphatase 2A, functionally redundant with Pp

  19. Yeast Interacting Proteins Database: YLR288C, YLR125W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available prey (1) YLR125W - Putative protein of unknown function; mutant has decreased Ty3... name - Prey description Putative protein of unknown function; mutant has decreased Ty3 transposition; YLR12

  20. Yeast Interacting Proteins Database: YPL002C, YJR102C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ndent sorting of proteins into the endosome; appears to be functionally related to SNF7; involved in glucose...x, which is involved in ubiquitin-dependent sorting of proteins into the endosome; appears

  1. Yeast Interacting Proteins Database: YDL089W, YML008C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rDNA repeat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein...peat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein localiz

  2. Yeast Interacting Proteins Database: YPL059W, YIL105C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available oxidoreductase; mitochondrial matrix protein involved in the synthesis/assembly of iron-sulfur centers; mono...oreductase; mitochondrial matrix protein involved in the synthesis/assembly of iron-sulfur centers; monothio

  3. Yeast Interacting Proteins Database: YJR091C, YOR014W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...ssociated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes increas

  4. Yeast Interacting Proteins Database: YJR091C, YLR059C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre... mRNAs encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; ove

  5. Yeast Interacting Proteins Database: YJR091C, YOR317W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...NAs encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overex

  6. Yeast Interacting Proteins Database: YPL077C, YLR423C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPL077C - Putative protein of unknown function; regulates PIS1 expression; mutant display...Bait description Putative protein of unknown function; regulates PIS1 expression; mutant display

  7. Yeast Interacting Proteins Database: YPR029C, YFR043C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available his bait as prey (1) YFR043C IRC6 Putative protein of unknown function; null mutant displays increased level...C6 Prey description Putative protein of unknown function; null mutant displays increased levels of spontaneo

  8. Yeast Interacting Proteins Database: YPR040W, YDL188C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPR040W TIP41 Protein that interacts physically and genetically with Tap42p, which ...ait ORF YPR040W Bait gene name TIP41 Bait description Protein that interacts physically and genetically

  9. Yeast Interacting Proteins Database: YPR040W, YDL134C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPR040W TIP41 Protein that interacts physically and genetically with Tap42p, which ...Bait ORF YPR040W Bait gene name TIP41 Bait description Protein that interacts physically and genetically

  10. Yeast Interacting Proteins Database: YGR086C, YKL142W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ption induced under cell wall stress; protein levels are reduced under anaerobic conditions; originally thou...iption induced under cell wall stress; protein levels are reduced under anaerobic conditions; originally tho

  11. Yeast Interacting Proteins Database: YPL204W, YOL149W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPL204W HRR25 Protein kinase involved in regulating diverse events including vesicu...tion Protein kinase involved in regulating diverse events including vesicular trafficking, DNA repair, and c

  12. Yeast Interacting Proteins Database: YPL204W, YER095W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPL204W HRR25 Protein kinase involved in regulating diverse events including vesicu... gene name HRR25 Bait description Protein kinase involved in regulating diverse events including vesicular t

  13. Yeast Interacting Proteins Database: YPR103W, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available tein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors...gulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf

  14. Yeast Interacting Proteins Database: YCL020W, YDR510W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available R510W SMT3 Ubiquitin-like protein of the SUMO family, conjugated to lysine residu... name SMT3 Prey description Ubiquitin-like protein of the SUMO family, conjugated to lysine residues of targ

  15. Yeast Interacting Proteins Database: YGL145W, YNL258C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ripheral membrane protein required for Golgi-to-ER retrograde traffic; component ... membrane protein required for Golgi-to-ER retrograde traffic; component of the ER target site that interact

  16. Yeast Interacting Proteins Database: YNL258C, YGL145W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YNL258C DSL1 Peripheral membrane protein required for Golgi-to-ER retrograde traffi...t description Peripheral membrane protein required for Golgi-to-ER retrograde traffic; component of the ER t

  17. Yeast Interacting Proteins Database: YHR114W, YLR112W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available y (0) YLR112W - Dubious open reading frame unlikely to encode a protein, based on...e name - Prey description Dubious open reading frame unlikely to encode a protein, based on available experi

  18. Yeast Interacting Proteins Database: YNL258C, YKR022C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ity (BRITE) - Alternative path with 1 intervening protein (YPD) 0 Alternative path with 2 intervening proteins (YPD) 0 IST hit 3 IST hit in the opposite bait/prey orientation - ...

  19. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    Directory of Open Access Journals (Sweden)

    Leinonen Rasko

    2007-10-01

    Full Text Available Background: Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results: We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion: We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR
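
The idea of cross-referencing identifiers through 100% sequence identity can be illustrated with a minimal sketch. It is not the PICR implementation or its API: the function names, the use of a SHA-256 digest as the lookup key, and the toy records (`DB_A`, `DB_B`, `DB_C`) are all assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return a digest of the normalized sequence (hypothetical helper).

    Whitespace is removed and letters are upper-cased so that two records
    with the same residues map to the same key.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_index(records):
    """Index (database, identifier) pairs by exact sequence.

    `records` is an iterable of (database, identifier, sequence) tuples.
    """
    index = defaultdict(set)
    for database, identifier, sequence in records:
        index[sequence_key(sequence)].add((database, identifier))
    return index


def cross_references(index, sequence, databases=None):
    """List identifiers whose sequence is identical to `sequence`.

    `databases`, if given, restricts the hits to those source databases.
    """
    hits = index.get(sequence_key(sequence), set())
    if databases is not None:
        hits = {hit for hit in hits if hit[0] in databases}
    return sorted(hits)


# Toy records, invented for this example.
records = [
    ("DB_A", "A0001", "MKTAYIAKQR"),
    ("DB_B", "B-17", "mktayiakqr"),   # same residues, different case
    ("DB_C", "C_9", "MKTAYIAKQX"),    # differs in the last residue
]
index = build_index(records)
```

Calling `cross_references(index, "MKTAYIAKQR")` returns the two entries that share the sequence exactly, and passing `databases={"DB_B"}` mirrors the abstract's option of limiting mappings by source database.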

  20. Yeast Interacting Proteins Database: YGL198W, YGL161C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YGL198W YIP4 Protein that interacts with Rab GTPases, localized to late Golgi vesicles; comput...that interacts with Rab GTPases, localized to late Golgi vesicles; computational ...eracts with Rab GTPases, localized to late Golgi vesicles; computational analysis of large-scale protein-pro...ized to late Golgi vesicles; computational analysis of large-scale protein-protein interaction data suggests

  1. Yeast Interacting Proteins Database: YGL161C, YGL198W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YGL161C YIP5 Protein that interacts with Rab GTPases, localized to late Golgi vesicles; comput...that interacts with Rab GTPases, localized to late Golgi vesicles; computational ...eracts with Rab GTPases, localized to late Golgi vesicles; computational analysis of large-scale protein-pro...ized to late Golgi vesicles; computational analysis of large-scale protein-protein interaction data suggests

  2. Yeast Interacting Proteins Database: YJR091C, YKL113C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...r of the Puf family of RNA-binding proteins, interacts with mRNAs encoding membrane-associated proteins; involved in localizing

  3. Yeast Interacting Proteins Database: YJR091C, YDR389W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...d proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes increased sensi...scription Member of the Puf family of RNA-binding proteins, interacts with mRNAs encoding membrane-associate

  4. Yeast Interacting Proteins Database: YJR091C, YDL147W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...g proteins, interacts with mRNAs encoding membrane-associated proteins; involved in localizing the Arp2/3 co

  5. Yeast Interacting Proteins Database: YOR037W, YCL056C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available (GFP)-fusion protein localizes to the cytoplasm in a punctate pattern; null mutant displays decreased thermo...e pattern; null mutant displays decreased thermotolerance Rows with this prey as prey Rows with this prey as... of unknown function; green fluorescent protein (GFP)-fusion protein localizes to the cytoplasm in a punctat

  6. Yeast Interacting Proteins Database: YGL237C, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding prote... expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein

  7. Yeast Interacting Proteins Database: YOR358W, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; act...rotein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regulator o

  8. Yeast Interacting Proteins Database: YGL127C, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ith protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regula...rotein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors

  9. Yeast Interacting Proteins Database: YLR295C, YDR510W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available 7) Rows with this bait as prey (0) YDR510W SMT3 Ubiquitin-like protein of the SUMO family, conjugated to lys...uitin-like protein of the SUMO family, conjugated to lysine residues of target proteins; regulates chromatid

  10. PPT-DB: the protein property prediction and testing database.

    Science.gov (United States)

    Wishart, David S; Arndt, David; Berjanskii, Mark; Guo, An Chi; Shi, Yi; Shrivastava, Savita; Zhou, Jianjun; Zhou, You; Lin, Guohui

    2008-01-01

    The protein property prediction and testing database (PPT-DB) is a database housing nearly 30 carefully curated databases, each of which contains commonly predicted protein property information. These properties include both structural (i.e. secondary structure, contact order, disulfide pairing) and dynamic (i.e. order parameters, B-factors, folding rates) features that have been measured, derived or tabulated from a variety of sources. PPT-DB is designed to serve two purposes. First, it is intended to serve as a centralized, up-to-date, freely downloadable and easily queried repository of predictable or 'derived' protein property data. In this role, PPT-DB can serve as a one-stop, fully standardized repository for developers to obtain the required training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create. The second role that PPT-DB can play is as a tool for homology-based protein property prediction. Users may query PPT-DB with a sequence of interest and have a specific property predicted using a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. PPT-DB exploits the well-known fact that protein structure and dynamic properties are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 85-95% correct (for categorical predictions, such as secondary structure) or exhibit correlations of >0.80 (for numeric predictions, such as accessible surface area). This performance is 10-20% better than what is typically obtained from standard 'ab initio' predictions. PPT-DB, its prediction utilities and all of its contents are available at http://www.pptdb.ca.

  11. Composition of Overlapping Protein-Protein and Protein-Ligand Interfaces.

    Directory of Open Access Journals (Sweden)

    Ruzianisra Mohamed

    Full Text Available Protein-protein interactions (PPIs) play a major role in many biological processes and they represent an important class of targets for therapeutic intervention. However, targeting PPIs is challenging because often no convenient natural substrates are available as a starting point for small-molecule design. Here, we explored the characteristics of protein interfaces in five non-redundant datasets of 174 protein-protein (PP) complexes, and 161 protein-ligand (PL) complexes from the ABC database, 436 PP complexes, and 196 PL complexes from the PIBASE database and a dataset of 89 PL complexes from the Timbal database. In all cases, the small molecule ligands must bind at the respective PP interface. We observed similar amino acid frequencies in all three datasets. Remarkably, also the characteristics of PP contacts and overlapping PL contacts are highly similar.
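
Comparing amino acid frequencies between interface datasets can be sketched as follows. This is an illustrative assumption, not the authors' method: the abstract does not say how frequencies were computed or compared, so the helper names, the one-letter residue strings, and the choice of half the L1 distance as a similarity measure are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(interface_residues):
    """Return the relative frequency of each standard amino acid.

    `interface_residues` is any iterable of one-letter residue codes;
    characters outside the 20 standard residues are ignored.
    """
    counts = Counter(r for r in interface_residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def frequency_distance(freq_a, freq_b):
    """Half the L1 distance between two frequency profiles.

    The value is 0 for identical profiles and 1 for profiles that share
    no residue type.
    """
    return 0.5 * sum(abs(freq_a[aa] - freq_b[aa]) for aa in AMINO_ACIDS)
```

With profiles built from the interface residues of two datasets, a small `frequency_distance` would correspond to the "similar amino acid frequencies" the abstract reports; the actual residue lists would have to come from the respective databases.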

  12. Yeast Interacting Proteins Database: YER179W, YER179W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YER179W DMC1 Meiosis-specific protein required for repair of double-strand breaks and pairing... double-strand breaks and pairing between homologous chromosomes; homolog of Rad5...ific protein required for repair of double-strand breaks and pairing between homo...name DMC1 Prey description Meiosis-specific protein required for repair of double-strand breaks and pairin

  13. Yeast Interacting Proteins Database: YPL003W, YPR066W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YPL003W ULA1 Protein that acts together with Uba3p to activate Rub1p before its con... (1) Rows with this bait as prey (0) YPR066W UBA3 Protein that acts together with Ula1p to activate Rub1p before...it gene name ULA1 Bait description Protein that acts together with Uba3p to activate Rub1p before...together with Ula1p to activate Rub1p before its conjugation to proteins (neddylation), which may play a rol

  14. Yeast Interacting Proteins Database: YMR077C, YLR417W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available lumen; cytoplasmic protein recruited to endosomal membranes Rows with this bait as bait (3) Rows with this b...-dependent sorting of proteins into the endosome Rows with this prey as prey (3) Rows with this prey as bait...oplasmic protein recruited to endosomal membranes Rows with this bait as bait Row...s with this bait as bait (3) Rows with this bait as prey Rows with this bait as prey (0) Prey ORF YLR417W Pr... proteins into the endosome Rows with this prey as prey Rows with this prey as prey (3) Row

  15. Yeast Interacting Proteins Database: YNL086W, YGL172W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ion protein localizes to endosomes Rows with this bait as bait (3) Rows with this bait as prey (2) YGL172W N...clear export of ribosomes Rows with this prey as prey (3) Rows with this prey as ...n Putative protein of unknown function; green fluorescent protein (GFP)-fusion protein localizes to endosomes Row...s with this bait as bait Rows with this bait as bait (3) Rows with this bait as prey Row...he Nsp1p-Nup57p-Nup49p-Nic96p subcomplex of the nuclear pore complex (NPC), required for nuclear export of ribosomes Row

  16. Yeast Interacting Proteins Database: YMR077C, YJR102C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available lumen; cytoplasmic protein recruited to endosomal membranes Rows with this bait as bait (3) Rows with this b...d in ubiquitin-dependent sorting of proteins into the endosome Rows with this prey as prey (3) Rows with thi...lar lumen; cytoplasmic protein recruited to endosomal membranes Rows with this bait as bait Rows with this bait as bait (3) Row...s with this bait as prey Rows with this bait as prey (0) Prey ...iquitin-dependent sorting of proteins into the endosome Rows with this prey as prey Rows with this prey as prey (3) Row

  17. Yeast Interacting Proteins Database: YKL103C, YKL103C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available he peptidase family M18; often used as a marker protein in studies of autophagy and cytosol to vacuole targe...; often used as a marker protein in studies of autophagy and cytosol to vacuole targeting (CVT) pathway Rows...e yscI; zinc metalloproteinase that belongs to the peptidase family M18; often used as a marker protein in studies...t belongs to the peptidase family M18; often used as a marker protein in studies of autophagy and cytosol to

  18. Yeast Interacting Proteins Database: YDR425W, YGL198W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available with this bait as prey (0) YGL198W YIP4 Protein that interacts with Rab GTPases, localized to late Golgi vesicles; computation...IP4 Prey description Protein that interacts with Rab GTPases, localized to late Golgi vesicles; computatio

  19. Yeast Interacting Proteins Database: YMR025W, YGR120C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available or removal of the ubiquitin-like protein Rub1p from Cdc53p (cullin); involved in adaptation to pheromone sig... (cullin); involved in adaptation to pheromone signaling Rows with this bait as bait Rows with this bait as ...signalosome, which is required for deneddylation, or removal of the ubiquitin-like protein Rub1p from Cdc53p

  20. Yeast Interacting Proteins Database: YKR100C, YDL100C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YKR100C SKG1 Transmembrane protein with a role in cell wall polymer composition; lo...position; localizes on the inner surface of the plasma membrane at the bud and in t...RF YKR100C Bait gene name SKG1 Bait description Transmembrane protein with a role in cell wall polymer com

  1. Yeast Interacting Proteins Database: YER081W, YDR194C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDR194C MSS116 DEAD-box protein required for efficient splicing of mitochondrial Group I and II introns; non...e name MSS116 Prey description DEAD-box protein required for efficient splicing of mitochondrial Group I and

  2. Yeast Interacting Proteins Database: YMR047C, YDR229W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available R229W IVY1 Phospholipid-binding protein that interacts with both Ypt7p and Vps33p, may partially...holipid-binding protein that interacts with both Ypt7p and Vps33p, may partially counteract the action of Vp

  3. Yeast Interacting Proteins Database: YHR180W, YDL100C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available on Dubious open reading frame unlikely to encode a protein, based on available experimental and comparativ...YHR180W - Dubious open reading frame unlikely to encode a protein, based on available experimental and compa...rative sequence data Rows with this bait as bait (1) Rows with this bait as prey (0

  4. Yeast Interacting Proteins Database: YDR271C, YOR128C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available n Dubious open reading frame unlikely to encode a protein, based on available experimental and comparative...YDR271C - Dubious open reading frame unlikely to encode a protein, based on available experimental and compa...rative sequence data; partially overlaps the verified ORF CCC2/YDR270W Rows with th

  5. Yeast Interacting Proteins Database: YOR264W, YCR086W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YOR264W DSE3 Daughter cell-specific protein, may help establish daughter fate Rows ...0 42 - Show YOR264W Bait ORF YOR264W Bait gene name DSE3 Bait description Daughter cell-specific protein, ma

  6. Yeast Interacting Proteins Database: YML064C, YBR072W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available th this bait as prey (0) YBR072W HSP26 Small heat shock protein (sHSP) with chaperone activity; forms hollow...tein (sHSP) with chaperone activity; forms hollow, sphere-shaped oligomers that suppress unfolded proteins a

  7. Yeast Interacting Proteins Database: YKR007W, YGR163W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available (1) YGR163W GTR2 Putative GTP binding protein that negatively regulates Ran/Tc4 ...163W Prey gene name GTR2 Prey description Putative GTP binding protein that negatively regulates Ran/Tc4 GTP

  8. Yeast Interacting Proteins Database: YDL167C, YBR212W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available as bait (2) Rows with this bait as prey (0) YBR212W NGR1 RNA binding protein that negatively regulates grow...ption RNA binding protein that negatively regulates growth rate; interacts with the 3' UTR of the mitochondr

  9. Yeast Interacting Proteins Database: YNL189W, YKL103C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available that belongs to the peptidase family M18; often used as a marker protein in studies of autophagy and cytoso...amily M18; often used as a marker protein in studies of autophagy and cytosol to vacuole targeting (CVT) pat

  10. Yeast Interacting Proteins Database: YDL239C, YKL103C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available that belongs to the peptidase family M18; often used as a marker protein in studies of autophagy and cytosol...ily M18; often used as a marker protein in studies of autophagy and cytosol to vacuole targeting (CVT) pathw

  11. Yeast Interacting Proteins Database: YDR311W, YKL103C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ngs to the peptidase family M18; often used as a marker protein in studies of aut...ase that belongs to the peptidase family M18; often used as a marker protein in studies of autophagy and cyt

  12. Yeast Interacting Proteins Database: YPL105C, YDR429C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available co-purification experiments; authentic, non-tagged protein is detected in highly purified mitochondria in high-throughput studies...entic, non-tagged protein is detected in highly purified mitochondria in high-throughput studies Rows with t

  13. Yeast Interacting Proteins Database: YPL002C, YLR417W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ndent sorting of proteins into the endosome; appears to be functionally related to SNF7; involved in glucose...ESCRT-II complex, which is involved in ubiquitin-dependent sorting of proteins into the endosome; appears to be functionally

  14. Yeast Interacting Proteins Database: YOR111W, YDL161W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available with this bait as prey (2) YDL161W ENT1 Epsin-like protein involved in endocytosis and actin patch assembly and functionally...-like protein involved in endocytosis and actin patch assembly and functionally redundant with Ent2p; binds

  15. Yeast Interacting Proteins Database: YHR114W, YDR422C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available substrate specificity; vacuolar protein containing KIS (Kinase-Interacting Sequence) and ASC (Association w...strate specificity; vacuolar protein containing KIS (Kinase-Interacting Sequence) and ASC (Association with ...e 4 CuraGen (0 or 1) 0 S. Fields (0 or 1) 0 Association (0 or 1,YPD) 0 Complex (0

  16. Yeast Interacting Proteins Database: YDL089W, YDR233C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rDNA repeat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein...DNA repeat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein l

  17. Yeast Interacting Proteins Database: YDL089W, YPR028W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rDNA repeat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein...ility; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein localizes to the

  18. Yeast Interacting Proteins Database: YJR091C, YDL013W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...h mRNAs encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; ov

  19. Yeast Interacting Proteins Database: YJR091C, YHR026W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpre...s with mRNAs encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondri

  20. Yeast Interacting Proteins Database: YBR108W, YDR388W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YBR108W AIM3 Protein interacting with Rvs167p; null mutant is viable and displays e...l mutant is viable and displays elevated frequency of mitochondrial genome loss R...8 - Show YBR108W Bait ORF YBR108W Bait gene name AIM3 Bait description Protein interacting with Rvs167p; nul

  1. Yeast Interacting Proteins Database: YBR108W, YGR136W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YBR108W AIM3 Protein interacting with Rvs167p; null mutant is viable and displays e...w YBR108W Bait ORF YBR108W Bait gene name AIM3 Bait description Protein interacting with Rvs167p; null mutant is viable and display

  2. Yeast Interacting Proteins Database: YMR280C, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available olved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensor... glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, an

  3. Yeast Interacting Proteins Database: YOR302W, YOR047C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rol of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt...tein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt1

  4. Yeast Interacting Proteins Database: YNL189W, YDR510W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ait as prey (0) YDR510W SMT3 Ubiquitin-like protein of the SUMO family, conjugated...s prey (0) Prey ORF YDR510W Prey gene name SMT3 Prey description Ubiquitin-like protein of the SUMO family, conjugated

  5. Yeast Interacting Proteins Database: YKL043W, YDR510W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available s prey (0) YDR510W SMT3 Ubiquitin-like protein of the SUMO family, conjugated to ... Prey ORF YDR510W Prey gene name SMT3 Prey description Ubiquitin-like protein of the SUMO family, conjugated

  6. Yeast Interacting Proteins Database: YLR295C, YJR083C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available 7) Rows with this bait as prey (0) YJR083C ACF4 Protein of unknown function, computational analysis of large-scale...me ACF4 Prey description Protein of unknown function, computational analysis of large-scale

  7. Yeast Interacting Proteins Database: YHR129C, YMR294W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YHR129C ARP1 Actin-related protein of the dynactin complex; required for spindle orientation...ein of the dynactin complex; required for spindle orientation and nuclear migrati...PD) 1 Alternative path with 2 intervening proteins (YPD) 2 IST hit 21 IST hit in the opposite bait/prey orientation 7 ...

  8. Yeast Interacting Proteins Database: YJR008W, YHR129C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available the dynactin complex; required for spindle orientation and nuclear migration; put...dynactin complex; required for spindle orientation and nuclear migration; putative ortholog of mammalian cen...ervening protein (YPD) 0 Alternative path with 2 intervening proteins (YPD) 0 IST hit 3 IST hit in the opposite bait/prey orientation - ...

  9. ARAMEMNON, a Novel Database for Arabidopsis Integral Membrane Proteins

    Science.gov (United States)

    Schwacke, Rainer; Schneider, Anja; van der Graaff, Eric; Fischer, Karsten; Catoni, Elisabetta; Desimone, Marcelo; Frommer, Wolf B.; Flügge, Ulf-Ingo; Kunze, Reinhard

    2003-01-01

    A specialized database (DB) for Arabidopsis membrane proteins, ARAMEMNON, was designed that facilitates the interpretation of gene and protein sequence data by integrating features that are presently only available from individual sources. Using several publicly available prediction programs, putative integral membrane proteins were identified among the approximately 25,500 proteins in the Arabidopsis genome DBs. By averaging the predictions from seven programs, approximately 6,500 proteins were classified as transmembrane (TM) candidate proteins. Some 1,800 of these contain at least four TM spans and are possibly linked to transport functions. The ARAMEMNON DB enables direct comparison of the predictions of seven different TM span computation programs and the predictions of subcellular localization by eight signal peptide recognition programs. A special function displays the proteins related to the query and dynamically generates a protein family structure. As a first set of proteins from other organisms, all of the approximately 700 putative membrane proteins were extracted from the genome of the cyanobacterium Synechocystis sp. and incorporated in the ARAMEMNON DB. The ARAMEMNON DB is accessible at the URL http://aramemnon.botanik.uni-koeln.de. PMID:12529511

  10. Benchmarking NMR experiments: a relational database of protein pulse sequences.

    Science.gov (United States)

    Senthamarai, Russell R P; Kuprov, Ilya; Pervushin, Konstantin

    2010-03-01

    Systematic benchmarking of multi-dimensional protein NMR experiments is a critical prerequisite for optimal allocation of NMR resources for structural analysis of challenging proteins, e.g. large proteins with limited solubility or proteins prone to aggregation. We propose a set of benchmarking parameters for essential protein NMR experiments organized into a lightweight (single XML file) relational database (RDB), which includes all the necessary auxiliaries (waveforms, decoupling sequences, calibration tables, setup algorithms and an RDB management system). The database is interfaced to the Spinach library (http://spindynamics.org), which enables accurate simulation and benchmarking of NMR experiments on large spin systems. A key feature is the ability to use a single user-specified spin system to simulate the majority of deposited solution state NMR experiments, thus providing the (hitherto unavailable) unified framework for pulse sequence evaluation. This development enables predicting relative sensitivity of deposited implementations of NMR experiments, thus providing a basis for comparison, optimization and, eventually, automation of NMR analysis. The benchmarking is demonstrated with two proteins: the 170-amino-acid I domain of alphaXbeta2 integrin and the 440-amino-acid NS3 helicase.

  11. Development of Deduced Protein Database Using Variable Bit Binary Encoding

    Directory of Open Access Journals (Sweden)

    B. Parvathavarthini

    2008-01-01

    Full Text Available A large amount of biological data is semi-structured and stored in one of the following file formats: flat, XML, or relational files. These databases must be integrated with the structured data available in relational or object-oriented databases. Sequence matching is difficult in such file formats because string comparison is costly in both computation and time. To reduce the storage size of amino acid sequences in a protein database, a novel probability-based variable-bit-length encoding technique is introduced. The number of triplet codons that map to each amino acid determines its probability value. A binary tree is then constructed to assign a unique binary code to each amino acid. This derived bit pattern replaces the existing fixed-byte representation. The reduction in protein database space is discussed and found to be between 42.86% and 87.17%. To validate our method, we collected a few amino acid sequences of major organisms, such as sheep and lambda phage, from NCBI and represented them using the proposed method. The comparison shows that the minimum and maximum reductions in storage space are 43.30% and 72.86%, respectively. In the future, the biological data can be reduced further by applying lossless compression to this deduced data.
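
The encoding idea in the abstract above can be illustrated with a short sketch. The abstract does not say how its binary tree is built, so this example substitutes standard Huffman coding as an assumption. It weights each amino acid by its number of codons in the standard genetic code. The names `CODON_COUNTS`, `build_codes`, `encode`, and `decode` are hypothetical and are not taken from the paper.

```python
import heapq
from itertools import count

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "H": 2, "K": 2, "F": 2, "Y": 2,
    "M": 1, "W": 1,
}

def build_codes(weights):
    """Build a prefix-free binary code with Huffman's algorithm (an assumption;
    the paper's own tree construction is not described in the abstract)."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

def encode(seq, codes):
    """Concatenate the variable-length codes of each residue."""
    return "".join(codes[aa] for aa in seq)

def decode(bits, codes):
    """Recover the sequence by greedy prefix matching (valid for prefix-free codes)."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

if __name__ == "__main__":
    codes = build_codes(CODON_COUNTS)
    seq = "MKTLLWSR"  # made-up example sequence
    bits = encode(seq, codes)
    print(len(bits), "bits versus", 8 * len(seq), "bits at one byte per residue")
    assert decode(bits, codes) == seq
```

Amino acids with more codons receive codes that are no longer than those of amino acids with fewer codons, which is the source of the saving over a fixed one-byte representation. The percentages reported in the abstract come from the authors' own scheme and data and are not reproduced by this sketch.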

  12. A protein domain interaction interface database: InterPare

    Directory of Open Access Journals (Sweden)

    Lee Jungsul

    2005-08-01

    Full Text Available Abstract Background Most proteins function by interacting with other molecules. Their interaction interfaces are highly conserved throughout evolution to avoid undesirable interactions that lead to fatal disorders in cells. Rational drug discovery includes computational methods to identify the interaction sites of lead compounds to the target molecules. Identifying and classifying protein interaction interfaces on a large scale can help researchers discover drug targets more efficiently. Description We introduce a large-scale protein domain interaction interface database called InterPare (http://interpare.net). It contains both inter-chain (between chains) interfaces and intra-chain (within chain) interfaces. InterPare uses three methods to detect interfaces: (1) the geometric distance method for checking the distance between atoms that belong to different domains, (2) Accessible Surface Area (ASA), a method for detecting the buried region of a protein that is detached from a solvent when forming multimers or complexes, and (3) the Voronoi diagram, a computational geometry method that uses a mathematical definition of interface regions. InterPare includes visualization tools to display protein interior, surface, and interaction interfaces. It also provides statistics such as the amino acid propensities of queried protein according to its interior, surface, and interface region. The atom coordinates that belong to interface, surface, and interior regions can be downloaded from the website. Conclusion InterPare is an open and public database server for protein interaction interface information. It contains the large-scale interface data for proteins whose 3D-structures are known. As of November 2004, there were 10,583 (Geometric distance), 10,431 (ASA), and 11,010 (Voronoi diagram) entries in the Protein Data Bank (PDB) containing interfaces, according to the above three methods. In the case of the geometric distance method, there are 31,620 inter-chain domain
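
The geometric distance method named in the InterPare abstract can be sketched as follows. The abstract gives neither the cutoff nor the implementation, so the 5.0 Å default and the function name `interface_atoms` are assumptions made for illustration only. Each domain is represented as a plain list of (x, y, z) atom coordinates.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def interface_atoms(domain_a, domain_b, cutoff=5.0):
    """Return the indices of atoms in each domain that lie within `cutoff`
    of at least one atom in the other domain.

    The cutoff value is a hypothetical default, not taken from InterPare.
    """
    in_a, in_b = set(), set()
    for i, p in enumerate(domain_a):
        for j, q in enumerate(domain_b):
            if dist(p, q) <= cutoff:
                in_a.add(i)
                in_b.add(j)
    return sorted(in_a), sorted(in_b)

if __name__ == "__main__":
    # Made-up coordinates: only the first atom of each domain is in contact.
    a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    b = [(3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(interface_atoms(a, b))
```

This all-pairs loop is quadratic in the number of atoms. A spatial index such as a grid or k-d tree would be the usual choice at the scale of the whole PDB, but the brute-force form keeps the definition of the method easy to see.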

  13. Yeast Interacting Proteins Database: YJR091C, YMR067C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes...olved in localizing the Arp2/3 complex to mitochondria; overexpression causes increased sensitivity to benom

  14. Yeast Interacting Proteins Database: YMR154C, YLR025W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available the multivesicular body (MVB) pathway; recruited from the cytoplasm to endosomal membranes Rows with this pr...mbrane proteins into the multivesicular body (MVB) pathway; recruited from the cytoplasm to endosomal membranes

  15. Yeast Interacting Proteins Database: YER081W, YDR105C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDR105C TMS1 Vacuolar membrane protein of unknown function that is conserved in mammals; predicted to contai...tion that is conserved in mammals; predicted to contain eleven transmembrane heli

  16. Yeast Interacting Proteins Database: YMR047C, YDL065C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available L065C PEX19 Chaperone and import receptor for newly-synthesized class I peroxisom...scription Chaperone and import receptor for newly-synthesized class I peroxisomal membrane proteins (PMPs),

  17. Yeast Interacting Proteins Database: YCL029C, YER016W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rotubules and kinetochore, involved in sister chromatid separation; essential in polyploid cells but not in ...le-associated protein, component of the interface between microtubules and kinetochore, involved in sister chromatid separation

  18. Yeast Interacting Proteins Database: YCL032W, YLR423C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YCL032W STE50 Protein involved in mating response, invasive/filamentous growth, and...lved in mating response, invasive/filamentous growth, and osmotolerance, acts as an adaptor that links G pro

  19. Yeast Interacting Proteins Database: YPL114W, YMR133W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available iotic recombination; possibly involved in the coordination of recombination and m...tion Protein involved in early stages of meiotic recombination; possibly involved in the coordination of rec

  20. Yeast Interacting Proteins Database: YCL046W, YGL115W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available le experimental and comparative sequence data; partially overlaps the uncharacterized ORF YCL045C Rows with ...ading frame unlikely to encode a protein, based on available experimental and comparative sequence data; partially

  1. Yeast Interacting Proteins Database: YLR295C, YOL050C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available on available experimental and comparative sequence data; overlaps verified gene G...en reading frame unlikely to encode a protein, based on available experimental and comparative sequence data

  2. Yeast Interacting Proteins Database: YHR114W, YJL086C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available available experimental and comparative sequence data; partially overlaps the verified genes YJL085W/EXO70 a... reading frame unlikely to encode a protein, based on available experimental and comparative sequence data;

  3. Yeast Interacting Proteins Database: YJR091C, YDR008C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ailable experimental and comparative sequence data Rows with this prey as prey (1) Rows with this prey as ba... - Prey description Dubious open reading frame unlikely to encode a protein, based on available experimental and comparative

  4. Yeast Interacting Proteins Database: YKL002W, YLR423C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthes... into lumenal vesicles of multivesicular bodies, and for delivery of newly synthesized vacuolar enzymes to t

  5. Yeast Interacting Proteins Database: YFL003C, YDL154W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YFL003C MSH4 Protein involved in meiotic recombination, required for normal levels of crossing...in involved in meiotic recombination, required for normal levels of crossing over, colocalizes with Zip2p to

  6. Yeast Interacting Proteins Database: YJR091C, YOR265W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available encoding membrane-associated proteins; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes...ns; involved in localizing the Arp2/3 complex to mitochondria; overexpression causes increased sensitivity t

  7. Yeast Interacting Proteins Database: YKL166C, YIL033C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available tein kinase (PKA), a component of a signaling pathway that controls a variety of cellular processes, includi...dependent protein kinase (PKA), a component of a signaling pathway that controls a variety of cellular processes

  8. Yeast Interacting Proteins Database: YJL164C, YIL033C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available tein kinase (PKA), a component of a signaling pathway that controls a variety of cellular processes, includi...dependent protein kinase (PKA), a component of a signaling pathway that controls a variety of cellular processes

  9. Yeast Interacting Proteins Database: YBR239C, YPL133C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available cytoplasm and nucleus; null mutation affects periodicity of transcriptional and metabolic oscillation; play...ion; GFP-fusion protein localizes to the cytoplasm and nucleus; null mutation affects

  10. Yeast Interacting Proteins Database: YOR014W, YNL042W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available P3 Protein of unknown function, potential Cdc28p substrate; overproduction confers resistance to methylmercury...l Cdc28p substrate; overproduction confers resistance to methylmercury Rows with this prey as prey Rows with

  11. Yeast Interacting Proteins Database: YDR026C, YDL030W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDR026C - Protein of unknown function that may interact with ribosomes, based on co-purification...ein of unknown function that may interact with ribosomes, based on co-purification

  12. Yeast Interacting Proteins Database: YNL311C, YKL001C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YNL311C - Protein of unknown function that may interact with ribosomes, based on co-purification...nknown function that may interact with ribosomes, based on co-purification experi

  13. Yeast Interacting Proteins Database: YLR319C, YGL015C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YLR319C BUD6 Actin- and formin-interacting protein, involved in actin cable nucleation and polarized...in actin cable nucleation and polarized cell growth; isolated as bipolar budding mutant; potential Cdc28p su

  14. Yeast Interacting Proteins Database: YHR113W, YOL082W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available cargo proteins aminopeptidase I (Lap4p) and alpha-mannosidase (Ams1p) to the phagophore assembly site for packaging...vt) pathway; delivers cargo proteins aminopeptidase I (Lap4p) and alpha-mannosidase (Ams1p) to the phagophore assembly site for packa...ging into Cvt vesicles Rows with this prey as prey (6) Rows with this prey as bait

  15. Yeast Interacting Proteins Database: YCL032W, YLR362W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YCL032W STE50 Protein involved in mating response, invasive/filamentous growth, and...STE11 Signal transducing MEK kinase involved in pheromone response and pseudohyphal/invasive...2W Bait gene name STE50 Bait description Protein involved in mating response, invasive/filamentous growth, a... STE11 Prey description Signal transducing MEK kinase involved in pheromone response and pseudohyphal/invasive

  16. Yeast Interacting Proteins Database: YLR362W, YCL032W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YLR362W STE11 Signal transducing MEK kinase involved in pheromone response and pseudohyphal/invasive...ait as prey (1) YCL032W STE50 Protein involved in mating response, invasive/filam...2W Bait gene name STE11 Bait description Signal transducing MEK kinase involved in pheromone response and pseudohyphal/invasive...F YCL032W Prey gene name STE50 Prey description Protein involved in mating response, invasive/filamentous gr

  17. Yeast Interacting Proteins Database: YGR058W, YGR136W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available main; binds Las17p, which is a homolog of human Wiskott-Aldrich Syndrome protein involved in actin... patch assembly and actin polymerization Rows with this prey as prey (4) Rows with this pre...erminal SH3 domain; binds Las17p, which is a homolog of human Wiskott-Aldrich Syndrome protein involved in actin... patch assembly and actin polymerization Rows with this prey as prey Rows with this prey as prey (4) Row

  18. Yeast Interacting Proteins Database: YHR114W, YJL180C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YHR114W BZZ1 SH3 domain protein implicated in the regulation of actin polymerization, able to recruit actin... polymerization machinery through its SH3 domains, colocalizes with cortical actin p...YHR114W Bait gene name BZZ1 Bait description SH3 domain protein implicated in the regulation of actin polyme...rization, able to recruit actin polymerization machinery through its SH3 domains, colocalizes with cortical actin

  19. Yeast Interacting Proteins Database: YMR294W, YHR129C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available complex; required for spindle orientation and nuclear migration; putative ortholo...PD) 1 Alternative path with 2 intervening proteins (YPD) 2 IST hit 7 IST hit in the opposite bait/prey orientation 21 ... ...ne name ARP1 Prey description Actin-related protein of the dynactin complex; required for spindle orientat...ion and nuclear migration; putative ortholog of mammalian centractin Rows with this

  20. CREDO: a protein-ligand interaction database for drug discovery.

    Science.gov (United States)

    Schreyer, Adrian; Blundell, Tom

    2009-02-01

    Harnessing data from the growing number of protein-ligand complexes in the Protein Data Bank is an important task in drug discovery. In order to benefit from the abundance of three-dimensional structures, structural data must be integrated with sequence as well as chemical data and the protein-small molecule interactions characterized structurally at the inter-atomic level. In this study, we present CREDO, a new publicly available database of protein-ligand interactions, which represents contacts as structural interaction fingerprints, implements novel features and is completely scriptable through its application programming interface. Features of CREDO include implementation of molecular shape descriptors with ultrafast shape recognition, fragmentation of ligands in the Protein Data Bank, sequence-to-structure mapping and the identification of approved drugs. Selected analyses of these key features are presented to highlight a range of potential applications of CREDO. The CREDO dataset has been released into the public domain together with the application programming interface under a Creative Commons license at http://www-cryst.bioc.cam.ac.uk/credo. We believe that the free availability and numerous features of the CREDO database will be useful not only for commercial but also for academia-driven drug discovery programmes.

  1. Exploring Protein Function Using the Saccharomyces Genome Database.

    Science.gov (United States)

    Wong, Edith D

    2017-01-01

    Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) to predict the function of proteins and gain insight into their roles in various cellular processes.

  2. MannDB: A microbial annotation database for protein characterization

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, C; Lam, M; Smith, J; Zemla, A; Dyer, M; Kuczmarski, T; Vitalis, E; Slezak, T

    2006-05-19

    MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high

  3. Yeast Interacting Proteins Database: YNL056W, YNL032W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available a1p and Siw14p; green fluorescent protein (GFP)-fusion protein localizes to the cytoplasm; YNL056W is not an essential gene Row...s with this bait as bait (2) Rows with this bait as prey (0) YNL032W SIW14 Tyrosine phosp...hatase that plays a role in actin filament organization and endocytosis; localized to the cytoplasm Row...s with this prey as prey (2) Rows with this prey as bait (0) 6 8 4 14 1 0 0 0 0 - - - ...in (GFP)-fusion protein localizes to the cytoplasm; YNL056W is not an essential gene Rows with this bait as bait Row

  4. Yeast Interacting Proteins Database: YDL108W, YGL134W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available TFIIH; involved in transcription initiation at RNA polymerase II promoters Rows with this bait as bait (1) Row...ependent protein kinase to its substrate Rows with this prey as prey (2) Rows wit... transcription initiation at RNA polymerase II promoters Rows with this bait as bait Row...s with this bait as bait (1) Rows with this bait as prey Rows with this bait as prey (0) Prey ORF YGL...endent protein kinase to its substrate Rows with this prey as prey Rows with this prey as prey (2) Row

  5. Yeast Interacting Proteins Database: YBR170C, YGR048W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available the proteasome for degradation Rows with this bait as bait (2) Rows with this bait as prey (1) YGR048W UFD1...roteins from the ER to the cytosol Rows with this prey as prey (1) Rows with this prey as bait (1) 15 10 5 1...hat recognizes ubiquitinated proteins in the endoplasmic reticulum and delivers them to the proteasome for degradation Row...s with this bait as bait Rows with this bait as bait (2) Rows with this bait as prey Row...esentation to the 26S proteasome for degradation; involved in transporting proteins from the ER to the cytosol Row

  6. Yeast Interacting Proteins Database: YOR059C, YGL053W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YOR059C - Hypothetical protein Rows with this bait as bait (1) Rows with this bait ...nce, a motif involved in COPII binding; forms a complex with Prp9p in the ER; member of DUP240 gene family Row...s with this prey as prey (1) Rows with this prey as bait (0) 5 6 3 4 0 0 0 0 0 ...- - - - - 0 0 5 - Show YOR059C Bait ORF YOR059C Bait gene name - Bait description Hypothetical protein Rows ...with this bait as bait Rows with this bait as bait (1) Rows with this bait as prey Rows with this bait as pr

  7. Yeast Interacting Proteins Database: YGR048W, YBR170C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available sporting proteins from the ER to the cytosol Rows with this bait as bait (1) Rows with this bait as prey (1)... to the proteasome for degradation Rows with this prey as prey (1) Rows with this prey as bait (2) 10 15 5 1...m the ER to the cytosol Rows with this bait as bait Rows with this bait as bait (1) Row...s with this bait as prey Rows with this bait as prey (1) Prey ORF YBR170C Prey gene name NPL4 Prey des...gnizes ubiquitinated proteins in the endoplasmic reticulum and delivers them to the proteasome for degradation Row

  8. Yeast Interacting Proteins Database: YNL189W, YEL066W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available may also play a role in regulation of protein degradation Rows with this bait as bait (55) Rows with this b...is last product liberated; similar to Hpa2p, acetylates histones weakly in vitro Rows with this prey as prey (3) Row...bstrate during import; may also play a role in regulation of protein degradation Row...s with this bait as bait Rows with this bait as bait (55) Rows with this bait as prey Rows with this bait...nd and CoA is last product liberated; similar to Hpa2p, acetylates histones weakly in vitro Row

  9. Yeast Interacting Proteins Database: YNR006W, YHL002W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available has Ubiquitin Interaction Motifs which bind ubiquitin (Ubi4p) Rows with this bait as bait (1) Rows with this..., as well as for recycling of Golgi proteins and formation of lumenal membranes Rows with this prey as prey (1) Row...ined for degradation; has Ubiquitin Interaction Motifs which bind ubiquitin (Ubi4p) Row...s with this bait as bait Rows with this bait as bait (1) Rows with this bait as prey Rows with this ba...degradation, as well as for recycling of Golgi proteins and formation of lumenal membranes Row

  10. Yeast Interacting Proteins Database: YBR187W, YNR032W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available st a possible role in ribosome biogenesis Rows with this bait as bait (1) Rows with this bait as prey (0) YN...ccumulation; interacts with Tap42p, which binds to and regulates other protein phosphatases Rows with this prey as prey (2) Row... and physical interactions suggest a possible role in ribosome biogenesis Rows with this bait as bait Rows w...ith this bait as bait (1) Rows with this bait as prey Rows with this bait as prey...quired for glycogen accumulation; interacts with Tap42p, which binds to and regulates other protein phosphatases Row

  11. Yeast Interacting Proteins Database: YNL020C, YGR241C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available actin cytoskeleton; involved in control of endocytosis Rows with this bait as bait (1) Rows with this bait ... to Yap1801p, member of the AP180 protein family Rows with this prey as prey (1) Row...skeleton; involved in control of endocytosis Rows with this bait as bait Rows wit...h this bait as bait (1) Rows with this bait as prey Rows with this bait as prey (0) Prey ORF YGR241C Prey ge...ogous to Yap1801p, member of the AP180 protein family Rows with this prey as prey Row

  12. Yeast Interacting Proteins Database: YJR055W, YPL193W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YJR055W HIT1 Protein of unknown function, required for growth at high temperature Row...s with this bait as bait (1) Rows with this bait as prey (0) YPL193W RSA1 Protein involved in the assembly... of 60S ribosomal subunits; functionally interacts with Dbp6p; functions in a late nucleoplasmic step of the assembly Row...s with this prey as prey (1) Rows with this prey as bait (0) 6 5 2 2...unknown function, required for growth at high temperature Rows with this bait as bait Rows with this bait as bait (1) Row

  13. Yeast Interacting Proteins Database: YDL239C, YPL255W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...it as prey (1) YPL255W BBP1 Protein required for the spindle pole body (SPB) dupl...ows with this prey as bait (0) 4 8 3 4 0 0 0 0 0 - - - - - 0 0 7 - Show YDL239C Bait ORF YDL...ediate assembly of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindl

  14. Yeast Interacting Proteins Database: YJL070C, YDR504C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ired for growth on nonfermentable carbon sources Rows with this prey as prey (1) Rows with this prey as bait...on Protein required for survival at high temperature during stationary phase; not required for growth on nonfermentable carbon source

  15. Yeast Interacting Proteins Database: YGR173W, YDR152W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ding protein Rows with this bait as bait (1) Rows with this bait as prey (0) YDR152W GIR2 Highly-acidic cyto...ws with this bait as prey Rows with this bait as prey (0) Prey ORF YDR152W Prey gene name GIR2 Prey description Highly

  16. Yeast Interacting Proteins Database: YGR196C, YBR260C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YGR196C FYV8 Protein of unknown function, required for survival upon exposure to K1...n, required for survival upon exposure to K1 killer toxin Rows with this bait as bait Rows with this bait as

  17. Yeast Interacting Proteins Database: YJR091C, YLR156W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ion with Jsn1p in a large-scale analysis Rows with this prey as prey (1) Rows with this prey as bait (0) 7 5...scription Putative protein of unknown function; exhibits a two-hybrid interaction with Jsn1p in a large-scale

  18. Yeast Interacting Proteins Database: YGR119C, YDL065C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available this bait as bait (5) Rows with this bait as prey (0) YDL065C PEX19 Chaperone and import receptor for newly-synthesized...y description Chaperone and import receptor for newly-synthesized class I peroxisomal membrane proteins (PMP

  19. Yeast Interacting Proteins Database: YGR218W, YDL065C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available PEX19 Chaperone and import receptor for newly-synthesized class I peroxisomal membrane proteins (PMPs), bin...ith this bait as prey (0) Prey ORF YDL065C Prey gene name PEX19 Prey description Chaperone and import receptor for newly-synthesized

  20. Yeast Interacting Proteins Database: YKL002W, YFL034C-B [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available eral genes involved in cell separation; Mob1p-like protein Rows with this prey as prey (2) Rows with this pr...tes the Ace2p in the daughter cell nucleus to direct daughter cell-specific transcription of several genes involved in cell separatio

  1. Yeast Interacting Proteins Database: YIR016W, YFL034C-B [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available the daughter cell nucleus to direct daughter cell-specific transcription of several genes involved in cell separation...ter cell-specific transcription of several genes involved in cell separation; Mob1p-like protein Rows with t

  2. Yeast Interacting Proteins Database: YKL103C, YOL082W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available go proteins aminopeptidase I (Lap4p) and alpha-mannosidase (Ams1p) to the phagophore assembly site for packaging...-mannosidase (Ams1p) to the phagophore assembly site for packaging into Cvt vesicles Rows with this prey as

  3. Yeast Interacting Proteins Database: YER081W, YPR136C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available mental and comparative sequence data; partially overlaps verified ORF RRP9 Rows w...tially overlaps verified ORF RRP9 Rows with this prey as...YPR136C - Dubious open reading frame unlikely to encode a protein, based on available experimental and comparative sequence data; par

  4. Yeast Interacting Proteins Database: YNL041C, YDR229W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available pholipid-binding protein that interacts with both Ypt7p and Vps33p, may partially...teracts with both Ypt7p and Vps33p, may partially counteract the action of Vps33p and vice versa, localizes

  5. Yeast Interacting Proteins Database: YEL005C, YGL079W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available endosome; identified as a transcriptional activator in a high-throughput yeast one-hybrid assay Rows with th...protein localizes to the endosome; identified as a transcriptional activator in a high-throughput yeast one-hybrid assay

  6. Yeast Interacting Proteins Database: YIL007C, YOR117W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YIL007C NAS2 Proteasome-interacting protein involved in the assembly of the base su... - - - - - 0 0 3 4 Show YIL007C Bait ORF YIL007C Bait gene name NAS2 Bait description Proteasome-interacti

  7. Yeast Interacting Proteins Database: YNL189W, YGL221C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available urified mitochondria in high-throughput studies Rows with this prey as prey (2) Rows with this prey as bait ...a factor (rpoD gene product); the authentic, non-tagged protein is detected in highly purified mitochondria in high-throughput studie

  8. Yeast Interacting Proteins Database: YDL044C, YLR386W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available of mitochondrial RNA polymerase (Rpo41p) and couples RNA processing and translation to transcription Rows wi...protein that interacts with an N-terminal region of mitochondrial RNA polymerase (Rpo41p) and couples RNA pr

  9. Yeast Interacting Proteins Database: YDL167C, YBL081W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available as bait (2) Rows with this bait as prey (0) YBL081W - Non-essential protein of unknown function; null mutation results in a decrease... function; null mutation results in a decrease in plasma membrane electron transp

  10. Yeast Interacting Proteins Database: YDL089W, YCR086W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available rDNA repeat stability; null mutant causes increase in unequal sister-chromatid exchange; GFP-fusion protein...p, Lrs4p; required for rDNA repeat stability; null mutant causes increase in unequal sister-chromatid exchan

  11. Yeast Interacting Proteins Database: YLR026C, YDR189W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ait as prey (0) YDR189W SLY1 Hydrophilic protein involved in vesicle trafficking between the ER and Golgi; S...it (1) Rows with this bait as prey Rows with this bait as prey (0) Prey ORF YDR189W Prey gene name SLY1 Prey description Hydrophilic

  12. Yeast Interacting Proteins Database: YLR295C, YDL118W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available tein of unconfirmed function; mutants are defective in telomere maintenance, and ...7) Rows with this bait as prey (0) YDL118W - Non-essential protein of unconfirmed function; mutants are defective in telomere mainten...ance, and are synthetically sick or lethal with alpha-sy

  13. Yeast Interacting Proteins Database: YHR166C, YLR451W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YHR166C CDC23 Subunit of the Anaphase-Promoting Complex/Cyclosome (APC/C), which is...ait description Subunit of the Anaphase-Promoting Complex/Cyclosome (APC/C), which is a ubiquitin-protein li

  14. Yeast Interacting Proteins Database: YJR091C, YFR036W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available 0) YFR036W CDC26 Subunit of the Anaphase-Promoting Complex/Cyclosome (APC/C), whi...rey description Subunit of the Anaphase-Promoting Complex/Cyclosome (APC/C), which is a ubiquitin-protein li

  15. Yeast Interacting Proteins Database: YGR223C, YOR089C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available micronucleophagy; predicted to fold as a seven-bladed beta-propeller; displays punctate cytoplasmic localiz...tion Phosphatidylinositol 3,5-bisphosphate-binding protein, plays a role in micronucleophagy; predicted to fold as a seven-blade

  16. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    Science.gov (United States)

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
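
    The idea of precalculating protein-ligand atom distances so that they can be retrieved by atom identity and a distance constraint can be illustrated with a small sketch. The table name `contact`, its columns, the helper functions and the structure identifiers below are all hypothetical and are not taken from the Protein Relational Database itself; the sketch only shows how an indexed relational table can answer such a query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of precomputed protein-ligand distances.

    The schema and every row are made-up placeholders for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact ("
        "structure_id TEXT, protein_atom TEXT, ligand_atom TEXT, distance REAL)"
    )
    rows = [
        ("demo1", "N", "O", 2.9),
        ("demo1", "C", "C", 4.1),
        ("demo2", "N", "O", 3.4),
        ("demo2", "O", "N", 2.7),
    ]
    conn.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)", rows)
    # An index on the queried columns lets lookups avoid a full table scan.
    conn.execute(
        "CREATE INDEX idx_contact ON contact (protein_atom, ligand_atom, distance)"
    )
    return conn


def find_contacts(conn, protein_atom, ligand_atom, max_distance):
    """Return (structure_id, distance) pairs within max_distance, closest first."""
    cur = conn.execute(
        "SELECT structure_id, distance FROM contact "
        "WHERE protein_atom = ? AND ligand_atom = ? AND distance <= ? "
        "ORDER BY distance",
        (protein_atom, ligand_atom, max_distance),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(find_contacts(db, "N", "O", 3.5))
```

    Because the distances are stored rather than computed at query time, the cost of a search depends on the index lookup and not on the geometry of each structure.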

  17. PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

    Science.gov (United States)

    Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

    2017-09-01

    We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from the STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/. The website is implemented in Python and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. Contact: gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online.
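
    As a rough illustration of the kind of thermodynamic quantity recorded for a mutation in a protein-protein complex, the sketch below converts dissociation constants to binding free energies with the standard relation ΔG = RT ln(Kd) and takes the mutant-minus-wild-type difference. The function names, the default temperature and the sign convention (positive ΔΔG means weaker binding) are assumptions made for this example and are not taken from PROXiMATE.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg_on_mutation(kd_wild, kd_mutant, temperature_k=298.15):
    """Change in binding free energy: dG(mutant) minus dG(wild type).

    Assumed convention: a positive value means the mutation weakens binding.
    """
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild, temperature_k))


if __name__ == "__main__":
    # Hypothetical values: a mutation that weakens binding tenfold.
    print(round(ddg_on_mutation(1e-9, 1e-8), 2))
```

    Under these assumptions a tenfold increase in Kd at 298.15 K corresponds to a change of about 1.36 kcal/mol, since RT ln(10) is roughly 1.364 kcal/mol at that temperature.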

  18. Yeast Interacting Proteins Database: YGR247W, YOR327C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ferred by null mutation or by overexpression Rows with this bait as bait (1) Rows with this bait as prey (0)...f R-type v-SNARE proteins Rows with this prey as prey (2) Rows with this prey as bait (0) 7 27 3 4 0 0 0 0 0...phosphate; may have a role in tRNA splicing; no detectable phenotype is conferred by null mutation or by overexpression Row...s with this bait as bait Rows with this bait as bait (1) Rows with this bait as prey Row...h the plasma membrane; member of the synaptobrevin/VAMP family of R-type v-SNARE proteins Rows with this prey as prey Row

  19. Yeast Interacting Proteins Database: YNL189W, YGL166W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available may also play a role in regulation of protein degradation Rows with this bait as bait (55) Rows with this b...cription of the metallothionein genes CUP1-1 and CUP1-2 in response to elevated copper concentrations Rows w...ith this prey as prey (1) Rows with this prey as bait (0) 67 47 3 5 0 0 0 0 0 - - - - - 0 0 5 - Show YNL189W...ation signal of the substrate during import; may also play a role in regulation of protein degradation Rows ...with this bait as bait Rows with this bait as bait (55) Rows with this bait as prey Row

  20. Yeast Interacting Proteins Database: YHL004W, YPL255W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YHL004W MRP4 Mitochondrial ribosomal protein of the small subunit Rows with this bait as bait (3) Row...2p and SPB components Spc29p and Kar1p; required for mitotic functions of Cdc5p Row...s with this prey as prey (4) Rows with this prey as bait (0) 9 8 2 2 0 0 0 0 0 - - - - - 0 0 6 - Show YHL0...04W Bait ORF YHL004W Bait gene name MRP4 Bait description Mitochondrial ribosomal protein of the small subunit Row...s with this bait as bait Rows with this bait as bait (3) Rows with this bait as prey Row

  1. Yeast Interacting Proteins Database: YOR167C, YEL015W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ical to Rps28Bp and has similarity to rat S28 ribosomal protein Rows with this bait as bait (2) Rows with th...calizes to cytoplasmic mRNA processing bodies Rows with this prey as prey (4) Rows with this prey as bait (3...8Bp and has similarity to rat S28 ribosomal protein Rows with this bait as bait Rows with this bait as bait (2) Row...s with this bait as prey Rows with this bait as prey (0) Prey ORF YEL015W Prey gene name EDC3 Prey de...NA decapping by specifically affecting the function of the decapping enzyme Dcp1p; localizes to cytoplasmic

  2. Yeast Interacting Proteins Database: YER071C, YDR366C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available in localizes to the cytoplasm in a punctate pattern Rows with this bait as bait (2) Rows with this bait as p...rey (0) YDR366C - Putative protein of unknown function Rows with this prey as prey (1) Rows with this prey a...calizes to the cytoplasm in a punctate pattern Rows with this bait as bait Rows with this bait as bait (2) Row...s with this bait as prey Rows with this bait as prey (0) Prey ORF YDR366C Prey ...gene name - Prey description Putative protein of unknown function Rows with this prey as prey Rows with this prey as prey (1) Row

  3. Yeast Interacting Proteins Database: YEL015W, YLR264W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available mRNA processing bodies Rows with this bait as bait (3) Rows with this bait as prey (4) YLR264W RPS28B Prote...d has similarity to rat S28 ribosomal protein Rows with this prey as prey (1) Rows with this prey as bait (0... by specifically affecting the function of the decapping enzyme Dcp1p; localizes to cytoplasmic mRNA processing bodies Row...s with this bait as bait Rows with this bait as bait (3) Rows with this bait as prey Rows with...otein component of the small (40S) ribosomal subunit; nearly identical to Rps28Ap and has similarity to rat S28 ribosomal protein Row

  4. Yeast Interacting Proteins Database: YOR285W, YDR233C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available racts with exocyst subunit Sec6p and with Yip3p; also interacts with Sbh1p; null mutant has an altered (most...tion ER membrane protein that interacts with exocyst subunit Sec6p and with Yip3p; also interacts with Sbh1p; null mutant has an alte...red (mostly cisternal) ER morphology; member of the RTNL

  5. Yeast Interacting Proteins Database: YDR020C, YDR020C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available a screen for mutants with increased levels of rDNA transcription; weak similarity with uridine kinases and ...e protein of unknown function; non-essential gene identified in a screen for mutants with increased levels...n of unknown function; non-essential gene identified in a screen for mutants with increased levels of rDNA t...sential gene identified in a screen for mutants with increased levels of rDNA transcription; weak similarity

  6. Yeast Interacting Proteins Database: YDL239C, YLR423C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...0 0 - - - - - 0 0 34 - Show YDL239C Bait ORF YDL239C Bait gene name ADY3 Bait des...ure at the leading edge of the prospore membrane via interaction with spindle pole body components; potentia

  7. Pentraxin 3, a non-redundant soluble pattern recognition receptor involved in innate immunity.

    Science.gov (United States)

    Mantovani, Alberto; Garlanda, Cecilia; Bottazzi, Barbara

    2003-06-01

    Pentraxin 3 (PTX3) is the first long pentraxin identified. Long pentraxins consist of a C-terminal pentraxin domain, which has sequence similarity to C-reactive protein (CRP) and serum amyloid P (SAP) component (the classic short pentraxins), and of an unrelated N-terminal portion. PTX3 is made by diverse cell types, most prominently endothelial cells, macrophages and dendritic cells, in response to primary inflammatory signals (e.g. interleukin-1 (IL-1), tumour necrosis factor (TNF), lipopolysaccharide (LPS)). It binds diverse ligands, including microbial moieties, C1q and apoptotic cells. Evidence suggests that PTX3 plays a role in the regulation of innate resistance to pathogens, inflammatory reactions, possibly clearance of self-components and female fertility.

  8. Yeast Interacting Proteins Database: YMR047C, YBR137W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available aryopherin Kap95p; homologous to Nup100p Rows with this bait as bait (15) Rows with this bait as prey (0) YB...tion Protein A (RPA); YBR137W is not an essential gene Rows with this prey as prey (4) Rows with this prey a...pherin Kap95p; homologous to Nup100p Rows with this bait as bait Rows with this bait as bait (15) Rows with this bait as prey Row...(RPA); YBR137W is not an essential gene Rows with this prey as prey Rows with this prey as prey (4) Rows with this prey as bait Row

  9. Yeast Interacting Proteins Database: YHR169W, YKL075C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available activity stimulated by association with Esp2p Rows with this bait as bait (1) Rows with this bait as prey (0...-fusion protein localizes to the cytoplasm; proposed to be involved in resistance to streptozotocin and camptothecin Row...s with this prey as prey (1) Rows with this prey as bait (0) 6 5 2 2 0 0 0 0 0 - - - - - 0 0 3 -...rRNA and assembly of 40S small ribosomal subunit; ATPase activity stimulated by association with Esp2p Rows ...with this bait as bait Rows with this bait as bait (1) Rows with this bait as prey Row

  10. Yeast Interacting Proteins Database: YNL273W, YMR048W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available ing gap repair of damaged DNA; interacts with the MCM helicase Rows with this bait as bait (1) Rows with thi...s bait as prey (0) YMR048W CSM3 Protein required for accurate chromosome segregation during meiosis Row...s with this prey as prey (1) Rows with this prey as bait (0) 4 3 2 2 0 0 0 0 0 - - - -...rk to promote sister chromatid cohesion after DNA damage, facilitating gap repair of damaged DNA; interacts with the MCM helicase Row...s with this bait as bait Rows with this bait as bait (1) Rows with this bait as prey Row

  11. Yeast Interacting Proteins Database: YBR254C, YKR068C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available oepiphyseal dysplasia tarda (SEDL) disorder Rows with this bait as bait (2) Rows with this bait as prey (0) ...targeting and fusion of ER to Golgi transport vesicles; component of the TRAPP (transport protein particle) complex Row...s with this prey as prey (1) Rows with this prey as bait (1) 9 9 5 20 0 0 1 1 0 - - - - - 1 1 4 -... fusion; mutations in the human homolog cause the spondyloepiphyseal dysplasia tarda (SEDL) disorder Rows with this bait as bait Row...s with this bait as bait (2) Rows with this bait as prey Row

  12. Yeast Interacting Proteins Database: YBR246W, YDR520C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available a screen for mutants with increased levels of rDNA transcription; null mutants display a weak carboxypeptid...ene identified in a screen for mutants with increased levels of rDNA transcription; similar to S. kluyveri U... description Putative protein of unknown function; non-essential gene identified in a screen for mutants with increased levels...ative Zn(II)2Cys6 motif containing transcription factor; non-essential gene identified in a screen for mutants with increased levels

  13. Yeast Interacting Proteins Database: YDL239C, YDR273W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...it as prey (1) YDR273W DON1 Meiosis-specific component of the spindle pole body, ...0 - - - - - 0 0 5 - Show YDL239C Bait ORF YDL239C Bait gene name ADY3 Bait descri... at the leading edge of the prospore membrane via interaction with spindle pole body components; potentially

  14. Yeast Interacting Proteins Database: YDL239C, YOR324C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p... as bait (0) 4 5 3 4 0 0 0 0 0 - - - - - 0 0 4 - Show YDL239C Bait ORF YDL239C Ba... a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle pole

  15. Yeast Interacting Proteins Database: YDL239C, YDR148C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...s prey as bait (0) 4 15 2 5 0 0 0 0 0 - - - - - 0 0 3 - Show YDL239C Bait ORF YDL...mbly of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindl

  16. Yeast Interacting Proteins Database: YDL239C, YAL028W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...(1) Rows with this prey as bait (0) 4 5 4 7 0 0 0 0 0 - - - - - 0 0 3 - Show YDL239C Bait ORF YDL... to mediate assembly of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindl

  17. Yeast Interacting Proteins Database: YDL239C, YBR072W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p... (3) Rows with this prey as bait (0) 4 52 1 1 0 0 0 0 0 - - - - - 0 0 3 - Show YDL239C Bait ORF YDL...ht to mediate assembly of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindl

  18. Yeast Interacting Proteins Database: YDL239C, YLR072W [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available YDL239C ADY3 Protein required for spore wall formation, thought to mediate assembly... of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p... with this prey as prey (1) Rows with this prey as bait (0) 4 5 4 7 0 0 0 0 0 - - - - - 0 0 4 - Show YDL239C Bait ORF YDL...pore membrane via interaction with spindle pole body components; potentially phosphorylated by Cdc28p Rows w

  19. ArachnoServer: a database of protein toxins from spiders

    Directory of Open Access Journals (Sweden)

    Kaas Quentin

    2009-08-01

    Full Text Available Abstract Background Venomous animals incapacitate their prey using complex venoms that can contain hundreds of unique protein toxins. The realisation that many of these toxins may have pharmaceutical and insecticidal potential due to their remarkable potency and selectivity against target receptors has led to an explosion in the number of new toxins being discovered and characterised. From an evolutionary perspective, spiders are the most successful venomous animals and they maintain by far the largest pool of toxic peptides. However, at present, there are no databases dedicated to spider toxins and hence it is difficult to realise their full potential as drugs, insecticides, and pharmacological probes. Description We have developed ArachnoServer, a manually curated database that provides detailed information about proteinaceous toxins from spiders. Key features of ArachnoServer include a new molecular target ontology designed especially for venom toxins, the most up-to-date taxonomic information available, and a powerful advanced search interface. Toxin information can be browsed through dynamic trees, and each toxin has a dedicated page summarising all available information about its sequence, structure, and biological activity. ArachnoServer currently manages 567 protein sequences, 334 nucleic acid sequences, and 51 protein structures. Conclusion ArachnoServer provides a single source of high-quality information about proteinaceous spider toxins that will be an invaluable resource for pharmacologists, neuroscientists, toxinologists, medicinal chemists, ion channel scientists, clinicians, and structural biologists. ArachnoServer is available online at http://www.arachnoserver.org.

  20. Yeast Interacting Proteins Database: YPR029C, YLR170C [Yeast Interacting Proteins Database

    Lifescience Database Archive (English)

    Full Text Available complex; binds clathrin; involved in vesicle mediated transport Rows with this bait as bait (3) Rows with t...t of the mammalian clathrin AP-1 complex Rows with this prey as prey (1) Rows with this prey as bait (1) 7 1...lathrin; involved in vesicle mediated transport Rows with this bait as bait Rows with this bait as bait (3) Row...s with this bait as prey Rows with this bait as prey (1) Prey ORF YLR170C Prey gene name APS1 Prey descri...olved in protein sorting at the trans-Golgi network; homolog of the sigma subunit of the mammalian clathrin AP-1 complex Row

  1. Databases

    Data.gov (United States)

    National Aeronautics and Space Administration — The databases of computational and experimental data from the first Aeroelastic Prediction Workshop are located here. The databases file names tell their contents by...

  2. NARG Algorithm of Extracting Non-redundant Association Rule in Concept Lattice

    Institute of Scientific and Technical Information of China (English)

    苗茹; 沈夏炯; 胡小华

    2009-01-01

    Association rules are a very valuable kind of pattern in data mining. Ordinary mining algorithms usually generate a large number of rules from a database. In particular, when the minimal support and minimal confidence are reduced, the number of association rules rises rapidly. The key to eliminating redundant association rules is to reduce the rule set without losing data information. This paper presents a new algorithm called NARG that extracts non-redundant association rules based on the concept lattice and the properties of redundant association rules. The algorithm obtains the minimal non-redundant set of association rules without losing any information in the data, and effectively improves the efficiency of rule extraction.
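    The effect of the two thresholds mentioned in this abstract can be illustrated with a minimal sketch. It is not the NARG algorithm and does not use a concept lattice; it is a brute-force enumeration over a made-up toy transaction set, and the names `TRANSACTIONS`, `support` and `rules` are hypothetical. It only shows how lowering the minimal support and minimal confidence increases the number of generated rules.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support, min_confidence):
    """Enumerate rules A -> B (A and B disjoint, non-empty) by brute force."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s < min_support:
                continue
            # Every non-empty proper subset of the itemset is a candidate antecedent.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    confidence = s / support(antecedent, transactions)
                    if confidence >= min_confidence:
                        consequent = tuple(i for i in itemset if i not in antecedent)
                        found.append((antecedent, consequent, s, confidence))
    return found


strict = rules(TRANSACTIONS, min_support=0.5, min_confidence=0.8)
loose = rules(TRANSACTIONS, min_support=0.25, min_confidence=0.5)
print(len(strict), len(loose))  # the rule count grows as the thresholds drop
```

    Many of the extra rules produced under the looser thresholds restate one another, which is the kind of redundancy a pruning method has to remove.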

  3. Development of human protein reference database as an initial platform for approaching systems biology in humans

    DEFF Research Database (Denmark)

    Peri, Suraj; Navarro, J Daniel; Amanchy, Ramars

    2003-01-01

    Human Protein Reference Database (HPRD) is an object database that integrates a wealth of information relevant to the function of human proteins in health and disease. Data pertaining to thousands of protein-protein interactions, posttranslational modifications, enzyme/substrate relationships, di...

  4. Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

    Science.gov (United States)

    Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

    2017-01-01

    Database searching is the preferred method for protein identification from digital spectra of mass-to-charge ratios (m/z) detected for protein samples through mass spectrometers. The search database is one of the major influencing factors in discovering proteins present in the sample and thus in deriving biological conclusions. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on the final list of identified proteins. We also elaborate upon factors like composition and size of the search database that can influence the protein identification process. In conclusion, we suggest that the choice of database depends on the type of inferences to be derived from proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
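    One simple way to make a search database more compact is to collapse entries whose sequences are identical. The sketch below illustrates only that step; it is not the procedure described by the authors, the FASTA records in `EXAMPLE` are invented, and the helper names `read_fasta` and `collapse_identical` are hypothetical. Real non-redundant databases often also cluster near-identical sequences, which this sketch does not attempt.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records):
    """Map each distinct sequence to the list of headers that share it."""
    merged = {}
    for header, seq in records:
        merged.setdefault(seq, []).append(header)
    return merged


# Invented records: entries A and B carry the same sequence, wrapped differently.
EXAMPLE = """\
>sp|P1|hypothetical entry A
MKTAYIAK
QRQISFVK
>tr|Q1|hypothetical entry B
MKTAYIAKQRQISFVK
>sp|P2|hypothetical entry C
MSLNNLVK
"""

nonredundant = collapse_identical(read_fasta(EXAMPLE))
for seq, headers in nonredundant.items():
    # Keep all source identifiers on the merged record so hits stay traceable.
    print(">" + ";".join(h.split()[0] for h in headers))
    print(seq)
```

    Keeping every original identifier on the merged record means an identification can still be traced back to each source entry.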

  5. Download - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us ...らダウンロード Joomla SEF URLs by Artio About This Database Database Description Download License Update History of

  6. ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites

    Directory of Open Access Journals (Sweden)

    Selbig Joachim

    2007-06-01

    Full Text Available Abstract Background In the last decade, techniques were established for the large scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadly available and to interconnect the different molecular levels of a biological system [1]. As data interpretation in the light of already existing data becomes increasingly important, these initiatives are an essential part of current and future systems biology. Results A mass spectral library consisting of experimentally derived tryptic peptide product ion spectra was generated based on liquid chromatography coupled to ion trap mass spectrometry (LC-IT-MS). Protein samples derived from Arabidopsis thaliana, Chlamydomonas reinhardii, Medicago truncatula, and Sinorhizobium meliloti were analysed. With currently 4,557 manually validated spectra associated with 4,226 unique peptides from 1,367 proteins, the database serves as a continuously growing reference data set and can be used for protein identification and quantification in uncharacterized biological samples. For peptide identification, several algorithms were implemented based on a recently published study for peptide mass fingerprinting [2] and tested for false positive and negative rates. An algorithm which considers intensity distribution for match correlation scores was found to yield best results. For proof of concept, an LC-IT-MS analysis of a tryptic leaf protein digest was converted to mzData format and searched against the mass spectral library. The utility of the mass spectral library was also tested for the identification of phosphorylated tryptic peptides. We included in vivo phosphorylation sites of Arabidopsis thaliana proteins and the identification performance was found to be improved compared to genome-based search algorithms.
Protein identification by Pro

  7. Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

    Directory of Open Access Journals (Sweden)

    Bradley Michael E

    2006-02-01

    Full Text Available Abstract Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural

  8. Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

    Science.gov (United States)

    Bradley, Michael E; Benner, Steven A

    2006-01-01

    Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1) multiple sequence alignments, 2) mapping of alignment sites to crystal structure sites, 3) phylogenetic trees, 4) inferred ancestral sequences at internal tree nodes, and 5) amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. 
Magnum provides evolutionary and structural bioinformatics resources that

  9. License - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available freely redistribute part or whole of the data from this database; and freely create and distribute database and other derivative wor...ks based on part or whole of the data from this database, under the Standard Licens

  10. Protein Structural Change Data - PSCDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us PSCDB Protein Structural Change Data Data detail Data name Protein Structural Change Data DO...History of This Database Site Policy | Contact Us Protein Structural Change Data - PSCDB | LSDB Archive ...

  11. Databases

    Directory of Open Access Journals (Sweden)

    Nick Ryan

    2004-01-01

    Full Text Available Databases are deeply embedded in archaeology, underpinning and supporting many aspects of the subject. However, as well as providing a means for storing, retrieving and modifying data, databases themselves must be a result of a detailed analysis and design process. This article looks at this process, and shows how the characteristics of data models affect the process of database design and implementation. The impact of the Internet on the development of databases is examined, and the article concludes with a discussion of a range of issues associated with the recording and management of archaeological data.

  12. A graph-theoretic approach for identifying non-redundant and relevant gene markers from microarray data using multiobjective binary PSO.

    Science.gov (United States)

    Mandal, Monalisa; Mukhopadhyay, Anirban

    2014-01-01

    The purpose of feature selection is to identify the relevant and non-redundant features from a dataset. In this article, the feature selection problem is organized as a graph-theoretic problem where a feature-dissimilarity graph is shaped from the data matrix. The nodes represent features and the edges represent their dissimilarity. Both nodes and edges are given weight according to the feature's relevance and dissimilarity among the features, respectively. The problem of finding relevant and non-redundant features is then mapped into densest subgraph finding problem. We have proposed a multiobjective particle swarm optimization (PSO)-based algorithm that optimizes average node-weight and average edge-weight of the candidate subgraph simultaneously. The proposed algorithm is applied for identifying relevant and non-redundant disease-related genes from microarray gene expression data. The performance of the proposed method is compared with that of several other existing feature selection techniques on different real-life microarray gene expression datasets.
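    The two objectives named in this abstract, average node weight and average edge weight of a candidate subgraph, can be sketched as follows. All weights are made up, the names `RELEVANCE`, `DISSIMILARITY`, `objectives` and `dominates` are hypothetical, and exhaustive enumeration stands in for the authors' multiobjective PSO search, which is not reproduced here.

```python
from itertools import combinations

# Hypothetical toy inputs: node weight = feature relevance,
# edge weight = pairwise dissimilarity between features.
RELEVANCE = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.6}
DISSIMILARITY = {
    frozenset(("g1", "g2")): 0.1,
    frozenset(("g1", "g3")): 0.9,
    frozenset(("g1", "g4")): 0.8,
    frozenset(("g2", "g3")): 0.7,
    frozenset(("g2", "g4")): 0.6,
    frozenset(("g3", "g4")): 0.5,
}


def objectives(subset):
    """Return (average node weight, average edge weight) of the induced subgraph."""
    nodes = sorted(subset)
    pairs = list(combinations(nodes, 2))
    avg_node = sum(RELEVANCE[n] for n in nodes) / len(nodes)
    avg_edge = sum(DISSIMILARITY[frozenset(p)] for p in pairs) / len(pairs)
    return avg_node, avg_edge


def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# Score every subset with at least two features (brute force, toy scale only).
candidates = {
    frozenset(c): objectives(c)
    for size in range(2, len(RELEVANCE) + 1)
    for c in combinations(RELEVANCE, size)
}

# Keep the subsets that no other subset dominates (the Pareto front).
front = {
    subset: score
    for subset, score in candidates.items()
    if not any(dominates(other, score) for other in candidates.values())
}

for subset, (avg_node, avg_edge) in front.items():
    print(sorted(subset), round(avg_node, 3), round(avg_edge, 3))
```

    A subset scores well on the first objective when its features are relevant and on the second when they are mutually dissimilar, so the front contains the available trade-offs between relevance and non-redundancy.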

  13. A graph-theoretic approach for identifying non-redundant and relevant gene markers from microarray data using multiobjective binary PSO.

    Directory of Open Access Journals (Sweden)

    Monalisa Mandal

    Full Text Available The purpose of feature selection is to identify the relevant and non-redundant features from a dataset. In this article, the feature selection problem is organized as a graph-theoretic problem where a feature-dissimilarity graph is shaped from the data matrix. The nodes represent features and the edges represent their dissimilarity. Both nodes and edges are given weight according to the feature's relevance and dissimilarity among the features, respectively. The problem of finding relevant and non-redundant features is then mapped into densest subgraph finding problem. We have proposed a multiobjective particle swarm optimization (PSO)-based algorithm that optimizes average node-weight and average edge-weight of the candidate subgraph simultaneously. The proposed algorithm is applied for identifying relevant and non-redundant disease-related genes from microarray gene expression data. The performance of the proposed method is compared with that of several other existing feature selection techniques on different real-life microarray gene expression datasets.

  14. Core Data of Yeast Interacting Proteins Database (Annotation Updated Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available nteractions are required. Several sources including YPD (Yeast Proteome Database, Costanzo, M. C., Hogan, J....erse direction. *1 The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources

  15. A Database of Domain Definitions for Proteins with Complex Interdomain Geometry

    OpenAIRE

    Indraneel Majumdar; Kinch, Lisa N.; Grishin, Nick V.

    2009-01-01

    Protein structural domains are necessary for understanding evolution and protein folding, and may vary widely from functional and sequence based domains. Although various structural domain databases exist, defining domains for some proteins is non-trivial, and definitions of their domain boundaries are not available. Here, we present a novel database of manually defined structural domains for a representative set of proteins from the SCOP "multi-domain proteins" class. (http://prodata.swmed....

  16. iPPI-DB: an online database of modulators of protein-protein interactions.

    Science.gov (United States)

    Labbé, Céline M; Kuenemann, Mélaine A; Zarzycka, Barbara; Vriend, Gert; Nicolaes, Gerry A F; Lagorce, David; Miteva, Maria A; Villoutreix, Bruno O; Sperandio, Olivier

    2016-01-01

    In order to boost the identification of low-molecular-weight drugs on protein-protein interactions (PPI), it is essential to properly collect and annotate experimental data about successful examples. This provides the scientific community with the necessary information to derive trends about privileged physicochemical properties and chemotypes that maximize the likelihood of promoting a given chemical probe to the most advanced stages of development. To this end we have developed iPPI-DB (freely accessible at http://www.ippidb.cdithem.fr), a database that contains the structure, some physicochemical characteristics, the pharmacological data and the profile of the PPI targets of several hundred modulators of protein-protein interactions. iPPI-DB is accessible through a web application and can be queried according to two general approaches: using physicochemical/pharmacological criteria; or by chemical similarity to a user-defined structure input. In both cases the results are displayed as a sortable and exportable datasheet with links to external databases such as UniProt and PubMed. Furthermore, each compound in the table has a link to an individual ID card that contains its physicochemical and pharmacological profile derived from iPPI-DB data. This includes information about its binding data, ligand and lipophilic efficiencies, location in the PPI chemical space, and, importantly, similarity with known drugs, and links to external databases such as PubChem and ChEMBL.

  17. SWISS-PROT: connecting biomolecular knowledge via a protein database

    OpenAIRE

    Gasteiger, Elisabeth; Jung, Eva; Bairoch, Amos Marc

    2001-01-01

    With the explosive growth of biological data, the development of new means of data storage was needed. More and more often biological information is no longer published in the conventional way via a publication in a scientific journal, but only deposited into a database. In the last two decades these databases have become essential tools for researchers in biological sciences. Biological databases can be classified according to the type of information they contain. There are basically three t...

  18. MODBASE: a database of annotated comparative protein structure models and associated resources

    OpenAIRE

    Pieper, Ursula; Eswar, Narayanan; Davis, Fred P.; Braberg, Hannes; Madhusudhan, M. S.; Rossi, Andrea; Marti-Renom, Marc; Karchin, Rachel; Webb, Ben M.; Eramian, David; Shen, Min-Yi; Kelly, Libusha; Melo, Francisco; Sali, Andrej

    2005-01-01

    MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculat...

  19. Remote access to ACNUC nucleotide and protein sequence databases at PBIL.

    Science.gov (United States)

    Gouy, Manolo; Delmotte, Stéphane

    2008-04-01

    The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).

  20. PACSY, a relational database management system for protein structure and chemical shift analysis.

    Science.gov (United States)

    Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

    2012-10-01

    PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
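    The idea of tables linked by key identification numbers can be sketched with a small relational query. The table and column names below (`coord`, `shift`, `key_id`, `ppm`) and all values are hypothetical and are not taken from the PACSY schema. The abstract names MySQL or PostgreSQL as the server; Python's built-in `sqlite3` is used here only so that the sketch is self-contained.

```python
import sqlite3

# In-memory database standing in for an RDBMS server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE coord (
        key_id  INTEGER PRIMARY KEY,  -- shared key linking the tables
        residue TEXT,
        atom    TEXT,
        x REAL, y REAL, z REAL
    );
    CREATE TABLE shift (
        key_id INTEGER REFERENCES coord(key_id),
        ppm    REAL
    );
    INSERT INTO coord VALUES (1, 'ALA', 'CA', 1.0, 2.0, 3.0);
    INSERT INTO coord VALUES (2, 'ALA', 'N',  0.5, 1.5, 2.5);
    INSERT INTO coord VALUES (3, 'GLY', 'CA', 4.0, 5.0, 6.0);
    INSERT INTO shift VALUES (1, 52.3);
    INSERT INTO shift VALUES (2, 123.1);
    INSERT INTO shift VALUES (3, 45.2);
    """
)

# Combine structural and chemical shift information through the shared key.
rows = conn.execute(
    """
    SELECT c.key_id, c.residue, c.atom, s.ppm
    FROM coord AS c
    JOIN shift AS s ON s.key_id = c.key_id
    WHERE s.ppm > ?
    ORDER BY c.key_id
    """,
    (100.0,),
).fetchall()

print(rows)
conn.close()
```

    Because both tables carry the same key, one join returns coordinates and chemical shifts for the same atom without duplicating either data set.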

  1. DCCP and DICP: Construction and Analyses of Databases for Copper- and Iron-Chelating Proteins

    Institute of Scientific and Technical Information of China (English)

    Hao Wu; Yan Yang; Sheng-Juan Jiang; Ling-Ling Chen; Hai-Xia Gao; Qing-Shan Fu; Feng Li; Bin-Guang Ma; Hong-Yu Zhang

    2005-01-01

    Copper and iron play important roles in a variety of biological processes, especially when being chelated with proteins. The proteins involved in the metal binding, transporting and metabolism have aroused much interest. To facilitate the study on this topic, we constructed two databases (DCCP and DICP) containing the known copper- and iron-chelating proteins, which are freely available from the website http://sdbi.sdut.edu.cn/en. Users can conveniently search and browse all of the entries in the databases. Based on the two databases, bioinformatic analyses were performed, which provided some novel insights into metalloproteins.

  2. Protein - TP Atlas | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available otein information on other websites. Data file File name: tp_atlas_protein.zip Fi...le URL: ftp://ftp.biosciencedbc.jp/archive/tp_atlas/LATEST/tp_atlas_protein.zip File size: 49.8 KB Simple se...arch URL http://togodb.biosciencedbc.jp/togodb/view/tp_atlas_protein#en Data acquisition method - Data analy

  3. Protein - AT Atlas | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available eins Research Program (TPRP). Data file File name: at_atlas_protein.zip File URL:... ftp://ftp.biosciencedbc.jp/archive/at_atlas/LATEST/at_atlas_protein.zip File size: 1.13 KB Simple search UR...L http://togodb.biosciencedbc.jp/togodb/view/at_atlas_protein#en Data acquisition method - Data analysis met

  4. Human immunodeficiency virus type 1, human protein interaction database at NCBI.

    Science.gov (United States)

    Fu, William; Sanders-Beer, Brigitte E; Katz, Kenneth S; Maglott, Donna R; Pruitt, Kim D; Ptak, Roger G

    2009-01-01

    The 'Human Immunodeficiency Virus Type 1 (HIV-1), Human Protein Interaction Database', available through the National Library of Medicine at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions, was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature. The database serves the scientific community exploring the discovery of novel HIV vaccine candidates and therapeutic targets. To facilitate this discovery approach, the following information for each HIV-1 human protein interaction is provided and can be retrieved without restriction by web-based downloads and ftp protocols: Reference Sequence (RefSeq) protein accession numbers, Entrez Gene identification numbers, brief descriptions of the interactions, searchable keywords for interactions and PubMed identification numbers (PMIDs) of journal articles describing the interactions. Currently, 2589 unique HIV-1 to human protein interactions and 5135 brief descriptions of the interactions, with a total of 14,312 PMID references to the original articles reporting the interactions, are stored in this growing database. In addition, all protein-protein interactions documented in the database are integrated into Entrez Gene records and listed in the 'HIV-1 protein interactions' section of Entrez Gene reports. The database is also tightly linked to other databases through Entrez Gene, enabling users to search for an abundance of information related to HIV pathogenesis and replication.

  5. ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

    Science.gov (United States)

    Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

    2017-01-01

    Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. The study of helicase proteins in archaeal model microorganisms has largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety-five sequenced archaeal genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access to SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using a MySQL relational database. Web interfaces were developed using NetBeans. Data were stored according to UniProt accession numbers, NCBI RefSeq IDs, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicases from archaea in one platform. This database provides essential resource information for all researchers interested in the field.

  6. Protein 3D Structure Image - PSCDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us PSCDB Protein 3D Structure Image Data detail Data name Protein 3D Structure Image DOI 10.189...tory of This Database Site Policy | Contact Us Protein 3D Structure Image - PSCDB | LSDB Archive ...

  7. O-GLYCOBASE version 4.0: a revised database of O-glycosylated proteins

    DEFF Research Database (Denmark)

    Gupta, Ramneek; Birch, Hanne; Rapacki, Krzysztof;

    1999-01-01

    O-GLYCOBASE is a database of glycoproteins with O-linked glycosylation sites. Entries with at least one experimentally verified O-glycosylation site have been compiled from protein sequence databases and literature. Each entry contains information about the glycan involved, the species, sequence, ...

  8. UNcleProt (Universal Nuclear Protein database of barley): The first nuclear protein database that distinguishes proteins from different phases of the cell cycle.

    Science.gov (United States)

    Blavet, Nicolas; Uřinovská, Jana; Jeřábková, Hana; Chamrád, Ivo; Vrána, Jan; Lenobel, René; Beinhauer, Jana; Šebela, Marek; Doležel, Jaroslav; Petrovská, Beáta

    2017-01-02

    Proteins are the most abundant component of the cell nucleus, where they perform a plethora of functions, including the assembly of long DNA molecules into condensed chromatin, DNA replication and repair, regulation of gene expression, synthesis of RNA molecules and their modification. Proteins are important components of nuclear bodies and are involved in the maintenance of the nuclear architecture, transport across the nuclear envelope and cell division. Given their importance, the current poor knowledge of plant nuclear proteins and their dynamics during the cell's life and division is striking. Several factors hamper the analysis of the plant nuclear proteome, but the most critical seems to be the contamination of nuclei by cytosolic material during their isolation. With the availability of an efficient protocol for the purification of plant nuclei, based on flow cytometric sorting, contamination by cytoplasmic remnants can be minimized. Moreover, flow cytometry allows the separation of nuclei in different stages of the cell cycle (G1, S, and G2). This strategy has led to the identification of a large number of nuclear proteins from barley (Hordeum vulgare), thus triggering the creation of a dedicated database called UNcleProt, http://barley.gambrinus.ueb.cas.cz/.

  9. Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

    Science.gov (United States)

    Gaponov, Yu. A.; Igarashi, N.; Hiraki, M.; Sasajima, K.; Matsugaki, N.; Suzuki, M.; Kosuge, T.; Wakatsuki, S.

    2004-05-01

    An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. Main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data that are stored in a central data server) in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view, create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with secure connection) and an indirect method (using the secure SSL connection using secure X11 support from any operating system with X-terminal and SSH support). A part of the system has been implemented on a new MAD beam line, NW12, at the Photon Factory Advanced Ring for general user experiments.

  10. Development of human protein reference database as an initial platform for approaching systems biology in humans.

    Science.gov (United States)

    Peri, Suraj; Navarro, J Daniel; Amanchy, Ramars; Kristiansen, Troels Z; Jonnalagadda, Chandra Kiran; Surendranath, Vineeth; Niranjan, Vidya; Muthusamy, Babylakshmi; Gandhi, T K B; Gronborg, Mads; Ibarrola, Nieves; Deshpande, Nandan; Shanker, K; Shivashankar, H N; Rashmi, B P; Ramya, M A; Zhao, Zhixing; Chandrika, K N; Padma, N; Harsha, H C; Yatish, A J; Kavitha, M P; Menezes, Minal; Choudhury, Dipanwita Roy; Suresh, Shubha; Ghosh, Neelanjana; Saravana, R; Chandran, Sreenath; Krishna, Subhalakshmi; Joy, Mary; Anand, Sanjeev K; Madavan, V; Joseph, Ansamma; Wong, Guang W; Schiemann, William P; Constantinescu, Stefan N; Huang, Lily; Khosravi-Far, Roya; Steen, Hanno; Tewari, Muneesh; Ghaffari, Saghi; Blobe, Gerard C; Dang, Chi V; Garcia, Joe G N; Pevsner, Jonathan; Jensen, Ole N; Roepstorff, Peter; Deshpande, Krishna S; Chinnaiyan, Arul M; Hamosh, Ada; Chakravarti, Aravinda; Pandey, Akhilesh

    2003-10-01

    Human Protein Reference Database (HPRD) is an object database that integrates a wealth of information relevant to the function of human proteins in health and disease. Data pertaining to thousands of protein-protein interactions, posttranslational modifications, enzyme/substrate relationships, disease associations, tissue expression, and subcellular localization were extracted from the literature for a nonredundant set of 2750 human proteins. Almost all the information was obtained manually by biologists who read and interpreted >300,000 published articles during the annotation process. This database, which has an intuitive query interface allowing easy access to all the features of proteins, was built by using open source technologies and will be freely available at http://www.hprd.org to the academic community. This unified bioinformatics platform will be useful in cataloging and mining the large number of proteomic interactions and alterations that will be discovered in the postgenomic era.

  11. InterEvol database: exploring the structure and evolution of protein complex interfaces

    OpenAIRE

    Faure, Guilhem; Andreani, Jessica; Guerois, Raphaël

    2011-01-01

    Capturing how the structures of interacting partners evolved at their binding interfaces is a fundamental issue for understanding interactomes evolution. In that scope, the InterEvol database was designed for exploring 3D structures of homologous interfaces of protein complexes. For every chain forming a complex in the protein data bank (PDB), close and remote structural interologs were identified providing essential snapshots for studying interfaces evolution. The database provides tools to ...

  12. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    Science.gov (United States)

    Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

    2007-01-01

    Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description: We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion: Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328

  13. PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

    Directory of Open Access Journals (Sweden)

    Picard-Cloutier Aude

    2007-12-01

    Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description: We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion: Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5.

  14. The Histone Database: an integrated resource for histones and histone fold-containing proteins.

    Science.gov (United States)

    Mariño-Ramírez, Leonardo; Levine, Kevin M; Morales, Mario; Zhang, Suiyuan; Moreland, R Travis; Baxevanis, Andreas D; Landsman, David

    2011-01-01

    Eukaryotic chromatin is composed of DNA and protein components (core histones) that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins.

  15. The Pfam protein families database: towards a more sustainable future.

    Science.gov (United States)

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-01

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

  16. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    Science.gov (United States)

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-01

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, and (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ~3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing, distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue.

  17. Extracting protein alignment models from the sequence database.

    Science.gov (United States)

    Neuwald, A F; Liu, J S; Lipman, D J; Lawrence, C E

    1997-05-01

    Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.

  18. AraPPISite: a database of fine-grained protein-protein interaction site annotations for Arabidopsis thaliana.

    Science.gov (United States)

    Li, Hong; Yang, Shiping; Wang, Chuan; Zhou, Yuan; Zhang, Ziding

    2016-09-01

    Knowledge about protein interaction sites provides detailed information on protein-protein interactions (PPIs). To date, nearly 20,000 PPIs from Arabidopsis thaliana have been identified. Nevertheless, the interaction site information has been largely missed by previously published PPI databases. Here, AraPPISite, a database that presents fine-grained interaction details for A. thaliana PPIs, is established. First, the experimentally determined 3D structures of 27 A. thaliana PPIs are collected from the Protein Data Bank database and the predicted 3D structures of 3023 A. thaliana PPIs are modeled by using two well-established template-based docking methods. For each experimental/predicted complex structure, AraPPISite not only provides an interactive user interface for browsing interaction sites, but also lists detailed evolutionary and physicochemical properties of these sites. Second, AraPPISite assigns domain-domain interactions or domain-motif interactions to 4286 PPIs whose 3D structures cannot be modeled. In this case, users can easily query protein interaction regions at the sequence level. AraPPISite is a free and user-friendly database, which does not require user registration or any configuration on local machines. We anticipate AraPPISite can serve as a helpful database resource for users with less experience in structural biology or protein bioinformatics to probe the details of PPIs, and thus accelerate the studies of plant genetics and functional genomics. AraPPISite is available at http://systbio.cau.edu.cn/arappisite/index.html.

  19. MoonProt: a database for proteins that are known to moonlight

    Science.gov (United States)

    Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

    2015-01-01

    Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered in many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about the over 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305

  20. Protein (Cyanobacteria) - PGDBj - Ortholog DB | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    The IDs of clusters that the amino acid sequences belong to in each taxon are indicated. Data file File name: pgd...bj_ortholog_db_cyanobacteria_protein.zip File URL: ftp://ftp.biosciencedbc.jp/archive/pgdbj-ortholog-db/LATEST/pgd...ch URL http://togodb.biosciencedbc.jp/togodb/view/pgdbj_ortholog_db_cyanobacteria_protein#en Data acquisitio...

  1. dbPPT: a comprehensive database of protein phosphorylation in plants.

    Science.gov (United States)

    Cheng, Han; Deng, Wankun; Wang, Yongbo; Ren, Jian; Liu, Zexian; Xue, Yu

    2014-01-01

    As one of the most important protein post-translational modifications, reversible phosphorylation is critical for plants in regulating a variety of biological processes such as cellular metabolism, signal transduction and responses to environmental stress. Numerous efforts, especially large-scale phosphoproteome profiling studies, have contributed to dissecting phosphorylation signaling in various plants, and a large number of phosphorylation events have been identified. To provide an integrated data resource for further investigations, here we present dbPPT (database of Phosphorylation site in PlanTs, at http://dbppt.biocuckoo.org), a comprehensive database that contains experimentally identified phosphorylation sites in proteins from plants. The phosphorylation sites in dbPPT were manually curated from the literature, and datasets in other public databases were also integrated. In total, there were 82,175 phosphorylation sites in 31,012 proteins from 20 plant organisms in dbPPT, presenting a larger quantity of phosphorylation sites and a higher coverage of plant species in comparison with other databases. The proportions of residue types including serine, threonine and tyrosine were 77.99, 17.81 and 4.20%, respectively. All the phosphoproteins and phosphorylation sites in the database were critically annotated. Since phosphorylation signaling in plants has attracted great attention recently, such a comprehensive resource of plant protein phosphorylation can be useful for the research community. Database URL: http://dbppt.biocuckoo.or

  2. MSV3d: database of human MisSense Variants mapped to 3D protein structure.

    Science.gov (United States)

    Luu, Tien-Dao; Rusu, Alin-Mihai; Walter, Vincent; Ripp, Raymond; Moulinier, Luc; Muller, Jean; Toursel, Thierry; Thompson, Julie D; Poch, Olivier; Nguyen, Hoan

    2012-01-01

    The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats. Database URL: http://decrypthon.igbmc.fr/msv3d.

  3. Literature curation of protein interactions: measuring agreement across major public databases

    Science.gov (United States)

    Turinsky, Andrei L.; Razick, Sabry; Turner, Brian; Wodak, Shoshana J.

    2010-01-01

    Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb PMID:21183497
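The kind of pairwise comparison described in this abstract can be illustrated with a small sketch. The abstract does not state the exact agreement measure, so the Jaccard index used here is an assumption, and the function names and toy protein IDs are hypothetical.

```python
def jaccard(a, b):
    """Return |a & b| / |a | b|, or 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def publication_agreement(db_a, db_b):
    """Compare what two databases curated from the same publication.

    Each argument is an iterable of interactions, and each interaction is
    an iterable of protein IDs. Interactions are treated as unordered sets,
    so ("P1", "P2") and ("P2", "P1") count as the same record.
    """
    inter_a = {frozenset(i) for i in db_a}
    inter_b = {frozenset(i) for i in db_b}
    prot_a = set().union(*inter_a)
    prot_b = set().union(*inter_b)
    return {
        "interactions": jaccard(inter_a, inter_b),
        "proteins": jaccard(prot_a, prot_b),
    }


if __name__ == "__main__":
    # Toy records (hypothetical IDs), as if curated from one paper.
    db_a = [("P1", "P2"), ("P2", "P3")]
    db_b = [("P2", "P1"), ("P3", "P4")]
    print(publication_agreement(db_a, db_b))
```

Treating each interaction as an unordered set hides differences in participant order, but it does not reconcile the divergent organism, isoform or complex representations that the abstract identifies as sources of disagreement.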

  4. Using homology relations within a database markedly boosts protein sequence similarity search.

    Science.gov (United States)

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-02

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.

  5. dbSAP: single amino-acid polymorphism database for protein variation detection

    Science.gov (United States)

    Cao, Ruifang; Shi, Yan; Chen, Shuangguan; Ma, Yimin; Chen, Jiajun; Yang, Juan; Chen, Geng; Shi, Tieliu

    2017-01-01

    Millions of human single nucleotide polymorphisms (SNPs) or mutations have been identified so far, and these variants could be strongly correlated with phenotypic variations of traits/diseases. Among these variants, non-synonymous ones can result in amino-acid changes that are called single amino-acid polymorphisms (SAPs). Although some studies have tried to investigate SAPs, only a small fraction of them have been identified, owing to inadequately inferred protein variation databases and the low coverage of mass spectrometry (MS) experiments. Here, we present the dbSAP database for conveniently accessing the comprehensive information and relationships of spectra, peptides and proteins of SAPs, as well as related genes, pathways, diseases and drug targets. In order to fully explore human SAPs, we built a customized protein database that contained comprehensive variant proteins by integrating and annotating the human SNPs and mutations from eight distinct databases (UniProt, Protein Mutation Database, HPMD, MSIPI, MS-CanProVar, dbSNP, Ensembl and COSMIC). After a series of quality controls, a total of 16 854 SAP peptides involved in 439 537 spectra were identified with large-scale MS datasets from various human tissues and cell lines. dbSAP is freely available at http://www.megabionet.org/dbSAP/index.html. PMID:27903894

  6. BriX: a database of protein building blocks for structural analysis, modeling and design.

    Science.gov (United States)

    Vanhee, Peter; Verschueren, Erik; Baeten, Lies; Stricher, Francois; Serrano, Luis; Rousseau, Frederic; Schymkowitz, Joost

    2011-01-01

    High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online (http://brix.crg.es) and can be accessed through a user-friendly web-interface. For high-throughput queries a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments.
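The segmentation step mentioned in this abstract (fragments of 4 to 14 residues) amounts to a sliding window over each chain. The sketch below is illustrative only: it operates on a plain residue string, whereas BriX works on 3D backbone fragments and then clusters them by structural similarity, which is not shown. The function name and defaults are hypothetical.

```python
def fragments(sequence, min_len=4, max_len=14):
    """Yield (start, length, fragment) for every contiguous window.

    Windows longer than the sequence are skipped automatically because
    the inner range is empty in that case.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, length, sequence[start:start + length]


if __name__ == "__main__":
    chain = "MKTAYIAKQRQISFVKSHFS"  # toy 20-residue chain
    per_length = {}
    for _, length, _ in fragments(chain):
        per_length[length] = per_length.get(length, 0) + 1
    print(per_length)
```

A chain of n residues yields n - L + 1 overlapping fragments of length L, which is why fragment counts grow quickly with the number of input proteins.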

  7. VaProS: a database-integration approach for protein/genome information retrieval

    KAUST Repository

    Gojobori, Takashi

    2016-12-24

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve expert knowledge stored in the databases and build new hypotheses about the research target. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  8. ProOpDB: Prokaryotic Operon DataBase.

    Science.gov (United States)

    Taboada, Blanca; Ciria, Ricardo; Martinez-Guerrero, Cristian E; Merino, Enrique

    2012-01-01

    The Prokaryotic Operon DataBase (ProOpDB, http://operons.ibt.unam.mx/OperonPredictor) constitutes one of the most precise and complete repositories of operon predictions now available. Using our novel and highly accurate operon identification algorithm, we have predicted the operon structures of more than 1200 prokaryotic genomes. ProOpDB offers diverse alternatives by which a set of operon predictions can be retrieved including: (i) organism name, (ii) metabolic pathways, as defined by the KEGG database, (iii) gene orthology, as defined by the COG database, (iv) conserved protein domains, as defined by the Pfam database, (v) reference gene and (vi) reference operon, among others. In order to limit the operon output to non-redundant organisms, ProOpDB offers an efficient method to select the most representative organisms based on a precompiled phylogenetic distances matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool to visualize their genomic context and retrieve the sequence of their corresponding 5' regulatory regions, as well as the nucleotide or amino acid sequences of their genes.

  9. Non-redundant Functions of ATM and DNA-PKcs in Response to DNA Double-Strand Breaks

    Directory of Open Access Journals (Sweden)

    Pierre Caron

    2015-11-01

    DNA double-strand breaks (DSBs) elicit the so-called DNA damage response (DDR), largely relying on ataxia telangiectasia mutated (ATM) and DNA-dependent protein kinase (DNA-PKcs), two members of the PI3K-like kinase family, whose respective functions during the sequential steps of the DDR remain controversial. Using the DIvA system (DSB inducible via AsiSI) combined with high-resolution mapping and advanced microscopy, we uncovered that both ATM and DNA-PKcs spread in cis on a confined region surrounding DSBs, independently of the pathway used for repair. However, once recruited, these kinases exhibit non-overlapping functions on end joining and γH2AX domain establishment. More specifically, we found that ATM is required to ensure the association of multiple DSBs within “repair foci.” Our results suggest that ATM acts not only on chromatin marks but also on higher-order chromatin organization to ensure repair accuracy and survival.

  10. The TissueNet v.2 database: A quantitative view of protein-protein interactions across human tissues

    Science.gov (United States)

    Basha, Omer; Barshir, Ruth; Sharon, Moran; Lerman, Eugene; Kirson, Binyamin F.; Hekselman, Idan; Yeger-Lotem, Esti

    2017-01-01

    Knowledge of the molecular interactions of human proteins within tissues is important for identifying their tissue-specific roles and for shedding light on tissue phenotypes. However, many protein–protein interactions (PPIs) have no tissue-contexts. The TissueNet database bridges this gap by associating experimentally-identified PPIs with human tissues that were shown to express both pair-mates. Users can select a protein and a tissue, and obtain a network view of the query protein and its tissue-associated PPIs. TissueNet v.2 is an updated version of the TissueNet database previously featured in NAR. It includes over 40 human tissues profiled via RNA-sequencing or protein-based assays. Users can select their preferred expression data source and interactively set the expression threshold for determining tissue-association. The output of TissueNet v.2 emphasizes qualitative and quantitative features of query proteins and their PPIs. The tissue-specificity view highlights tissue-specific and globally-expressed proteins, and the quantitative view highlights proteins that were differentially expressed in the selected tissue relative to all other tissues. Together, these views allow users to quickly assess the unique versus global functionality of query proteins. Thus, TissueNet v.2 offers an extensive, quantitative and user-friendly interface to study the roles of human proteins across tissues. TissueNet v.2 is available at http://netbio.bgu.ac.il/tissuenet. PMID:27899616
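The tissue-association rule described in this abstract (keep a PPI for a tissue only when both pair-mates are expressed there, with a user-set threshold) can be sketched as a simple filter. This is a minimal illustration and not TissueNet's implementation; the data layout, function name, protein names and expression values are all hypothetical.

```python
def tissue_ppis(ppis, expression, tissue, threshold):
    """Return the interactions whose two partners are both expressed in `tissue`.

    ppis       : iterable of (protein_a, protein_b) pairs
    expression : dict mapping tissue -> {protein: expression level}
    threshold  : minimum level at which a protein counts as expressed
    Proteins with no measurement in the tissue are treated as not expressed.
    """
    levels = expression.get(tissue, {})
    return [
        (a, b)
        for a, b in ppis
        if levels.get(a, 0.0) >= threshold and levels.get(b, 0.0) >= threshold
    ]


if __name__ == "__main__":
    # Toy data: expression units are arbitrary and hypothetical.
    ppis = [("A", "B"), ("B", "C")]
    expression = {"liver": {"A": 5.0, "B": 2.0, "C": 0.5}}
    print(tissue_ppis(ppis, expression, "liver", threshold=1.0))
```

Raising or lowering the threshold changes which interactions are considered tissue-associated, which mirrors the interactive threshold setting the abstract mentions.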

  11. Motifs with potential physiological activity in food proteins – BIOPEP database

    Directory of Open Access Journals (Sweden)

    Bartłomiej Dziuba

    2009-09-01

    Proteins are multifunctional food components that affect living organisms. One function of proteins is their impact on the body through motifs that show specific physiological and biological activities. Owing to growing worldwide demand for food containing bioactive components, increasing attention has recently been paid to the use of bioactive peptides as physiologically active food ingredients. They are important elements in the prevention and treatment of various lifestyle diseases. In addition to its primary function, and according to current knowledge, each protein may be a reserve source of peptides controlling the life processes of organisms. For this reason, applying a new, additional criterion for evaluating proteins as a potential source of biologically active peptides contributes to a more comprehensive and objective definition of their biological value. A complementary part of such research is the strategy for evaluating food proteins as precursors of biologically active peptides, which involves the database of proteins and bioactive peptides, BIOPEP (available online at: http://www.uwm.edu.pl/biochemia). The database contains information on 2123 peptides representing 48 types of bioactivities, their EC50 values and source of origin. Proteins (706 sequences) are considered as bioactive peptide precursors based on newly introduced criteria: the profile of potential biological activity, the frequency of occurrence of bioactive fragments and potential biological protein activity. This original and so far unprecedented approach has started to be applied successfully and more widely by other authors. BIOPEP can be interfaced with global databases such as TrEMBL, SWISS-PROT, EROP and PepBank. Recently the BIOPEP database was enlarged with data on allergenic proteins, including information about the structure of their epitopes and molecular markers.

  12. HIP2: An online database of human plasma proteins from healthy individuals

    Directory of Open Access Journals (Sweden)

    Shen Changyu

    2008-04-01

    Background: With the introduction of increasingly powerful mass spectrometry (MS) techniques for clinical research, several recent large-scale MS proteomics studies have sought to characterize the entire human plasma proteome with the general objective of identifying thousands of proteins leaked from tissues in the circulating blood. Understanding the basic constituents, diversity, and variability of the human plasma proteome is essential to the development of sensitive molecular diagnosis and treatment monitoring solutions for future biomedical applications. Biomedical researchers today, however, do not have an integrated online resource in which they can search for plasma proteins collected from different mass spectrometry platforms, experimental protocols, and search software for healthy individuals. The lack of such a resource for comparisons has made it difficult to interpret proteomics profile changes in patients' plasma and to design protein biomarker discovery experiments. Description: To aid future protein biomarker studies of disease and health from human plasma, we developed an online database, HIP2 (Healthy Human Individual's Integrated Plasma Proteome). The current version contains 12,787 protein entries linked to 86,831 peptide entries identified using different MS platforms. Conclusion: This web-based database will be useful to biomedical researchers involved in biomarker discovery research. This database has been developed to be a comprehensive collection of healthy human plasma proteins, and has protein data captured in a relational database schema built to contain mappings of supporting peptide evidence from several high-quality and high-throughput mass-spectrometry (MS) experimental data sets. Users can search for plasma protein/peptide annotations, peptide/protein alignments, and experimental/sample conditions with options for filter-based retrieval to achieve greater analytical power for discovery and validation.

  13. ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

    KAUST Repository

    Wang, Jim Jing-Yan

    2012-05-08

    Background: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of other proteins and thus cannot satisfy the need for high accuracy in retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the known class labels of proteins in the database. Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update

  14. STITCH 2: an interaction network database for small molecules and proteins

    DEFF Research Database (Denmark)

    Kuhn, Michael; Szklarczyk, Damian; Franceschini, Andrea

    2010-01-01

    Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug-target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other...

  15. An update of the DEF database of protein fold class predictions

    DEFF Research Database (Denmark)

    Reczko, Martin; Karras, Dimitris; Bohr, Henrik

    1997-01-01

    An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure...

  16. Sentra : a database of signal transduction proteins for comparative genome analysis.

    Energy Technology Data Exchange (ETDEWEB)

    D'Souza, M.; Glass, E. M.; Syed, M. H.; Zhang, Y.; Rodriguez, A.; Maltsev, N.; Galperin, M. Y.; Mathematics and Computer Science; Univ. of Chicago; NIH

    2007-01-01

    Sentra (http://compbio.mcs.anl.gov/sentra), a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.

  17. PDTD: a web-accessible protein database for drug target identification

    Directory of Open Access Journals (Sweden)

    Gao Zhenting

    2008-02-01

    Background: Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking; http://www.dddc.ac.cn/tarfisdock), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description: PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active site structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches by, for example, PDB ID, target name, and disease name. Data sets generated by PDTD can be viewed with plug-ins of molecular visualization tools and can also be downloaded freely. Notably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of a mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion: PDTD serves as a comprehensive and

  18. Algorithm for Generating Non-Redundant Association Rules

    Institute of Scientific and Technical Information of China (English)

    高峰; 谢剑英

    2001-01-01

    The discovery of association rules is an important research topic in data mining, but traditional association rule discovery algorithms produce a large number of redundant rules. This paper presents a general algorithm, GNRR, for mining non-redundant association rules from the maximal frequent itemsets. By exploiting the redundancy relationships among rules and mining the different rules in a fixed order, the algorithm eliminates redundancy between rules and reduces the number of discovered rules exponentially.

  19. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation.

    Science.gov (United States)

    Thangakani, A Mary; Nagarajan, R; Kumar, Sandeep; Sakthivel, R; Velmurugan, D; Gromiha, M Michael

    2016-01-01

    Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnological production of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which collects results from experimental studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD have been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO and WALTZ. We have set up a web interface with different search and display options so that users can retrieve the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed.

  20. PASS2: an automated database of protein alignments organised as structural superfamilies

    Directory of Open Access Journals (Sweden)

    Sowdhamini Ramanathan

    2004-04-01

    Background: The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins. Description: An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of

  1. muBLASTP: database-indexed protein sequence search on multicore CPUs.

    Science.gov (United States)

    Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

    2016-11-04

    The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
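
    The contrast between query indexing and database indexing can be illustrated with a minimal sketch. It assumes exact k-mer matches only, whereas BLASTP seeds on scored neighboring words, and the function names build_database_index and seed_hits are hypothetical. It is not the muBLASTP index structure, only the general idea of indexing the database once and looking up each query word in it.

```python
from collections import defaultdict

def build_database_index(sequences, k=3):
    """Map every k-mer in the database to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index

def seed_hits(query, index, k=3):
    """Return (query offset, sequence id, database offset) for each exact k-mer match."""
    hits = []
    for qpos in range(len(query) - k + 1):
        word = query[qpos:qpos + k]
        for seq_id, dpos in index.get(word, ()):
            hits.append((qpos, seq_id, dpos))
    return hits

# Toy database: the index is built once and reused for every query.
database = {"a": "MKVLAT", "b": "GGVLAG"}
index = build_database_index(database)
print(seed_hits("VLA", index))
```

    In a real search each seed would then be extended into an alignment; the sketch stops at seeding because that is the stage the index serves.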

  2. The TissueNet v.2 database: A quantitative view of protein-protein interactions across human tissues.

    Science.gov (United States)

    Basha, Omer; Barshir, Ruth; Sharon, Moran; Lerman, Eugene; Kirson, Binyamin F; Hekselman, Idan; Yeger-Lotem, Esti

    2017-01-04

    Knowledge of the molecular interactions of human proteins within tissues is important for identifying their tissue-specific roles and for shedding light on tissue phenotypes. However, many protein-protein interactions (PPIs) have no tissue-contexts. The TissueNet database bridges this gap by associating experimentally-identified PPIs with human tissues that were shown to express both pair-mates. Users can select a protein and a tissue, and obtain a network view of the query protein and its tissue-associated PPIs. TissueNet v.2 is an updated version of the TissueNet database previously featured in NAR. It includes over 40 human tissues profiled via RNA-sequencing or protein-based assays. Users can select their preferred expression data source and interactively set the expression threshold for determining tissue-association. The output of TissueNet v.2 emphasizes qualitative and quantitative features of query proteins and their PPIs. The tissue-specificity view highlights tissue-specific and globally-expressed proteins, and the quantitative view highlights proteins that were differentially expressed in the selected tissue relative to all other tissues. Together, these views allow users to quickly assess the unique versus global functionality of query proteins. Thus, TissueNet v.2 offers an extensive, quantitative and user-friendly interface to study the roles of human proteins across tissues. TissueNet v.2 is available at http://netbio.bgu.ac.il/tissuenet. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. MPIC: a mitochondrial protein import components database for plant and non-plant species.

    Science.gov (United States)

    Murcha, Monika W; Narsai, Reena; Devenish, James; Kubiszewski-Jakubiak, Szymon; Whelan, James

    2015-01-01

    In the 2 billion years since the endosymbiotic event that gave rise to mitochondria, variations in mitochondrial protein import have evolved across different species. With the genomes of an increasing number of plant species sequenced, it is possible to gain novel insights into mitochondrial protein import pathways. We have generated the Mitochondrial Protein Import Components (MPIC) Database (DB; http://www.plantenergy.uwa.edu.au/applications/mpic) providing searchable information on the protein import apparatus of plant and non-plant mitochondria. An in silico analysis was carried out, comparing the mitochondrial protein import apparatus from 24 species representing various lineages from Saccharomyces cerevisiae (yeast) and algae to Homo sapiens (human) and higher plants, including Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice) and other more recently sequenced plant species. Each of these species was extensively searched and manually assembled for analysis in the MPIC DB. The database presents an interactive diagram in a user-friendly manner, allowing users to select their import component of interest. The MPIC DB presents an extensive resource facilitating detailed investigation of the mitochondrial protein import machinery and allowing patterns of conservation and divergence to be recognized that would otherwise have been missed. To demonstrate the usefulness of the MPIC DB, we present a comparative analysis of the mitochondrial protein import machinery in plants and non-plant species, revealing plant-specific features that have evolved.

  4. Two dimensional gel human protein databases offer a systematic approach to the study of cell proliferation and differentiation

    DEFF Research Database (Denmark)

    Celis, J E; Gesser, B; Dejgaard, K;

    1989-01-01

    Human cellular protein databases have been established using computer-analyzed 2D gel electrophoresis. These databases, which include information on various properties of proteins, offer a global approach to the study of regulation of cell proliferation and differentiation. Furthermore, thanks...

  5. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  6. Exploring the composition of protein-ligand binding sites on a large scale.

    Directory of Open Access Journals (Sweden)

    Nickolay A Khazanov

    The residue composition of a ligand binding site determines the interactions available for diffusion-mediated ligand binding, and understanding the general composition of these sites is of great importance if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods utilize such heuristic information for improving prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes, and provides a source of diverse, high-quality data for establishing general trends of residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues over-represented in binding regions versus the rest of the protein surface. The Binding MOAD database delineates biologically-relevant "valid" ligands from "invalid" small-molecule ligands bound to the protein. Invalids are present in the crystallization medium and serve no known biological function. Contacts are found to differ between these classes of ligands, indicating that residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface, but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of the variation of residue propensity with respect to the size of the dataset and the content bias inherent in structure sets obtained from a large protein structure database. The optimal size of the dataset for establishing general trends of residue propensities, as well as strategies for assessing the significance of such trends, are suggested for future studies of binding-site composition.
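
    Over-representation of a residue in binding regions can be expressed as a propensity ratio. The sketch below assumes a common definition (frequency of a residue type among binding-site residues divided by its frequency on the remaining surface); the abstract does not state the exact formula used, and the function name residue_propensity and the toy residue lists are hypothetical.

```python
from collections import Counter

def residue_propensity(binding_residues, surface_residues):
    """Ratio of each residue's frequency in binding sites to its frequency
    on the rest of the surface. Values above 1 indicate over-representation."""
    site = Counter(binding_residues)
    rest = Counter(surface_residues)
    n_site, n_rest = sum(site.values()), sum(rest.values())
    if n_site == 0 or n_rest == 0:
        raise ValueError("both residue lists must be non-empty")
    propensity = {}
    for aa in sorted(set(site) | set(rest)):
        f_site = site[aa] / n_site
        f_rest = rest[aa] / n_rest
        # A residue never seen on the rest of the surface has no finite ratio.
        propensity[aa] = f_site / f_rest if f_rest > 0 else float("inf")
    return propensity

# Toy data in one-letter codes: W and Y are enriched at the site, K is depleted.
print(residue_propensity(list("WWYG"), list("WYGGKKKK")))
```

    With small samples such ratios are noisy, which is why the abstract examines how propensity varies with dataset size.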

  7. Fly-DPI: database of protein interactomes for D. melanogaster in the approach of systems biology

    Directory of Open Access Journals (Sweden)

    Lin Chieh-Hua

    2006-12-01

    Background: Proteins control and mediate many biological activities of cells by interacting with other protein partners. This work presents a statistical model to predict protein interaction networks of Drosophila melanogaster based on insight into domain interactions. Results: Three high-throughput yeast two-hybrid experiments and the collection in FlyBase were used as our starting datasets. The co-occurrences of domains in these interactive events are converted into a probability score of domain-domain interaction. These scores are used to infer putative interactions among all available open reading frames (ORFs) of the fruit fly. Additionally, the likelihood function is used to estimate all potential protein-protein interactions. All parameters are successfully iterated and the MLE is obtained for each pair of domains. Additionally, the maximized likelihood reaches its convergence criterion and keeps the probability stable. The hybrid model achieves a high specificity with a loss of sensitivity, suggesting that the model may possess major features of protein-protein interactions. Several putative interactions predicted by the proposed hybrid model are supported by the literature, while experimental data with a low probability score indicate uncertain reliability and require further proof of interaction. Fly-DPI is the online database used to present this work. It is an integrated proteomics tool with comprehensive protein annotation information from major databases as well as an effective means of predicting protein-protein interactions. As a novel search strategy, the ping-pong search is a naïve path map between two chosen proteins based on pre-computed shortest paths. Adopting effective filtering strategies will help researchers obtain a bird's eye view of the network of interest. Fly-DPI can be accessed at http://flydpi.nhri.org.tw. Conclusion: This work provides two reference systems, statistical and biological, to evaluate

  8. A database of naturally occurring human urinary peptides and proteins for use in clinical applications

    OpenAIRE

    Petra Zürbig; Joshua Coon; Hartwig Bauer; Georg Behrens; Mohammed Dakna; Anna Dominiczak; Stephane Decramer; Jochen Ehrich; Danilo Fliser; Moritz Frommberger; Arnold Ganser; Mark Giolami; Igor Golovko; David Good; Wilfried Gwinner

    2007-01-01

    Owing to its availability, ease of collection and correlation with (patho-) physiology, urine is an attractive source for clinical proteomics. However, the lack of comparable datasets from large cohorts has greatly hindered development in this field. Here we report the establishment of a high resolution proteome database of naturally occurring human urinary peptides and proteins - ranging from 800-17,000 Da - from over 3,600 individual samples using capillary electrophoresis coupled to mass s...

  9. ZifBASE: a database of zinc finger proteins and associated resources

    Directory of Open Access Journals (Sweden)

    Punetha Ankita

    2009-09-01

    databases like UniprotKB, PDB, ModBase and Protein Model Portal and PubMed for making it more informative. Conclusion: A database is established to maintain the information of the sequence features, including the class, framework, number of fingers, residues, position, recognition site and physio-chemical properties (molecular weight, isoelectric point) of both natural and engineered zinc finger proteins, and the dissociation constants of a few. ZifBASE can provide a more effective and efficient way of accessing the zinc finger protein sequences and their target binding sites with links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.

  10. pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins

    Science.gov (United States)

    Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A. Keith; Felli, Isabella C.; Forman-Kay, Julie D.; Kriwacki, Richard W.; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I.; Uversky, Vladimir N.; Vendruscolo, Michele; Wishart, David; Wright, Peter E.; Tompa, Peter

    2014-01-01

    The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states. PMID:24174539

  11. High-performance computational analysis and peptide screening from databases of cyclotides from poaceae.

    Science.gov (United States)

    Porto, William F; Miranda, Vivian J; Pinto, Michelle F S; Dohms, Stephan M; Franco, Octavio L

    2016-01-01

    Cyclotides are a family of head-to-tail cyclized peptides containing three conserved disulfide bonds, in a structural scaffold also known as a cyclic cysteine knot. Due to the high degree of cysteine conservation, novel members from this peptide family can be identified in protein databases through a search through regular expression (REGEX). In this work, six novel cyclotide-like precursors from the Poaceae were identified from NCBI's non-redundant protein database by the use of REGEX. Two out of six sequences (named Zea mays L and M) showed an Asp residue in the C-terminal, which indicated that they could be cyclic. Gene expression in maize tissues was investigated, showing that the previously described cyclotide-like Z. mays J is expressed in the roots. According to molecular dynamics, the structure of Z. mays J seems to be stable, despite the putative absence of cyclization. As regards cyclotide evolution, it was hypothesized that this is an outcome from convergent evolution and/or horizontal gene transfer. The results showed that peptide screening from databases should be performed periodically in order to include novel sequences, which are deposited as the databases grow. Indeed, the advances in computational and experimental methods will together help to answer key questions and reach new horizons in defense-related peptide identification. © 2015 Wiley Periodicals, Inc.
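The regular-expression screening described in this abstract can be sketched as follows. This is a minimal illustration only: the pattern (six cysteines separated by loops of 1 to 8 non-cysteine residues), the function names and the toy FASTA records are assumptions made for the example, not the expression or data used by the authors.

```python
import re

# Hypothetical pattern: six cysteines separated by loops of 1-8 non-cysteine
# residues. The published study used its own expression, not this one.
CYS_KNOT = re.compile(r"C(?:[^C]{1,8}C){5}")

# Made-up toy records standing in for a protein database in FASTA format.
TOY_FASTA = """\
>toy_hit illustrative six-cysteine sequence
MASLKCAGESCVWIPCTV
TALLGCSCKNKVCYRND
>toy_miss illustrative sequence without cysteines
MSTNPKPQRKTKRNTNRRPQDVKFPGG
"""


def parse_fasta(text):
    """Return {identifier: sequence}, joining wrapped sequence lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def screen(records):
    """Return {identifier: (start, matched_segment)} for sequences that match."""
    hits = {}
    for name, seq in records.items():
        match = CYS_KNOT.search(seq)
        if match:
            hits[name] = (match.start(), match.group())
    return hits


if __name__ == "__main__":
    for name, (start, segment) in screen(parse_fasta(TOY_FASTA)).items():
        print(name, start, segment)
```

A pattern this loose would return many unrelated cysteine-rich proteins from a real database, so any hits would still need the kind of follow-up analysis the abstract describes.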

  12. Use of the BioGRID Database for Analysis of Yeast Protein and Genetic Interactions.

    Science.gov (United States)

    Oughtred, Rose; Chatr-aryamontri, Andrew; Breitkreutz, Bobby-Joe; Chang, Christie S; Rust, Jennifer M; Theesfeld, Chandra L; Heinicke, Sven; Breitkreutz, Ashton; Chen, Daici; Hirschman, Jodi; Kolas, Nadine; Livstone, Michael S; Nixon, Julie; O'Donnell, Lara; Ramage, Lindsay; Winter, Andrew; Reguly, Teresa; Sellam, Adnane; Stark, Chris; Boucher, Lorrie; Dolinski, Kara; Tyers, Mike

    2016-01-04

    The BioGRID database is an extensive repository of curated genetic and protein interactions for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe, and the yeast Candida albicans SC5314, as well as for several other model organisms and humans. This protocol describes how to use the BioGRID website to query genetic or protein interactions for any gene of interest, how to visualize the associated interactions using an embedded interactive network viewer, and how to download data files for either selected interactions or the entire BioGRID interaction data set. © 2016 Cold Spring Harbor Laboratory Press.

  13. DbMDR: a relational database for multidrug resistance genes as potential drug targets.

    Science.gov (United States)

    Gupta, Sanchita; Mishra, Manoj; Sen, Naresh; Parihar, Rashi; Dwivedi, Gaurav Raj; Khan, Feroz; Sharma, Ashok

    2011-10-01

    DbMDR is a non-redundant reference database of multidrug resistance (MDR) genes and their orthologs acting as potential drug targets. Drug resistance is a common phenomenon of pathogens, creating a serious problem of inactivation of drugs and antibiotics resulting in occurrence of diseases. Apart from other factors, the MDR genes present in pathogens are shown to be responsible for multidrug resistance. Much of the unorganized information on MDR genes is scattered across the literature and other web resources. Thus, consolidation of such knowledge about MDR genes into one database will make drug discovery research more efficient. Mining of text for MDR genes has resulted in a large number of publications but in scattered and unorganized form. This information was compiled into a database, which enables a user not only to look at a particular MDR gene but also to find putative homologs based on sequence similarity, conserved domains, and motifs in proteins encoded by MDR genes more efficiently. At present, the DbMDR database contains 2843 MDR genes characterized experimentally as well as functionally annotated with cross-referencing search support. The DbMDR database (http://203.190.147.116/dbmdr/) is a comprehensive resource for comparative study focused on MDR genes and metabolic pathway efflux pumps and is intended to provide a platform for researchers for further research in drug resistance.

  14. CentrosomeDB: a new generation of the centrosomal proteins database for Human and Drosophila melanogaster.

    Science.gov (United States)

    Alves-Cruzeiro, Joao Miguel da Conceiçao; Nogales-Cadenas, Rubén; Pascual-Montano, Alberto Domingo

    2014-01-01

    We present the second generation of centrosomeDB, available online at http://centrosome.cnb.csic.es, with a significant expansion of 1357 human and drosophila centrosomal genes and their corresponding information. The centrosome of animal cells takes part in important biological processes such as the organization of the interphase microtubule cytoskeleton and the assembly of the mitotic spindle. The active research done during the past decades has produced lots of data related to centrosomal proteins. Unfortunately, the accumulated data are dispersed among diverse and heterogeneous sources of information. We believe that the availability of a repository collecting curated evidences of centrosomal proteins would constitute a key resource for the scientific community. This was our first motivation to introduce CentrosomeDB in NAR database issue in 2009, collecting a set of human centrosomal proteins that were reported in the literature and other sources. The intensive use of this resource during these years has encouraged us to present this new expanded version. Using our database, the researcher is offered the possibility to study the evolution, function and structure of the centrosome. We have compiled information from many sources, including Gene Ontology, disease-association, single nucleotide polymorphisms and associated gene expression experiments. Special interest has been paid to protein-protein interaction.

  15. Interleukin-1beta induced changes in the protein expression of rat islets: a computerized database

    DEFF Research Database (Denmark)

    Andersen, H U; Fey, S J; Larsen, Peter Mose

    1997-01-01

    Insulin-dependent diabetes mellitus is caused by an autoimmune destruction of the beta-cells in the islets of Langerhans. The cytokine interleukin 1 inhibits insulin release and is selectively cytotoxic to beta-cells in isolated pancreatic rat islets. The antigen(s) triggering the immune response...... as well as the intracellular mechanisms of action of interleukin 1-mediated beta-cell cytotoxicity are unknown. However, previous studies have found an association of beta-cell destruction with alterations in protein synthesis. Thus, two-dimensional (2-D) gel electrophoresis of pancreatic islet proteins......% of %IOD was 45.7% in the NEPHGE gels. Addition of interleukin-1beta (IL-1beta) to the cultures resulted in statistically significant modulation or de novo synthesis of 105 proteins in the 10% gels. In conclusion, we present the first 10% and 15% acrylamide 2-D gel protein databases of neonatal rat islets...

  16. DSFL database: A hub of target proteins of Leishmania sp. to combat leishmaniasis

    Directory of Open Access Journals (Sweden)

    Ameer Khusro

    2017-07-01

    Leishmaniasis is a vector-borne chronic infectious tropical dermal disease caused by the protozoan parasite of the genus Leishmania that causes high mortality globally. Among three different clinical forms of leishmaniasis, visceral leishmaniasis (VL or kala-azar) is a systemic public health disease with high morbidity and mortality in developing countries, caused by Leishmania donovani, Leishmania infantum or Leishmania chagasi. Unfortunately, there is no vaccine available to date for the treatment of leishmaniasis. On the other hand, the therapeutics approved to treat this fatal disease are expensive, toxic, and associated with serious side effects. Furthermore, the emergence of drug-resistant Leishmania parasites in most endemic countries due to the incessant utilization of existing drugs is a major concern at present. Drug Search for Leishmaniasis (DSFL) is a unique database that involves 50 crystallized target proteins of varied Leishmania sp. in order to develop new drugs in future by interacting several antiparasitic compounds or molecules with specific protein through computational tools. The structure of target protein from different Leishmania sp. is available in this database. In this review, we spotlighted not only the current global status of leishmaniasis in brief but also detailed information about target proteins of various Leishmania sp. available in DSFL. DSFL has created a new expectation for mankind in order to combat leishmaniasis by targeting parasitic proteins and commence a new era to get rid of drug-resistant parasites. The database will substantiate to be a worthwhile project for further development of new, non-toxic, and cost-effective antileishmanial drugs as targeted therapies using in vitro/in vivo assays.

  17. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  18. Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

    Science.gov (United States)

    Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

    2017-08-03

    Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants.

    Science.gov (United States)

    Draizen, Eli J; Shaytan, Alexey K; Mariño-Ramírez, Leonardo; Talbert, Paul B; Landsman, David; Panchenko, Anna R

    2016-01-01

    Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0.

  20. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database

    KAUST Repository

    Komatsu, Setsuko

    2017-05-10

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max ‘Enrei’). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. Biological significance The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all

  1. Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

    Science.gov (United States)

    Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

    2017-06-23

    The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all predicted proteins from

  2. PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.

    Science.gov (United States)

    Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki

    2010-08-25

    This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
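The design described in this abstract, where XML content is broken into XPath-like paths and each path gets its own relational table named after the path, can be sketched with the Python standard library. This is an illustration under stated assumptions: the toy XML, its element names and the helper functions are invented for the example and do not reproduce the PDBMLplus schema or the PDBj Mine implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy document; element names are hypothetical, not the PDBMLplus schema.
TOY_XML = """
<datablock>
  <entry id="1ABC">
    <struct><title>Toy protein</title></struct>
    <exptl><method>X-RAY DIFFRACTION</method></exptl>
  </entry>
  <entry id="2XYZ">
    <struct><title>Another toy</title></struct>
    <exptl><method>SOLUTION NMR</method></exptl>
  </entry>
</datablock>
"""


def leaf_paths(elem, prefix=""):
    """Yield (path, text) for every leaf element below elem."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)


def load(xml_text):
    """Store each leaf value in a table named after its path."""
    conn = sqlite3.connect(":memory:")
    root = ET.fromstring(xml_text.strip())
    for entry in root.findall("entry"):
        for path, value in leaf_paths(entry, "/" + root.tag):
            table = '"' + path + '"'  # double quotes allow slashes in the name
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} (entry_id TEXT, value TEXT)"
            )
            conn.execute(
                f"INSERT INTO {table} VALUES (?, ?)", (entry.get("id"), value)
            )
    return conn


if __name__ == "__main__":
    conn = load(TOY_XML)
    rows = conn.execute(
        'SELECT entry_id FROM "/datablock/entry/exptl/method" WHERE value = ?',
        ("SOLUTION NMR",),
    ).fetchall()
    print(rows)
```

Because every table name is derived mechanically from a path, queries against different parts of the document take the same shape, which is one way to read the abstract's remark that SQL statements "can be constructed in a uniform manner".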

  3. ANGIOGENES: knowledge database for protein-coding and noncoding RNA genes in endothelial cells

    Science.gov (United States)

    Müller, Raphael; Weirick, Tyler; John, David; Militello, Giuseppe; Chen, Wei; Dimmeler, Stefanie; Uchida, Shizuka

    2016-09-01

    Increasing evidence indicates the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database “ANGIOGENES” (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly-available RNA-seq data was analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop-shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database to allow in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES serves as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.

  4. Loops In Proteins (LIP)--a comprehensive loop database for homology modelling.

    Science.gov (United States)

    Michalsky, E; Goede, A; Preissner, R

    2003-12-01

    One of the most important and challenging tasks in protein modelling is the prediction of loops, as can be seen in the large variety of existing approaches. Loops In Proteins (LIP) is a database that includes all protein segments of a length up to 15 residues contained in the Protein Data Bank (PDB). In this study, the applicability of LIP to loop prediction in the framework of homology modelling is investigated. Searching the database for loop candidates takes less than 1 s on a desktop PC, and ranking them takes a few minutes. This is an order of magnitude faster than most existing procedures. The measure of accuracy is the root mean square deviation (RMSD) with respect to the main-chain atoms after local superposition of target loop and predicted loop. Loops of up to nine residues length were modelled with a local RMSD <1 Å and those of length up to 14 residues with an accuracy better than 2 Å. The results were compared in detail with a thoroughly evaluated and tested ab initio method published recently and additionally with two further methods for a small loop test set. The LIP method produced very good predictions. In particular for longer loops it outperformed other methods.
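The accuracy measure named in this abstract can be written out explicitly. The form below is the standard definition of RMSD after optimal rigid-body superposition and is given here as an assumption about what "local RMSD" means; the abstract does not state which main-chain atoms are included or which superposition algorithm is used. Here the x_i are the N main-chain atom positions of the predicted loop, the y_i are the corresponding positions in the target loop, R is a rotation matrix and t is a translation vector.

```latex
\[
\mathrm{RMSD}
  = \min_{R,\,\mathbf{t}}
    \sqrt{\frac{1}{N}\sum_{i=1}^{N}
      \bigl\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \bigr\rVert^{2}}
\]
```

Under this definition, the reported result of a local RMSD below 1 Å for loops of up to nine residues means that, after the best superposition, the root of the mean squared distance between matched atoms is under 1 Å.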

  5. ANGIOGENES: knowledge database for protein-coding and noncoding RNA genes in endothelial cells.

    Science.gov (United States)

    Müller, Raphael; Weirick, Tyler; John, David; Militello, Giuseppe; Chen, Wei; Dimmeler, Stefanie; Uchida, Shizuka

    2016-09-01

    Increasing evidence indicates the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database "ANGIOGENES" (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly-available RNA-seq data was analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop-shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database to allow in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES serves as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.

  6. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

    Directory of Open Access Journals (Sweden)

    Bányai László

    2008-08-01

    Abstract Background Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Results Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON
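Two of the five routines listed in this abstract, the second and third, are simple consistency rules over domain annotations and can be sketched as below. This is a hedged illustration: the `ProteinAnnotation` record, its field names and the toy proteins are hypothetical, and the real MisPred routines work from predicted domains and sequence signals rather than hand-entered labels.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinAnnotation:
    """Hypothetical summary of the features predicted for one sequence."""
    name: str
    domain_classes: set = field(default_factory=set)  # e.g. {"extracellular"}
    has_transmembrane: bool = False


def mispred_flags(protein):
    """Return descriptions of conflicts found by the two sketched rules."""
    flags = []
    classes = protein.domain_classes
    # Rule (ii): extracellular and cytoplasmic domains normally require a
    # transmembrane segment between them.
    if {"extracellular", "cytoplasmic"} <= classes and not protein.has_transmembrane:
        flags.append("extracellular and cytoplasmic domains without transmembrane segment")
    # Rule (iii): extracellular and nuclear domains are not expected together.
    if {"extracellular", "nuclear"} <= classes:
        flags.append("co-occurrence of extracellular and nuclear domains")
    return flags


if __name__ == "__main__":
    toy_set = [
        ProteinAnnotation("toy_receptor", {"extracellular", "cytoplasmic"}, True),
        ProteinAnnotation("toy_truncated", {"extracellular", "cytoplasmic"}, False),
        ProteinAnnotation("toy_chimera", {"extracellular", "nuclear"}, False),
    ]
    for protein in toy_set:
        print(protein.name, mispred_flags(protein))
```

A flag from rules like these marks an entry as suspect, not as proven wrong, which matches the abstract's wording that a conflicting sequence "is likely to be incorrect".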

  7. iLIR database: A web resource for LIR motif-containing proteins in eukaryotes

    Science.gov (United States)

    Jacomin, Anne-Claire; Samavedam, Siva; Promponas, Vasilis; Nezis, Ioannis P.

    2016-01-01

    ABSTRACT Atg8-family proteins are the best-studied proteins of the core autophagic machinery. They are essential for the elongation and closure of the phagophore into a proper autophagosome. Moreover, Atg8-family proteins are associated with the phagophore from the initiation of the autophagic process to, or just prior to, the fusion between autophagosomes with lysosomes. In addition to their implication in autophagosome biogenesis, they are crucial for selective autophagy through their ability to interact with selective autophagy receptor proteins necessary for the specific targeting of substrates for autophagic degradation. In the past few years it has been revealed that Atg8-interacting proteins include not only receptors but also components of the core autophagic machinery, proteins associated with vesicles and their transport, and specific proteins that are selectively degraded by autophagy. Atg8-interacting proteins contain a short linear LC3-interacting region/LC3 recognition sequence/Atg8-interacting motif (LIR/LRS/AIM) motif which is responsible for their interaction with Atg8-family proteins. These proteins are referred to as LIR-containing proteins (LIRCPs). So far, many experimental efforts have been carried out to identify new LIRCPs, leading to the characterization of some of them in the past 10 years. Given the need for the identification of LIRCPs in various organisms, we developed the iLIR database (https://ilir.warwick.ac.uk) as a freely available web resource, listing all the putative canonical LIRCPs identified in silico in the proteomes of 8 model organisms using the iLIR server, combined with a Gene Ontology (GO) term analysis. Additionally, a curated text-mining analysis of the literature permitted us to identify novel putative LIRCPs in mammals that have not previously been associated with autophagy. PMID:27484196
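In silico identification of a short linear motif, as described in this abstract, amounts to scanning each sequence for a pattern. The sketch below uses the widely cited core LIR consensus [W/F/Y]-x-x-[L/I/V] as an assumption; it is not the scoring scheme of the iLIR server, and the function name and test sequences are made up for illustration. A lookahead is used so that overlapping candidates are all reported.

```python
import re

# Core LIR consensus (assumed for illustration): aromatic residue, any two
# residues, then L, I or V. The lookahead makes overlapping hits visible.
CORE_LIR = re.compile(r"(?=([WFY]..[LIV]))")


def find_core_lir(sequence):
    """Return (start_index, four_residue_segment) for every candidate motif."""
    return [(m.start(), m.group(1)) for m in CORE_LIR.finditer(sequence)]


if __name__ == "__main__":
    # Made-up toy sequences, not real proteins.
    print(find_core_lir("MSDDDWTHLSSKE"))
    print(find_core_lir("FAYLLV"))
```

A four-residue pattern this permissive occurs by chance in most proteins, so candidates found this way are only putative until supported by further evidence, consistent with the abstract's use of "putative".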

  8. Alga-PrAS (Algal Protein Annotation Suite): A Database of Comprehensive Annotation in Algal Proteomes

    Science.gov (United States)

    Kurotani, Atsushi; Yamada, Yutaka

    2017-01-01

    Algae are smaller organisms than land plants and offer clear advantages in research over terrestrial species in terms of rapid production, short generation time and varied commercial applications. Thus, studies investigating the practical development of effective algal production are important and will improve our understanding of both aquatic and terrestrial plants. In this study we estimated multiple physicochemical and secondary structural properties of protein sequences, the predicted presence of post-translational modification (PTM) sites, and subcellular localization using a total of 510,123 protein sequences from the proteomes of 31 algal and three plant species. Algal species were broadly selected from green and red algae, glaucophytes, oomycetes, diatoms and other microalgal groups. The results were deposited in the Algal Protein Annotation Suite database (Alga-PrAS; http://alga-pras.riken.jp/), which can be freely accessed online. PMID:28069893

  9. Analysis of residue conformations in peptides in Cambridge structural database and protein-peptide structural complexes.

    Science.gov (United States)

    Raghavender, Upadhyayula Surya

    2017-03-01

    A comprehensive statistical analysis of the geometric parameters of peptide chains in a reduced dataset of protein-peptide complexes in Protein Data Bank (PDB) is presented. The angular variables describing the backbone conformations of amino acid residues in peptide chains shed insights into the conformational preferences of peptide residues interacting with protein partners. Nonparametric statistical approaches are employed to evaluate the interrelationships and associations in structural variables. Grouping of residues based on their structure into chemical classes reveals characteristic trends in parameter relationships. A comparison of canonical amino acid residues in free peptide structures in Cambridge structural database (CSD) with identical residues in PDB complexes, suggests that the information can be integrated from both the structural repositories enabling efficient and accurate modeling of biologically active peptides. © 2016 John Wiley & Sons A/S.

  10. Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

    Science.gov (United States)

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-04-21

    To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease
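
    The back-translation step described in this record can be illustrated with a minimal sketch. It is not Gene Composer's actual protein-to-gene algorithm, which the abstract says also targets G:C content and other sequence features. The codon table values, the names `CODON_USAGE`, `back_translate` and `gc_fraction`, and the "most frequent codon above a threshold" rule are all assumptions made for illustration.

```python
# Hypothetical codon usage table: amino acid -> {codon: relative frequency}.
# The frequencies are illustrative only, not taken from a real expression system.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def back_translate(protein, usage=CODON_USAGE, min_usage=0.10):
    """For each residue, choose the most frequent codon whose usage is
    at or above min_usage (a simplified stand-in for codon engineering)."""
    codons = []
    for aa in protein:
        candidates = {c: f for c, f in usage[aa].items() if f >= min_usage}
        if not candidates:
            raise ValueError(f"no codon for {aa!r} meets threshold {min_usage}")
        codons.append(max(candidates, key=candidates.get))
    return "".join(codons)


def gc_fraction(dna):
    """Fraction of G and C bases, useful for checking a G:C% target."""
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    gene = back_translate("MKL*")
    print(gene, round(gc_fraction(gene), 3))
```

    A fuller design tool would presumably choose among the codons that pass the threshold to steer G:C content or avoid unwanted motifs, rather than always taking the most frequent one.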

  11. Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

    Directory of Open Access Journals (Sweden)

    Mixon Mark

    2009-04-01

    Background: To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results: An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. Conclusion: We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene

  12. Neutron cross-sections database for amino acids and proteins analysis

    Energy Technology Data Exchange (ETDEWEB)

    Voi, Dante L.; Ferreira, Francisco de O.; Nunes, Rogerio Chaffin, E-mail: dante@ien.gov.br, E-mail: fferreira@ien.gov.br, E-mail: Chaffin@ien.gov.br [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Rocha, Helio F. da, E-mail: hrocha@gbl.com.br [Universidade Federal do Rio de Janeiro (IPPMG/UFRJ), Rio de Janeiro, RJ (Brazil). Instituto de Pediatria

    2015-07-01

    Biological materials may be studied using neutrons as an unconventional analytical tool. Dynamic and structural data can be obtained for amino acids, proteins and other cellular components by determining neutron cross sections, especially for applications in nuclear purity and conformation analysis. The instrument used for this is the crystal spectrometer of the Instituto de Engenharia Nuclear (IEN-CNEN-RJ), the only one in Latin America that uses neutrons for this type of analysis; it is installed in one of the irradiation channels of the Argonauta reactor. The experimentally obtained values are compared with values calculated from literature data, with a rigorous analysis of the chemical composition, conformation and molecular structure of the materials. A neutron cross-section database was constructed to assist in determining the molecular dynamics, structure and formulae of biological materials. The database contains neutron cross-section values for all amino acids, chemical elements, molecular groups and auxiliary radicals, as well as values of the constants and parameters necessary for the analysis. An unprecedented analytical procedure was developed using the neutron cross-section parceling and grouping method for data manipulation. The database is the result of measurements on twenty amino acids that were supplied by different manufacturers and are administered orally to hospital patients for nutritional purposes. A small data file of compounds with different molecular groups including carbon, nitrogen, sulfur and oxygen, all linked to hydrogen atoms, was also constructed. A review of the global and national scene in the acquisition of neutron cross-section data, the formation of libraries and the application of neutrons to the analysis of biological materials is presented. This database has further application in protein analysis, and the neutron cross section of insulin was estimated. (author)

  13. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

    Science.gov (United States)

    O'Leary, Nuala A.; Wright, Mathew W.; Brister, J. Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M.; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S.; Kodali, Vamsi K.; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M.; Murphy, Michael R.; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H.; Rausch, Daniel; Riddick, Lillian D.; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S.; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E.; Vatsan, Anjana R.; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J.; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D.; Pruitt, Kim D.

    2016-01-01

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. PMID:26553804

  14. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

    Science.gov (United States)

    O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D

    2016-01-04

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

  15. SPODOBASE : an EST database for the lepidopteran crop pest Spodoptera

    Directory of Open Access Journals (Sweden)

    Sabourault Cécile

    2006-06-01

    Background: The lepidopteran Spodoptera frugiperda is a pest that causes widespread economic damage to a variety of crop plants. It is also well known through its Sf9 cell line, which is used for the production of numerous heterologous proteins. Species of the Spodoptera genus are used as models for pesticide resistance and to study virus-host interactions. A genomic approach is now a critical step for further developments in the biology and pathology of these insects, and the results of EST sequencing efforts need to be structured into databases providing an integrated set of tools and information. Description: The ESTs from five independent cDNA libraries, prepared from three different S. frugiperda tissues (hemocytes, midgut and fat body) and from the Sf9 cell line, are deposited in the database. These tissues were chosen because of their importance in biological processes such as immune response, development and plant/insect interaction. So far, SPODOBASE contains 29,325 ESTs, which are cleaned and clustered into non-redundant sets (2294 clusters and 6103 singletons). SPODOBASE is constructed in such a way that other ESTs from S. frugiperda or other species may be added. Users can retrieve information using text searches, pre-formatted queries, a query assistant or BLAST searches. Annotation is provided against NCBI, UniProt or Bombyx mori EST databases, and with the GO-Slim vocabulary. Conclusion: The SPODOBASE database provides integrated access to expressed sequence tags (ESTs) from the lepidopteran insect Spodoptera frugiperda. It is a publicly available structured database of insect pest sequences that will allow the identification of a number of genes and the comprehensive cloning of gene families of interest to the scientific community. SPODOBASE is available at http://bioweb.ensam.inra.fr/spodobase

  16. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles

    DEFF Research Database (Denmark)

    Mathelier, Anthony; Fornes, Oriol; Arenillas, David J;

    2016-01-01

    JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we...
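
    The position frequency matrix representation named in this record can be illustrated with a minimal sketch. The function name `position_frequency_matrix` and the example binding sites are hypothetical; this is not JASPAR's own code or file format, only the general idea of counting each base at each aligned position.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each base occurs at each position of a set of
    equal-length, pre-aligned binding sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    # One row of counts per base, one column per aligned position.
    pfm = {base: [0] * length for base in alphabet}
    for site in sites:
        for position, base in enumerate(site.upper()):
            pfm[base][position] += 1  # KeyError if base is outside alphabet
    return pfm


if __name__ == "__main__":
    # Made-up aligned sites, for illustration only.
    for base, counts in position_frequency_matrix(["ACGT", "ACGA", "TCGT"]).items():
        print(base, counts)
```

    Each column of the matrix sums to the number of sites, so dividing by that number would give per-position base frequencies.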

  17. Exploring the Ligand-Protein Networks in Traditional Chinese Medicine: Current Databases, Methods, and Applications

    Directory of Open Access Journals (Sweden)

    Mingzhu Zhao

    2013-01-01

    Traditional Chinese medicine (TCM), which has thousands of years of clinical application in China and other Asian countries, is the pioneer of the “multicomponent-multitarget” approach and of network pharmacology. Although there is no doubt about its efficacy, it is difficult to elucidate a convincing underlying mechanism for TCM because of its complex composition and unclear pharmacology. The use of ligand-protein networks has been gaining significant value in drug discovery, while its application to TCM is still at an early stage. This paper first surveys TCM databases for virtual screening, which have greatly expanded in size and data diversity in recent years. On that basis, different screening methods and strategies for identifying active ingredients and targets of TCM are outlined according to the amount of network information available, on the sides of both ligand bioactivity and protein structure. Furthermore, successful in silico target identification attempts are discussed in detail, along with experiments exploring the ligand-protein networks of TCM. Finally, it is concluded that ligand-protein networks can be used not only to predict the protein targets of a small molecule but also to explore the mode of action of TCM.

  18. SuperLigands – a database of ligand structures derived from the Protein Data Bank

    Directory of Open Access Journals (Sweden)

    Preissner Robert

    2005-05-01

    Background: Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function. Description: Here we present an Internet accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients. Conclusion: SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research.
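
    The Tanimoto coefficient mentioned in this record can be illustrated with a minimal sketch. Representing each compound as a set of "on" fingerprint bit indices is an assumption; the abstract does not say which fingerprints SuperLigands uses. The function name `tanimoto` and the example fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of
    'on' bit indices: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    ligand = {1, 2, 3, 4}  # made-up bit indices for illustration
    drug = {3, 4, 5, 6}
    print(tanimoto(ligand, drug))
```

    The value ranges from 0 (no shared bits) to 1 (identical bit sets), which is why a single threshold on it can serve as a simple similarity filter.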

  19. SuperLigands – a database of ligand structures derived from the Protein Data Bank

    Science.gov (United States)

    Michalsky, Elke; Dunkel, Mathias; Goede, Andrean; Preissner, Robert

    2005-01-01

    Background: Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function. Description: Here we present an Internet accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients. Conclusion: SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research. PMID:15943884

  20. KEGG as a reference resource for gene and protein annotation.

    Science.gov (United States)

    Kanehisa, Minoru; Sato, Yoko; Kawashima, Masayuki; Furumichi, Miho; Tanabe, Mao

    2016-01-04

    KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.

  1. CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins.

    Science.gov (United States)

    Khorshid, Mohsen; Rodak, Christoph; Zavolan, Mihaela

    2011-01-01

    The stability, localization and translation rate of mRNAs are regulated by a multitude of RNA-binding proteins (RBPs) that find their targets directly or with the help of guide RNAs. Among the experimental methods for mapping RBP binding sites, cross-linking and immunoprecipitation (CLIP) coupled with deep sequencing provides transcriptome-wide coverage as well as high resolution. However, partly due to their vast volume, the data that were so far generated in CLIP experiments have not been put in a form that enables fast and interactive exploration of binding sites. To address this need, we have developed the CLIPZ database and analysis environment. Binding site data for RBPs such as Argonaute 1-4, Insulin-like growth factor II mRNA-binding protein 1-3, TNRC6 proteins A-C, Pumilio 2, Quaking and Polypyrimidine tract binding protein can be visualized at the level of the genome and of individual transcripts. Individual users can upload their own sequence data sets while being able to limit the access to these data to specific users, and analyses of the public and private data sets can be performed interactively. CLIPZ, available at http://www.clipz.unibas.ch, aims to provide an open access repository of information for post-transcriptional regulatory elements.

  2. GFExtractor: algorithm of mining non-redundant episode rules effectively in event sequence

    Institute of Scientific and Technical Information of China (English)

    袁红娟

    2013-01-01

    Mining episode rules in an event sequence aims to discover the causal relationships between episodes. To mine non-redundant episode rules in an event sequence, this paper proposes the GFExtractor algorithm, based on a support definition of non-overlapping minimal occurrences and a depth-first search strategy. GFExtractor uses a pruning strategy to eliminate non-generator episodes, and uses forward and backward extension checks to eliminate non-closed episodes. Non-redundant episode rules are finally generated between the set of episode generators (Gen) and the set of frequent closed episodes (FCE). Experimental results confirm the effectiveness of the algorithm in mining non-redundant episode rules in event sequences.

  3. deconSTRUCT: general purpose protein database search on the substructure level.

    Science.gov (United States)

    Zhang, Zong Hong; Bharatham, Kavitha; Sherman, Westley A; Mihalek, Ivana

    2010-07-01

    deconSTRUCT webserver offers an interface to a protein database search engine, usable for a general purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enables deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without tying up irretrievably the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.

  4. Human Gene and Protein Database (HGPD): a novel database presenting a large quantity of experiment-based results in human proteomics.

    Science.gov (United States)

    Maruyama, Yukio; Wakamatsu, Ai; Kawamura, Yoshifumi; Kimura, Kouichi; Yamamoto, Jun-ichi; Nishikawa, Tetsuo; Kisu, Yasutomo; Sugano, Sumio; Goshima, Naoki; Isogai, Takao; Nomura, Nobuo

    2009-01-01

    Completion of human genome sequencing has greatly accelerated functional genomic research. Full-length cDNA clones are essential experimental tools for functional analysis of human genes. In one of the projects of the New Energy and Industrial Technology Development Organization (NEDO) in Japan, the full-length human cDNA sequencing project (FLJ project), nucleotide sequences of approximately 30 000 human cDNA clones have been analyzed. The Gateway system is a versatile framework to construct a variety of expression clones for various experiments. We have constructed 33 275 human Gateway entry clones from full-length cDNAs, representing to our knowledge the largest collection in the world. Utilizing these clones with a highly efficient cell-free protein synthesis system based on wheat germ extract, we have systematically and comprehensively produced and analyzed human proteins in vitro. Sequence information for both amino acids and nucleotides of open reading frames of cDNAs cloned into Gateway entry clones and in vitro expression data using those clones can be retrieved from the Human Gene and Protein Database (HGPD, http://www.HGPD.jp). HGPD is a unique database that stores the information of a set of human Gateway entry clones and protein expression data and helps the user to search the Gateway entry clones.

  5. The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

    DEFF Research Database (Denmark)

    Celis, J E; Leffers, H; Rasmussen, H H;

    1991-01-01

    The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus un...

  6. Transcriptomic and Proteomic Analysis of Arion vulgaris--Proteins for Probably Successful Survival Strategies?

    Science.gov (United States)

    Bulat, Tanja; Smidak, Roman; Sialana, Fernando J; Jung, Gangsoo; Rattei, Thomas; Bilban, Martin; Sattmann, Helmut; Lubec, Gert; Aradska, Jana

    2016-01-01

    The Spanish slug, Arion vulgaris, is considered one of the hundred most invasive species in Central Europe. The immense and very successful adaptation and spreading of A. vulgaris suggest that it developed highly effective mechanisms to deal with infections and natural predators. Current transcriptomic and proteomic studies on gastropods have been restricted mainly to marine and freshwater gastropods. No transcriptomic or proteomic study on A. vulgaris has been carried out so far, and in the current study, the first transcriptomic database from adult specimen of A. vulgaris is reported. To facilitate and enable proteomics in this non-model organism, a mRNA-derived protein database was constructed for protein identification. A gel-based proteomic approach was used to obtain the first generation of a comprehensive slug mantle proteome. A total of 2128 proteins were unambiguously identified; 48 proteins represent novel proteins with no significant homology in NCBI non-redundant database. Combined transcriptomic and proteomic analysis revealed an extensive repertoire of novel proteins with a role in innate immunity including many associated pattern recognition, effector proteins and cytokine-like proteins. The number and diversity in gene families encoding lectins point to a complex defense system, probably as a result of adaptation to a pathogen-rich environment. These results are providing a fundamental and important resource for subsequent studies on molluscs as well as for putative antimicrobial compounds for drug discovery and biomedical applications.

  7. Transcriptomic and Proteomic Analysis of Arion vulgaris—Proteins for Probably Successful Survival Strategies?

    Science.gov (United States)

    Bulat, Tanja; Smidak, Roman; Sialana, Fernando J.; Jung, Gangsoo; Rattei, Thomas; Bilban, Martin; Sattmann, Helmut; Lubec, Gert; Aradska, Jana

    2016-01-01

    The Spanish slug, Arion vulgaris, is considered one of the hundred most invasive species in Central Europe. The immense and very successful adaptation and spreading of A. vulgaris suggest that it developed highly effective mechanisms to deal with infections and natural predators. Current transcriptomic and proteomic studies on gastropods have been restricted mainly to marine and freshwater gastropods. No transcriptomic or proteomic study on A. vulgaris has been carried out so far, and in the current study, the first transcriptomic database from adult specimen of A. vulgaris is reported. To facilitate and enable proteomics in this non-model organism, a mRNA-derived protein database was constructed for protein identification. A gel-based proteomic approach was used to obtain the first generation of a comprehensive slug mantle proteome. A total of 2128 proteins were unambiguously identified; 48 proteins represent novel proteins with no significant homology in NCBI non-redundant database. Combined transcriptomic and proteomic analysis revealed an extensive repertoire of novel proteins with a role in innate immunity including many associated pattern recognition, effector proteins and cytokine-like proteins. The number and diversity in gene families encoding lectins point to a complex defense system, probably as a result of adaptation to a pathogen-rich environment. These results are providing a fundamental and important resource for subsequent studies on molluscs as well as for putative antimicrobial compounds for drug discovery and biomedical applications. PMID:26986963

  8. Transcriptomic and Proteomic Analysis of Arion vulgaris--Proteins for Probably Successful Survival Strategies?

    Directory of Open Access Journals (Sweden)

    Tanja Bulat

    The Spanish slug, Arion vulgaris, is considered one of the hundred most invasive species in Central Europe. The immense and very successful adaptation and spreading of A. vulgaris suggest that it developed highly effective mechanisms to deal with infections and natural predators. Current transcriptomic and proteomic studies on gastropods have been restricted mainly to marine and freshwater gastropods. No transcriptomic or proteomic study on A. vulgaris has been carried out so far, and in the current study, the first transcriptomic database from adult specimen of A. vulgaris is reported. To facilitate and enable proteomics in this non-model organism, a mRNA-derived protein database was constructed for protein identification. A gel-based proteomic approach was used to obtain the first generation of a comprehensive slug mantle proteome. A total of 2128 proteins were unambiguously identified; 48 proteins represent novel proteins with no significant homology in NCBI non-redundant database. Combined transcriptomic and proteomic analysis revealed an extensive repertoire of novel proteins with a role in innate immunity including many associated pattern recognition, effector proteins and cytokine-like proteins. The number and diversity in gene families encoding lectins point to a complex defense system, probably as a result of adaptation to a pathogen-rich environment. These results are providing a fundamental and important resource for subsequent studies on molluscs as well as for putative antimicrobial compounds for drug discovery and biomedical applications.

  9. PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

    Directory of Open Access Journals (Sweden)

    Adeel Malik

    2010-01-01

    Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by tools de novo which can predict carbohydrate-binding sites for proteins in the absence of structure of experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.

  10. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    Science.gov (United States)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
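
    The screening step described above (keeping de novo sequences according to quality score and length) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the DeNovoPeptide record, the screen function, the example peptides and both cut-off values are hypothetical, since the abstract does not report the criteria actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeNovoPeptide:
    """Hypothetical record for one de novo sequenced peptide."""
    sequence: str
    quality: float  # confidence score from the sequencing tool (0-100 scale assumed)


def screen(peptides, min_quality=80.0, min_length=8):
    """Keep peptides meeting both cut-offs (placeholder values, not from the study)."""
    return [
        p for p in peptides
        if p.quality >= min_quality and len(p.sequence) >= min_length
    ]


# Made-up candidates used only to exercise the filter.
candidates = [
    DeNovoPeptide("LVNELTEFAK", 92.5),              # long enough, high score: kept
    DeNovoPeptide("AGK", 95.0),                     # too short: dropped
    DeNovoPeptide("YLYEIARRHPYFYAPELLFFAK", 41.0),  # low score: dropped
]
kept = screen(candidates)
print([p.sequence for p in kept])
```

    Only the peptides that pass such a screen would then be submitted to BLAST for taxonomic and functional assignment.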

  11. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    Science.gov (United States)

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  12. Completion of HLA protein sequences by automated homology-based nearest-neighbor extrapolation of HLA database sequences

    NARCIS (Netherlands)

    Geneugelijk, K; Niemann, M; de Hoop, T; Spierings, E

    The IMGT/HLA database contains every publicly available HLA sequence. However, most of these HLA protein sequences are restricted to the alpha-1/alpha-2 domain for HLA class-I and alpha-1/beta-1 domain for HLA class-II. Nevertheless, also polymorphism outside these domains may play a role in

  14. The human keratinocyte two-dimensional gel protein database (update 1995): mapping components of signal transduction pathways

    DEFF Research Database (Denmark)

    Celis, J E; Rasmussen, H H; Gromov, P

    1995-01-01

    The master two-dimensional (2-D) gel database of human keratinocytes currently lists 3154 cellular proteins (2224 isoelectric focusing, IEF; and 930 nonequilibrium pH gradient electrophoresis, NEPHGE), many of which correspond to post-translational modifications. 1082 polypeptides have been ident...

  15. O-GLYCBASE version 2.0: a revised database of O-glycosylated proteins

    DEFF Research Database (Denmark)

    Hansen, Jan; Lund, Ole; Rapacki, Kristoffer;

    1997-01-01

    O-GLYCBASE is an updated database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the literature, and from the SWISS-PROT database. Entries include information about species, sequence, glycosylation sites and glycan type. O-GLYCBASE is...... patterns for the GalNAc, mannose and GlcNAc transferases are shown. The O-GLYCBASE database is available through WWW or by anonymous FTP....

  16. Database of two-dimensional polyacrylamide gel electrophoresis of proteins labeled with CyDye DIGE Fluor saturation dye.

    Science.gov (United States)

    Fujii, Kazuyasu; Kondo, Tadashi; Yokoo, Hideki; Okano, Tetsuya; Yamada, Masayo; Yamada, Tesshi; Iwatsuki, Keiji; Hirohashi, Setsuo

    2006-03-01

    CyDye DIGE Fluor saturation dye (saturation dye, GE Healthcare Amersham Biosciences) enables highly sensitive 2-D PAGE. As the dye reacts with all reduced cysteine thiols, 2-D PAGE can be performed with a lower amount of protein, compared with CyDye DIGE Fluor minimal dye (GE Healthcare Amersham Biosciences), the sensitivity of which is equivalent to that of silver staining. We constructed a 2-D map of the saturation dye-labeled proteins of a liver cancer cell line (HepG2) and identified by MS 92 proteins corresponding to 123 protein spots. Functional classification revealed that the identified proteins had chaperone, protein binding, nucleotide binding, metal ion binding, isomerase activity, and motor activity. The functional distribution and the cysteine contents of the proteins were similar to those in the most comprehensive 2-D database of hepatoma cells (Seow et al., Electrophoresis 2000, 21, 1787-1813), where silver staining was used for protein visualization. Hierarchical clustering on the basis of the quantitative expression profiles of the 123 characterized spots labeled with two charge- and mass-matched saturation dyes (Cy3 and Cy5) discriminated between nine hepatocellular carcinoma cell lines and primary cultured hepatocytes from five individuals, suggesting the utility of saturation dye and our database for proteomic studies of liver cancer.

  17. The importance of recognizing and reporting sequence database contamination for proteomics

    Directory of Open Access Journals (Sweden)

    Olivier Pible

    2014-06-01

    Advances in genome sequencing have made proteomic experiments more successful than ever. However, not all entries in a sequence database are of equal quality. Genome sequences are contaminated more frequently than is admitted. Contamination impacts homology-based proteomic, proteogenomic, and metaproteomic results. We highlight two examples in the National Center for Biotechnology Information non-redundant database (NCBInr) that are likely contaminated: the bacterium Enterococcus gallinarum EGD-AAK12 and the insect Ceratitis capitata. We hope to incite users of this and other databases to critically evaluate submitted sequences and to contribute to the overall quality of the database by signaling potential errors when possible.

  18. O-GLYCBASE: a revised database of O-glycosylated proteins

    DEFF Research Database (Denmark)

    Hansen, Jan; Lund, Ole; Nielsen, Jens O.

    1996-01-01

    O-GLYCBASE is a comprehensive database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the SWISS-PROT and PIR databases as well as directly from recently published reports. Nineteen percent of the entries extracted from the databases n...... of mucin type O-glycosylation sites in mammalian glycoproteins exclusively from the primary sequence is made available by E-mail or WWW. The O-GLYCBASE database is also available electronically through our WWW server or by anonymous FTP....

  19. Optimization of diagnostic strategy in non-redundant multi-fault system based on probability threshold

    Institute of Scientific and Technical Information of China (English)

    朱海鹏; 景博; 黄以锋; 苏俊阳

    2012-01-01

    To address the fact that the traditional single-fault assumption cannot diagnose concurrent multiple faults in complex systems, this paper proposes a multiple-fault diagnostic strategy for non-redundant systems based on a probability threshold. First, the system dependency model is extended and fault states whose probability falls below the threshold are removed, which establishes a multiple-fault test and diagnosis model for non-redundant systems. Second, a Rollout algorithm is built on top of an information-entropy base algorithm to obtain the optimal test sequence. The fault diagnosis tree is then constructed and the test cost calculated. Finally, an airborne electronic system is used as an example to verify the effectiveness of the method. The method can obtain the optimal test sequence for a non-redundant system while keeping the test cost minimal.
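
    Two ingredients named in this abstract, removing fault states below a probability threshold and ranking tests by entropy, can be sketched as follows. This is a simplified one-step illustration only: it does not implement the Rollout look-ahead or test costs, and the function names, the fault-state probabilities and the test signatures are all hypothetical.

```python
import math


def prune(fault_probs, threshold):
    """Drop fault states whose probability is below the threshold."""
    return {f: p for f, p in fault_probs.items() if p >= threshold}


def entropy(weights):
    """Shannon entropy in bits of strictly positive weights, after normalisation."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)


def information_gain(fault_probs, detected):
    """Expected entropy reduction of a pass/fail test that fails on `detected` states."""
    total = sum(fault_probs.values())
    groups = (
        [p for f, p in fault_probs.items() if f in detected],      # test fails
        [p for f, p in fault_probs.items() if f not in detected],  # test passes
    )
    remaining = sum(sum(g) / total * entropy(g) for g in groups if g)
    return entropy(list(fault_probs.values())) - remaining


def best_test(fault_probs, tests):
    """Pick the test with the largest one-step information gain."""
    return max(tests, key=lambda name: information_gain(fault_probs, tests[name]))


# Made-up example: single faults A-D plus one unlikely double fault.
states = {"A": 0.24, "B": 0.24, "C": 0.24, "D": 0.24, "A+B": 0.04}
tests = {"t1": {"A", "B"}, "t2": {"A"}}
kept_states = prune(states, threshold=0.05)
chosen = best_test(kept_states, tests)
print(sorted(kept_states), chosen)
```

    With the made-up numbers above, the double fault is pruned and the test that splits the remaining states evenly is preferred, which is the usual behaviour of an entropy-based heuristic.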

  20. SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser.

    Science.gov (United States)

    Walsh, Thomas P; Webber, Caleb; Searle, Stephen; Sturrock, Shane S; Barton, Geoffrey J

    2008-07-01

    SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programming on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.
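
    As an illustration of what "full dynamic programming" means for a pairwise comparison, the sketch below computes a Smith-Waterman style local alignment score. It is not SCANPS code: SCANPS searches with profiles and estimates significance, whereas this example assumes a flat match/mismatch score and a linear gap penalty, and the function name and default scores are chosen purely for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(local_alignment_score("XXACDEXX", "YYACDEYY"))
```

    Every cell of the score matrix is filled, which is why this approach is more sensitive but also more expensive than heuristic seeding, and why parallel hardware matters for large databases.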

  1. Oligomeric protein structure networks: insights into protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Brinda KV

    2005-12-01

    Background: Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes and the edges are based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues), with special emphasis on protein interfaces. Results: A variety of interactions such as hydrogen bonds, salt bridges, aromatic and hydrophobic interactions, which occur at the interfaces, are identified in a consolidated manner as amino acid clusters at the interface, from this study. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot

  2. O-GLYCBASE version 3.0: a revised database of O-glycosylated proteins

    DEFF Research Database (Denmark)

    Hansen, Jan; Lund, Ole; Nilsson, Jette;

    1998-01-01

    cross-referenced. Compared to version 2.0 the number of entries has increased by 20%. Sequence logos displaying the acceptor specificity patterns for the GalNAc, mannose and GlcNAc transferases are shown. The O-GLYCBASE database is available through the WWW at http://www.cbs.dtu.dk/databases/OGLYCBASE/...

  3. Collision cross sections of proteins and their complexes: a calibration framework and database for gas-phase structural biology.

    Science.gov (United States)

    Bush, Matthew F; Hall, Zoe; Giles, Kevin; Hoyes, John; Robinson, Carol V; Ruotolo, Brandon T

    2010-11-15

    Collision cross sections in both helium and nitrogen gases were measured directly using a drift cell with RF ion confinement inserted within a quadrupole/ion mobility/time-of-flight hybrid mass spectrometer (Waters Synapt HDMS, Manchester, U.K.). Collision cross sections for a large set of denatured peptide, denatured protein, native-like protein, and native-like protein complex ions are reported here, forming a database of collision cross sections that spans over 2 orders of magnitude. The average effective density of the native-like ions is 0.6 g cm(-3), which is significantly lower than that for the solvent-excluded regions of proteins and suggests that these ions can retain significant memory of their solution-phase structures rather than collapse to globular structures. Because the measurements are acquired using an instrument that mimics the geometry of the commercial Synapt HDMS instrument, this database enables the determination of highly accurate collision cross sections from traveling-wave ion mobility data through the use of calibration standards with similar masses and mobilities. Errors in traveling-wave collision cross sections determined for native-like protein complexes calibrated using other native-like protein complexes are significantly less than those calibrated using denatured proteins. This database indicates that collision cross sections in both helium and nitrogen gases can be well-correlated for larger biomolecular ions, but non-correlated differences for smaller ions can be more significant. These results enable the generation of more accurate three-dimensional models of protein and other biomolecular complexes using gas-phase structural biology techniques.

  4. Construction and analysis of a plant non-specific lipid transfer protein database (nsLTPDB)

    Directory of Open Access Journals (Sweden)

    Wang Nai-Jyuan

    2012-01-01

    Background: Plant non-specific lipid transfer proteins (nsLTPs) are small and basic proteins. Recently, nsLTPs have been reported to be involved in many physiological functions such as mediating phospholipid transfer, participating in plant defence activity against bacterial and fungal pathogens, and enhancing cell wall extension in tobacco. However, the lipid transfer mechanism of nsLTPs is still unclear, and comprehensive information on nsLTPs is difficult to obtain. Methods: In this study, we identified 595 nsLTPs from 121 different species and constructed an nsLTP database, nsLTPDB, which comprises the sequence information, structures, relevant literature, and biological data of all plant nsLTPs (http://nsltpdb.life.nthu.edu.tw/). Results: Bioinformatics and statistical methods were implemented to develop a classification method for nsLTPs based on the patterns of the eight highly-conserved cysteine residues, and to suggest strict Prosite-styled patterns for Type I and Type II nsLTPs. The pattern of Type I is C X2 V X5-7 C [V, L, I] X Y [L, A, V] X8-13 CC X G X12 D X [Q, K, R] X2 CXC X16-21 P X2 C X13-15 C, and that of Type II is C X4 L X2 C X9-11 P [S, T] X2 CC X5 Q X2-4 C[L, F]C X2 [A, L, I] X [D, N] P X10-12 [K, R] X4-5 C X3-4 P X0-2 C. Moreover, we compared the Prosite-styled patterns with the experimental mutagenesis data previously established by our group, and found that the residues with higher conservation played an important role in the structural stability or lipid binding ability of nsLTPs. Conclusions: Taken together, this research has suggested potential residues that might be essential to modulate the structural and functional properties of plant nsLTPs. Finally, we proposed some biologically important sites of the nsLTPs, which are described by using a new Prosite-styled pattern that we defined.
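
    A pattern written in this notation can be turned into a regular expression for scanning sequences. The sketch below is a minimal illustration, not the authors' tool: it assumes that X stands for any residue (reading the "×" symbols in the extracted text as X), that Xn means exactly n residues and Xm-n a range, and that bracketed lists are alternatives. The function name and the synthetic test sequence are made up for the example.

```python
import re

# Type II pattern as quoted in the abstract, with "×" read as X (an assumption).
TYPE_II = ("C X4 L X2 C X9-11 P [S, T] X2 CC X5 Q X2-4 C[L, F]C X2 [A, L, I] X "
           "[D, N] P X10-12 [K, R] X4-5 C X3-4 P X0-2 C")


def pattern_to_regex(pattern):
    """Translate the space-separated pattern notation into a regular expression."""
    parts = []
    for token in re.findall(r"\[[^\]]*\]|X\d+(?:-\d+)?|[A-Z]", pattern):
        if token.startswith("["):
            # Alternatives such as [S, T] become a character class.
            parts.append("[" + "".join(re.findall(r"[A-Z]", token)) + "]")
        elif token[0] == "X" and len(token) > 1:
            # X5 means exactly five residues; X9-11 means nine to eleven.
            low, _, high = token[1:].partition("-")
            parts.append(".{%s}" % (low + "," + high if high else low))
        elif token == "X":
            parts.append(".")  # a single arbitrary residue
        else:
            parts.append(token)  # a fixed residue such as C or P
    return "".join(parts)


type_ii_regex = pattern_to_regex(TYPE_II)

# Synthetic sequence built to satisfy the pattern; it is not a real nsLTP.
synthetic = ("C" + "A" * 4 + "L" + "A" * 2 + "C" + "A" * 9 + "PS" + "A" * 2 + "CC"
             + "A" * 5 + "Q" + "A" * 2 + "CLC" + "A" * 2 + "A" + "A" + "DP"
             + "A" * 10 + "K" + "A" * 4 + "C" + "A" * 3 + "P" + "C")
print(type_ii_regex)
print(bool(re.fullmatch(type_ii_regex, synthetic)))
```

    The same function accepts the Type I pattern; in practice re.search would be used to locate the motif inside a longer precursor sequence.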

  5. Knockout of the PKN family of Rho effector kinases reveals a non-redundant role for PKN2 in developmental mesoderm expansion

    OpenAIRE

    Ivan Quétier; Jacqueline J.T. Marshall; Bradley Spencer-Dene; Sylvie Lachmann; Adele Casamassima; Claudio Franco; Sarah Escuin; Joseph T. Worrall; Priththivika Baskaran; Vinothini Rajeeve; Michael Howell; Andrew J. Copp; Gordon Stamp; Ian Rosewell; Pedro Cutillas

    2016-01-01

    Summary In animals, the protein kinase C (PKC) family has expanded into diversely regulated subgroups, including the Rho family-responsive PKN kinases. Here, we describe knockouts of all three mouse PKN isoforms and reveal that PKN2 loss results in lethality at embryonic day 10 (E10), with associated cardiovascular and morphogenetic defects. The cardiovascular phenotype was not recapitulated by conditional deletion of PKN2 in endothelial cells or the developing heart. In contrast, inducible s...

  6. Knockout of the PKN Family of Rho Effector Kinases Reveals a Non-redundant Role for PKN2 in Developmental Mesoderm Expansion

    OpenAIRE

    Quétier, I.; Marshall, J. J.; Spencer-Dene, B.; Lachmann, S.; Casamassima, A.; Franco, C.; Escuin, S.; Worrall, J. T.; Baskaran, P.; Rajeeve, V.; Howell, M.; Copp, A. J.; Stamp, G.; Rosewell, I.; Cutillas, P.

    2016-01-01

    In animals, the protein kinase C (PKC) family has expanded into diversely regulated subgroups, including the Rho family-responsive PKN kinases. Here, we describe knockouts of all three mouse PKN isoforms and reveal that PKN2 loss results in lethality at embryonic day 10 (E10), with associated cardiovascular and morphogenetic defects. The cardiovascular phenotype was not recapitulated by conditional deletion of PKN2 in endothelial cells or the developing heart. In contrast, inducible systemic ...

  7. The human keratinocyte two-dimensional protein database (update 1994): towards an integrated approach to the study of cell proliferation, differentiation and skin diseases

    DEFF Research Database (Denmark)

    Celis, J E; Rasmussen, H H; Olsen, E

    1994-01-01

    The master two-dimensional (2-D) gel database of human keratinocytes currently lists 3087 cellular proteins (2168 isoelectric focusing, IEF; and 919 nonequilibrium pH gradient electrophoresis, NEPHGE), many of which correspond to posttranslational modifications. 890 polypeptides have been...... identified (protein name, organelle components, etc.) using one or a combination of procedures that include (i) comigration with known human proteins, (ii) 2-D gel immunoblotting using specific antibodies, (iii) microsequencing of Coomassie Brilliant Blue stained proteins, (iv) mass spectrometry and (v...... in the database. We also report a database of proteins recovered from the medium of noncultured, unfractionated keratinocytes. This database lists 398 polypeptides (309 IEF; 89 NEPHGE) of which 76 have been identified. The aim of the comprehensive databases is to gather, through a systematic study...

  8. The human interactome knowledge base (hint-kb): An integrative human protein interaction database enriched with predicted protein–protein interaction scores using a novel hybrid technique

    KAUST Repository

    Theofilatos, Konstantinos A.

    2013-07-12

    Proteins are the functional components of many cellular processes and the identification of their physical protein–protein interactions (PPIs) is an area of mature academic research. Various databases have been developed containing information about experimentally and computationally detected human PPIs as well as their corresponding annotation data. However, these databases contain many false positive interactions, are partial and only a few of them incorporate data from various sources. To overcome these limitations, we have developed HINT-KB (http://biotools.ceid.upatras.gr/hint-kb/), a knowledge base that integrates data from various sources, provides a user-friendly interface for their retrieval, calculates a set of features of interest and computes a confidence score for every candidate protein interaction. This confidence score is essential for filtering the false positive interactions which are present in existing databases, predicting new protein interactions and measuring the frequency of each true protein interaction. For this reason, a novel machine learning hybrid methodology, called Evolutionary Kalman Mathematical Modelling (EvoKalMaModel), was used to achieve an accurate and interpretable scoring methodology. The experimental results indicated that the proposed scoring scheme outperforms existing computational methods for the prediction of PPIs.

  9. Sting_RDB: a relational database of structural parameters for protein analysis with support for data warehousing and data mining.

    Science.gov (United States)

    Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G

    2007-10-05

    An effective strategy for managing protein databases is to provide mechanisms to transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to better handle scientific objectives and interpret the research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_RDB evolved from the earlier, text (flat file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, the data quality indicator was introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.

  10. Proteins in similarity relationship with the cluster - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Gclust Server. Proteins in similarity relationship with the cluster. Data detail. Data name: Proteins in similarity relationship wit...

  11. MELOGEN: an EST database for melon functional genomics

    Directory of Open Access Journals (Sweden)

    Puigdomènech Pere

    2007-09-01

    Background: Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results: We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion: The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of

  12. Phospho.ELM: A database of experimentally verified phosphorylation sites in eukaryotic proteins

    DEFF Research Database (Denmark)

    Diella, F.; Cameron, S.; Gemund, C.

    2004-01-01

    need for an accurate database dedicated to phosphorylation to provide easily retrievable information on phosphoproteins. Description: Phospho.ELM (http://phospho.elm.eu.org) is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed...... to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site...

  13. Knockout of the PKN Family of Rho Effector Kinases Reveals a Non-redundant Role for PKN2 in Developmental Mesoderm Expansion

    Directory of Open Access Journals (Sweden)

    Ivan Quétier

    2016-01-01

    In animals, the protein kinase C (PKC) family has expanded into diversely regulated subgroups, including the Rho family-responsive PKN kinases. Here, we describe knockouts of all three mouse PKN isoforms and reveal that PKN2 loss results in lethality at embryonic day 10 (E10), with associated cardiovascular and morphogenetic defects. The cardiovascular phenotype was not recapitulated by conditional deletion of PKN2 in endothelial cells or the developing heart. In contrast, inducible systemic deletion of PKN2 after E7 provoked collapse of the embryonic mesoderm. Furthermore, mouse embryonic fibroblasts, which arise from the embryonic mesoderm, depend on PKN2 for proliferation and motility. These cellular defects are reflected in vivo as dependence on PKN2 for mesoderm proliferation and neural crest migration. We conclude that failure of the mesoderm to expand in the absence of PKN2 compromises cardiovascular integrity and development, resulting in lethality.

  14. Knockout of the PKN Family of Rho Effector Kinases Reveals a Non-redundant Role for PKN2 in Developmental Mesoderm Expansion.

    Science.gov (United States)

    Quétier, Ivan; Marshall, Jacqueline J T; Spencer-Dene, Bradley; Lachmann, Sylvie; Casamassima, Adele; Franco, Claudio; Escuin, Sarah; Worrall, Joseph T; Baskaran, Priththivika; Rajeeve, Vinothini; Howell, Michael; Copp, Andrew J; Stamp, Gordon; Rosewell, Ian; Cutillas, Pedro; Gerhardt, Holger; Parker, Peter J; Cameron, Angus J M

    2016-01-26

    In animals, the protein kinase C (PKC) family has expanded into diversely regulated subgroups, including the Rho family-responsive PKN kinases. Here, we describe knockouts of all three mouse PKN isoforms and reveal that PKN2 loss results in lethality at embryonic day 10 (E10), with associated cardiovascular and morphogenetic defects. The cardiovascular phenotype was not recapitulated by conditional deletion of PKN2 in endothelial cells or the developing heart. In contrast, inducible systemic deletion of PKN2 after E7 provoked collapse of the embryonic mesoderm. Furthermore, mouse embryonic fibroblasts, which arise from the embryonic mesoderm, depend on PKN2 for proliferation and motility. These cellular defects are reflected in vivo as dependence on PKN2 for mesoderm proliferation and neural crest migration. We conclude that failure of the mesoderm to expand in the absence of PKN2 compromises cardiovascular integrity and development, resulting in lethality.

  15. Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria

    OpenAIRE

    2015-01-01

    We have determined refined multidimensional chemical shift ranges for intra-residue correlations (¹³C–¹³C, ¹⁵N–¹³C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 ¹³C chemical shifts and >3 million chemical...

  16. eProS--a database and toolbox for investigating protein sequence-structure-function relationships through energy profiles.

    Science.gov (United States)

    Heinke, Florian; Schildbach, Stefan; Stockmann, Daniel; Labudde, Dirk

    2013-01-01

    Gaining information about structural and functional features of newly identified proteins is often a difficult task. This information is crucial for understanding sequence-structure-function relationships of target proteins and, thus, essential in comprehending the mechanisms and dynamics of the molecular systems of interest. Using protein energy profiles is a novel approach that can contribute in addressing such problems. An energy profile corresponds to the sequence of energy values that are derived from a coarse-grained energy model. Energy profiles can be computed from protein structures or predicted from sequences. As shown, correspondences and dissimilarities in energy profiles can be applied for investigations of protein mechanics and dynamics. We developed eProS (energy profile suite, freely available at http://bioservices.hs-mittweida.de/Epros/), a database that provides ∼76 000 pre-calculated energy profiles as well as a toolbox for addressing numerous problems of structure biology. Energy profiles can be browsed, visualized, calculated from an uploaded structure or predicted from sequence. Furthermore, it is possible to align energy profiles of interest or compare them with all entries in the eProS database to identify significantly similar energy profiles and, thus, possibly relevant structural and functional relationships. Additionally, annotations and cross-links from numerous sources provide a broad view of potential biological correspondences.

  17. In silico re-identification of properties of drug target proteins.

    Science.gov (United States)

    Kim, Baeksoo; Jo, Jihoon; Han, Jonghyun; Park, Chungoo; Lee, Hyunju

    2017-05-31

    Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have been conducted for distinguishing drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins. Many other properties of proteins have not been fully investigated. Using the DrugBank (version 3.0) database containing nearly 6,816 drug entries including 760 FDA-approved drugs and 1822 of their targets and human UniProt/Swiss-Prot databases, we defined 1578 non-redundant drug target and 17,575 non-drug target proteins. To select these non-redundant protein datasets, we built four datasets (A, B, C, and D) by considering clustering of paralogous proteins. We first reassessed the widely used properties of drug target proteins. We confirmed and extended the findings that drug target proteins (1) are likely to have more hydrophobic, less polar, fewer PEST sequences, and more signal peptide sequences and (2) are more involved in enzyme catalysis, oxidation and reduction in cellular respiration, and operational genes. In this study, we proposed new properties (essentiality, expression pattern, PTMs, and solvent accessibility) for effectively identifying drug target proteins. We found that (1) drug targetability and protein essentiality are decoupled, (2) druggable proteins show high expression levels and tissue specificity, and (3) functional post-translational modification residues are enriched in drug target proteins. In addition, to predict the drug targetability of proteins, we exploited two machine learning methods (Support Vector Machine and Random Forest). When we predicted drug targets by combining previously known protein properties and proposed new properties, an F-score of 0.8307 was obtained. When the newly proposed properties are integrated, the prediction performance

  18. Functionally specified protein signatures distinctive for each of the different blue copper proteins

    Directory of Open Access Journals (Sweden)

    Anishetty Sharmila

    2004-09-01

    bound to the copper atom. It was highly specific for each kind of blue copper protein and the false picks were minimized. The set of signatures designed specifically for the BCPs was entirely different from the existing broad-spectrum signatures mentioned in the background section. Conclusions These signatures can be very useful for the annotation of uncharacterized proteins and are highly specific for retrieving blue copper protein sequences of interest from non-redundant databases containing large numbers of protein sequences.

  19. Exploiting protein flexibility to predict the location of allosteric sites

    Directory of Open Access Journals (Sweden)

    Panjkovich Alejandro

    2012-10-01

    Full Text Available Abstract Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing

  20. GALT Protein Database, a Bioinformatics Resource for the Management and Analysis of Structural Features of a Galactosemia-related Protein and Its Mutants

    Institute of Scientific and Technical Information of China (English)

    Antonio d'Acierno; Angelo Facchiano; Anna Marabotti

    2009-01-01

    We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type Ⅰ. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and the analysis to each mutant was made with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  1. Database Description - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Trypanosomes Database... Database Description General information of database Database name Trypanosomes Database...rmation and Systems Yata 1111, Mishima, Shizuoka 411-8540, JAPAN E mail: Database... classification Protein sequence databases Organism Taxonomy Name: Trypanosoma Taxonomy ID: 5690 Taxonomy Na...me: Homo sapiens Taxonomy ID: 9606 Database description The Trypanosomes database is a database providing th

  2. Genome databases

    Energy Technology Data Exchange (ETDEWEB)

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  3. DOPA: GPU-based protein alignment using database and memory access optimizations

    NARCIS (Netherlands)

    Hasan, L.; Kentie, M.; Al-Ars, Z.

    2011-01-01

    Background Smith-Waterman (S-W) algorithm is an optimal sequence alignment method for biological databases, but its computational complexity makes it too slow for practical purposes. Heuristics based approximate methods like FASTA and BLAST provide faster solutions but at the cost of reduced accurac

  5. ContaMiner and ContaBase: a webserver and database for early identification of unwantedly crystallized protein contaminants

    KAUST Repository

    Hungler, Arnaud

    2016-11-02

    Solving the phase problem in protein X-ray crystallography relies heavily on the identity of the crystallized protein, especially when molecular replacement (MR) methods are used. Yet, it is not uncommon that a contaminant crystallizes instead of the protein of interest. Such contaminants may be proteins from the expression host organism, protein fusion tags or proteins added during the purification steps. Many contaminants co-purify easily, crystallize and give good diffraction data. Identification of contaminant crystals may take time, since the presence of the contaminant is unexpected and its identity unknown. A webserver (ContaMiner) and a contaminant database (ContaBase) have been established, to allow fast MR-based screening of crystallographic data against currently 62 known contaminants. The web-based ContaMiner (available at http://strube.cbrc.kaust.edu.sa/contaminer/) currently produces results in 5 min to 4 h. The program is also available in a github repository and can be installed locally. ContaMiner enables screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for 'crystallization and preliminary X-ray analysis' publications. Thus, in addition to potentially saving X-ray crystallographers much time and effort, ContaMiner might considerably lower the risk of publishing erroneous data. A web server, titled ContaMiner, has been established, which allows fast molecular-replacement-based screening of crystallographic data against a database (ContaBase) of currently 62 potential contaminants. ContaMiner enables systematic screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for 'crystallization and preliminary X-ray analysis' publications. © Arnaud Hungler et al. 2016.

  6. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    Science.gov (United States)

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 up to 2.9 and 3.2, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor: CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
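
    The quadratic cost mentioned in the abstract above comes from filling one dynamic-programming cell per (query residue, target residue) pair. The following is a minimal sketch of Smith-Waterman scoring, not the CUDASW++ implementation: it assumes a simple match/mismatch score and a linear gap penalty (production tools typically use a substitution matrix and affine gaps), and the function names `smith_waterman_score` and `gcups` are illustrative. GCUPS is taken here in its usual sense of billions of cell updates per second.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of two sequences.

    Assumed scoring scheme (illustrative only): fixed match/mismatch
    scores and a linear gap penalty. One cell is computed per residue
    pair, so the running time is O(len(query) * len(target)).
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_length, database_residues, seconds):
    """Giga cell updates per second for one query against a database."""
    return query_length * database_residues / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("XXHEAGYY", "ZZHEAGWW"))
    print(gcups(1000, 2_000_000_000, 20))
```

    Only two rows of the matrix are kept in memory because the score, unlike the alignment itself, needs no traceback.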

  7. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism species Data detail... Data name Cluster based on sequence comparison of homologous proteins of 95 organism species Description of...e History of This Database Site Policy | Contact Us Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive ...

  8. New algorithmic approaches to protein spot detection and pattern matching in two-dimensional electrophoresis gel databases.

    Science.gov (United States)

    Pleissner, K P; Hoffmann, F; Kriegel, K; Wenk, C; Wegner, S; Sahlström, A; Oswald, H; Alt, H; Fleck, E

    1999-01-01

    Protein spot identification in two-dimensional electrophoresis gels can be supported by the comparison of gel images accessible in different World Wide Web two-dimensional electrophoresis (2-DE) gel protein databases. The comparison may be performed either by visual cross-matching between gel images or by automatic recognition of similar protein spot patterns. A prerequisite for the automatic point pattern matching approach is the detection of protein spots yielding the x(s),y(s) coordinates and integrated spot intensities i(s). For this purpose an algorithm is developed based on a combination of hierarchical watershed transformation and feature extraction methods. This approach reduces the strong over-segmentation of spot regions normally produced by watershed transformation. Measures for the ellipticity and curvature are determined as features of spot regions. The resulting spot lists containing x(s),y(s),i(s)-triplets are calculated for a source as well as for a target gel image accessible in 2-DE gel protein databases. After spot detection a matching procedure is applied. Both the matching of a local pattern vs. a full 2-DE gel image and the global matching between full images are discussed. Preset slope and length tolerances of pattern edges serve as matching criteria. The local matching algorithm relies on a data structure derived from the incremental Delaunay triangulation of a point set and a two-step hashing technique. For the incremental construction of triangles the spot intensities are considered in decreasing order. The algorithm needs neither landmarks nor an a priori image alignment. A graphical user interface for spot detection and gel matching is written in the Java programming language for the Internet. The software package called CAROL (http://gelmatching.inf.fu-berlin.de) is realized in a client-server architecture.

  9. TcoF-DB: dragon database for human transcription co-factors and transcription factor interacting proteins

    KAUST Repository

    Schaefer, Ulf

    2010-10-21

    The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF-DNA binding, protein-protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. Proteins that interact with TFs in the context of transcription regulation but do not bind to the DNA themselves, we consider transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well-studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that is comprised of a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs and three others contain TcoFs which are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/. © The Author(s) 2010.

  10. DMPD: Post-transcriptional regulation of proinflammatory proteins. [Dynamic Macrophage Pathway CSML Database

    Lifescience Database Archive (English)

    Full Text Available 15075353 Post-transcriptional regulation of proinflammatory proteins. Anderson P, P...hillips K, Stoecklin G, Kedersha N. J Leukoc Biol. 2004 Jul;76(1):42-7. Epub 2004 Apr 1. Post...-transcriptional regulation of proinflammatory proteins. PubmedID 15075353 Title Post-tr

  11. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    Science.gov (United States)

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.

  12. CancerResource: a comprehensive database of cancer-relevant proteins and compound interactions supported by experimental knowledge.

    Science.gov (United States)

    Ahmed, Jessica; Meinel, Thomas; Dunkel, Mathias; Murgueitio, Manuela S; Adams, Robert; Blasse, Corinna; Eckert, Andreas; Preissner, Saskia; Preissner, Robert

    2011-01-01

    During the development of methods for cancer diagnosis and treatment, a vast amount of information is generated. Novel cancer target proteins have been identified and many compounds that activate or inhibit cancer-relevant target genes have been developed. This knowledge is based on an immense number of experimentally validated compound-target interactions in the literature, and excerpts from literature text mining are spread over numerous data sources. Our own analysis shows that the overlap between important existing repositories such as Comparative Toxicogenomics Database (CTD), Therapeutic Target Database (TTD), Pharmacogenomics Knowledge Base (PharmGKB) and DrugBank as well as between our own literature mining for cancer-annotated entries is surprisingly small. In order to provide an easy overview of interaction data, it is essential to integrate this information into a single, comprehensive data repository. Here, we present CancerResource, a database that integrates cancer-relevant relationships of compounds and targets from (i) our own literature mining and (ii) external resources complemented with (iii) essential experimental and supporting information on genes and cellular effects. In order to facilitate an overview of existing and supporting information, a series of novel information connections have been established. CancerResource addresses the spectrum of research on compound-target interactions in natural sciences as well as in individualized medicine; CancerResource is available at: http://bioinformatics.charite.de/cancerresource/.

  13. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein–protein interaction sites for PDB structures

    Science.gov (United States)

    Ripoche, Hugues; Laine, Elodie; Ceres, Nicoletta; Carbone, Alessandra

    2017-01-01

    The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET2 at large scale on more than 20 000 chains. The JET2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein–protein interactions modulation and interaction surface redesign. PMID:27899675
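
    The abstract above does not define its four performance figures. Assuming they carry their standard meanings (sensitivity, positive predictive value, specificity and accuracy, expressed as percentages over predicted versus observed interface residues), they could be derived from confusion-matrix counts as in the sketch below. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the JET2 evaluation.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute percentage metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives and
    false negatives (assumed here to be counted per residue).
    """
    return {
        "Sen": 100 * tp / (tp + fn),  # fraction of real interface residues found
        "PPV": 100 * tp / (tp + fp),  # fraction of predictions that are correct
        "Spe": 100 * tn / (tn + fp),  # fraction of non-interface residues rejected
        "Acc": 100 * (tp + tn) / (tp + fp + tn + fn),  # overall agreement
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    for name, value in binary_metrics(tp=40, fp=10, tn=40, fn=10).items():
        print(f"{name} = {value:.2f}")
```

    Reporting all four values together matters for this kind of task because interface residues are a minority class, so accuracy alone can look high for a predictor that finds few true binding sites.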

  14. Protein structure image - ConfC | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ba.nbdc00400-006 Description of data contents Structure images of the protein which has structural flexibility.... Each image is linked from Data of structural flexibility table. Data file File name: confc_structure_ima

  15. TMBETADISC-RBF: Discrimination of beta-barrel membrane proteins using RBF networks and PSSM profiles.

    Science.gov (United States)

    Ou, Yu-Yen; Gromiha, M Michael; Chen, Shu-An; Suwa, Makiko

    2008-06-01

    Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. We have developed a method based on radial basis function networks and position specific scoring matrix (PSSM) profiles generated by PSI-BLAST searches against a non-redundant protein database. Our approach with PSSM profiles has correctly predicted the OMPs with a cross-validated accuracy of 96.4% in a set of 1251 proteins, which contain 206 OMPs, 667 globular proteins and 378 alpha-helical inner membrane proteins. Furthermore, we applied our method to a dataset containing 114 OMPs, 187 TMH proteins and 195 globular proteins with less than 20% sequence identity and obtained a cross-validated accuracy of 95%. This accuracy of discriminating OMPs is higher than that of other methods in the literature, and our method could be used as an effective tool for dissecting OMPs from genomic sequences. We have developed a prediction server, TMBETADISC-RBF, which is available at http://rbf.bioinfo.tw/~sachen/OMP.html.

  16. Protein (Viridiplantae) - PGDBj - Ortholog DB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available PGD... IDs of clusters that the amino acid sequences belong to in each taxon are indicated. Data file File name: pgd...bj_ortholog_db_viridiplantae_protein.zip File URL: ftp://ftp.biosciencedbc.jp/archive/pgdbj-ortholog-db/LATEST/pgd...RL http://togodb.biosciencedbc.jp/togodb/view/pgdbj_ortholog_db_viridiplantae_protein#en Data acquisition me...BI GI number of Amino Acid sequence RefSeq ID NCBI Reference Sequence ID Cluster (Kingdom) Cluster ID (rank: Kingd

  17. Denatured-state energy landscapes of a protein structural database reveal the energetic determinants of a framework model for folding.

    Science.gov (United States)

    Wang, Suwei; Gu, Jenny; Larson, Scott A; Whitten, Steven T; Hilser, Vincent J

    2008-09-19

    Position-specific denatured-state thermodynamics were determined for a database of human proteins by use of an ensemble-based model of protein structure. The results of modeling denatured protein in this manner reveal important sequence-dependent thermodynamic properties in the denatured ensembles as well as fundamental differences between the denatured and native ensembles in overall thermodynamic character. The generality and robustness of these results were validated by performing fold-recognition experiments, whereby sequences were matched with their respective folds based on amino acid propensities for the different energetic environments in the protein, as determined through cluster analysis. Correlation analysis between structure and energetic information revealed that sequence segments destined for beta-sheet in the final native fold are energetically more predisposed to a broader repertoire of states than are sequence segments destined for alpha-helix. These results suggest that within the subensemble of mostly unstructured states, the energy landscapes are dominated by states in which parts of helices adopt structure, whereas structure formation for sequences destined for beta-strand is far less probable. These results support a framework model of folding, which suggests that, in general, the denatured state has evolved to avoid low-energy conformations in sequences that ultimately adopt beta-strand. Instead, the denatured state evolved so that sequence segments that ultimately adopt alpha-helix and coil will have a high intrinsic structure formation capability, thus serving as potential nucleation sites.

  18. A resource for benchmarking the usefulness of protein structure models.

    KAUST Repository

    Carbajo, Daniel

    2012-08-02

    BACKGROUND: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific application. RESULTS: This paper describes a database and related software tools that allow testing of a given structure based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. CONCLUSIONS: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by

  19. Automated builder and database of protein/membrane complexes for molecular dynamics simulations.

    Directory of Open Access Journals (Sweden)

    Sunhwan Jo

    Molecular dynamics simulations of membrane proteins have provided deeper insights into their functions and interactions with surrounding environments at the atomic level. However, compared to solvation of globular proteins, building a realistic protein/membrane complex is still challenging and requires considerable experience with simulation software. Membrane Builder in the CHARMM-GUI website (http://www.charmm-gui.org) helps users to build such a complex system using a web browser with a graphical user interface. Through a generalized and automated building process including system size determination as well as generation of lipid bilayer, pore water, bulk water, and ions, a realistic membrane system with virtually any kind and shape of membrane protein can be generated in 5 minutes to 2 hours depending on the system size. Default values that were elaborated and tested extensively are given in each step to provide reasonable options and starting points for both non-expert and expert users. The efficacy of Membrane Builder is illustrated by its applications to 12 transmembrane and 3 interfacial membrane proteins, whose fully equilibrated systems with three different types of lipid molecules (DMPC, DPPC, and POPC) and two types of system shapes (rectangular and hexagonal) are freely available on the CHARMM-GUI website. One of the most significant advantages of using the web environment is that, if a problem is found, users can go back and re-generate the whole system again before quitting the browser. Therefore, Membrane Builder provides an intuitive and easy way to build and simulate biologically important membrane systems.

  20. The Histone Database: an integrated resource for histones and histone fold-containing proteins

    OpenAIRE

    Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D; Landsman, David

    2011-01-01

    Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Hi...

  1. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    Energy Technology Data Exchange (ETDEWEB)

    Dogget, N.; Myers, G. [Los Alamos National Lab., NM (United States)]; Wills, C.J. [Univ. of California, San Diego, CA (United States)]

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  2. PeptideDepot: flexible relational database for visual analysis of quantitative proteomic data and integration of existing protein information.

    Science.gov (United States)

    Yu, Kebing; Salomon, Arthur R

    2009-12-01

    Recently, dramatic progress has been achieved in expanding the sensitivity, resolution, mass accuracy, and scan rate of mass spectrometers able to fragment and identify peptides through MS/MS. Unfortunately, this enhanced ability to acquire proteomic data has not been accompanied by a concomitant increase in the availability of flexible tools allowing users to rapidly assimilate, explore, and analyze this data and adapt to various experimental workflows with minimal user intervention. Here we fill this critical gap by providing a flexible relational database called PeptideDepot for organization of expansive proteomic data sets, collation of proteomic data with available protein information resources, and visual comparison of multiple quantitative proteomic experiments. Our software design, built upon the synergistic combination of a MySQL database for safe warehousing of proteomic data with a FileMaker-driven graphical user interface for flexible adaptation to diverse workflows, enables proteomic end-users to directly tailor the presentation of proteomic data to the unique analysis requirements of the individual proteomics lab. PeptideDepot may be deployed as an independent software tool or integrated directly with our high throughput autonomous proteomic pipeline used in the automated acquisition and post-acquisition analysis of proteomic data.

  3. Comprehensive two-dimensional gel protein databases offer a global approach to the analysis of human cells: the transformed amnion cells (AMA) master database and its link to genome DNA sequence data

    DEFF Research Database (Denmark)

    Celis, J E; Gesser, B; Rasmussen, H H;

    1990-01-01

    qualitative and quantitative annotations has been established. The protein numbers in this database differ from those reported in an earlier version (Celis et al. Leukemia 1988, 2,561-602) as a result of changes in the scanning hardware. The reported information includes: percentage of total radioactivity...... recovered from the gels (based on quantitations of polypeptides labeled with a mixture of 16 14C-amino acids), protein name (including credit to investigators that aided identification), antibody against protein, cellular localization, (nuclear, 40S hnRNP, 20S snRNP U5, proteasomes, endoplasmic reticulum...

  4. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Gclust Server. Data name: Amino acid sequences of predicted proteins and their annotation for 95 organism species. Description of data contents: Amino acid sequences of predicted proteins and their a...

  5. Statistical analysis of crystallization database links protein physico-chemical features with crystallization mechanisms.

    Directory of Open Access Journals (Sweden)

    Diana Fusco

    Full Text Available X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.

  6. From networks of protein interactions to networks of functional dependencies

    Directory of Open Access Journals (Sweden)

    Luciani Davide

    2012-05-01

    Background: As protein-protein interactions connect proteins that participate in either the same or different functions, networks of interacting and functionally annotated proteins can be converted into process graphs of inter-dependent function nodes (each node corresponding to interacting proteins with the same functional annotation). However, as proteins have multiple annotations, the process graph is non-redundant if only proteins participating directly in a given function are included in the related function node. Results: Reasoning that topological features (e.g., clusters of highly inter-connected proteins) might help in approaching a structured and non-redundant understanding of molecular function, an algorithm was developed that prioritizes inclusion of proteins into the function nodes that best overlap protein clusters. Specifically, the algorithm identifies function nodes (and their mutual relations), based on the topological analysis of a protein interaction network, which can be related to various biological domains, such as cellular components (e.g., peroxisome and cellular bud) or biological processes (e.g., cell budding) of the model organism S. cerevisiae. Conclusions: The method we have described allows converting a protein interaction network into a non-redundant process graph of inter-dependent function nodes. The examples we have described show that the resulting graph allows researchers to formulate testable hypotheses about dependencies among functions and the underlying mechanisms.
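
The conversion described in this abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: as a simplifying assumption, each protein is assigned to the single annotation shared by most of its interaction partners, which only approximates "function nodes that best overlap protein clusters". The function name `build_process_graph` and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict

def build_process_graph(edges, annotations):
    """Collapse a protein interaction network into function nodes.

    edges: iterable of (protein_a, protein_b) interactions.
    annotations: dict mapping protein -> set of functional annotations.
    Assumption (not from the paper): a protein joins the one annotation
    that is most common among its direct interaction partners.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment = {}
    for protein, funcs in annotations.items():
        if not funcs:
            continue  # unannotated proteins are left out of the graph

        def support(func):
            # Number of interaction partners carrying the same annotation.
            return sum(1 for n in neighbors[protein]
                       if func in annotations.get(n, ()))

        # sorted() makes ties deterministic (alphabetically first wins).
        assignment[protein] = max(sorted(funcs), key=support)

    # Two function nodes are linked when their member proteins interact.
    function_edges = set()
    for a, b in edges:
        fa, fb = assignment.get(a), assignment.get(b)
        if fa and fb and fa != fb:
            function_edges.add(tuple(sorted((fa, fb))))
    return assignment, function_edges
```

With the toy input `edges = [("P1", "P2"), ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]` and P2 annotated with both "budding" and "peroxisome", P2 is placed in the "peroxisome" node because two of its three partners carry that annotation, so each protein appears in exactly one function node.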

  7. Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria

    Energy Technology Data Exchange (ETDEWEB)

    Fritzsching, Keith J., E-mail: kfritzsc@brandeis.edu [Brandeis University, Department of Chemistry (United States)]; Hong, Mei [Massachusetts Institute of Technology, Department of Chemistry (United States)]; Schmidt-Rohr, Klaus, E-mail: srohr@brandeis.edu [Brandeis University, Department of Chemistry (United States)]

    2016-02-15

    We have determined refined multidimensional chemical shift ranges for intra-residue correlations (¹³C–¹³C, ¹⁵N–¹³C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 ¹³C chemical shifts and >3 million chemical shifts in total); these data were originally derived from the Biological Magnetic Resonance Data Bank. Using relatively simple non-parametric statistics to find peak maxima in the distributions of helix, sheet, coil and turn chemical shifts, and without the use of limited “hand-picked” data sets, we show that ∼94 % of the ¹³C NMR data and almost all ¹⁵N data are quite accurately referenced and assigned, with smaller standard deviations (0.2 and 0.8 ppm, respectively) than recognized previously. On the other hand, approximately 6 % of the ¹³C chemical shift data in the PACSY database are shown to be clearly misreferenced, mostly by ca. −2.4 ppm. The removal of the misreferenced data and other outliers by this purging by intrinsic quality criteria (PIQC) allows for reliable identification of secondary maxima in the two-dimensional chemical-shift distributions already pre-separated by secondary structure. We demonstrate that some of these correspond to specific regions in the Ramachandran plot, including left-handed helix dihedral angles, reflect unusual hydrogen bonding, or are due to the influence of a following proline residue. With appropriate smoothing, significantly more tightly defined chemical shift ranges are obtained for each amino acid type in the different secondary structures. These chemical shift ranges, which may be defined at any statistical threshold, can be used for amino-acid type assignment and secondary-structure analysis of chemical shifts from intra

  8. Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria.

    Science.gov (United States)

    Fritzsching, Keith J; Hong, Mei; Schmidt-Rohr, Klaus

    2016-02-01

    We have determined refined multidimensional chemical shift ranges for intra-residue correlations ((13)C-(13)C, (15)N-(13)C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 (13)C chemical shifts and >3 million chemical shifts in total); these data were originally derived from the Biological Magnetic Resonance Data Bank. Using relatively simple non-parametric statistics to find peak maxima in the distributions of helix, sheet, coil and turn chemical shifts, and without the use of limited "hand-picked" data sets, we show that ~94% of the (13)C NMR data and almost all (15)N data are quite accurately referenced and assigned, with smaller standard deviations (0.2 and 0.8 ppm, respectively) than recognized previously. On the other hand, approximately 6% of the (13)C chemical shift data in the PACSY database are shown to be clearly misreferenced, mostly by ca. -2.4 ppm. The removal of the misreferenced data and other outliers by this purging by intrinsic quality criteria (PIQC) allows for reliable identification of secondary maxima in the two-dimensional chemical-shift distributions already pre-separated by secondary structure. We demonstrate that some of these correspond to specific regions in the Ramachandran plot, including left-handed helix dihedral angles, reflect unusual hydrogen bonding, or are due to the influence of a following proline residue. With appropriate smoothing, significantly more tightly defined chemical shift ranges are obtained for each amino acid type in the different secondary structures. These chemical shift ranges, which may be defined at any statistical threshold, can be used for amino-acid type assignment and secondary-structure analysis of chemical shifts from intra-residue cross peaks by inspection or by using a provided
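
The idea of locating a peak maximum in a shift distribution and flagging entries that sit systematically away from it can be sketched in Python as follows. This is a minimal illustration, not the PIQC procedure itself: the histogram bin width, the tolerance, and the per-entry median test are assumptions, and `flag_misreferenced` is a hypothetical name.

```python
from collections import Counter
from statistics import median

def histogram_mode(values, bin_width=0.5):
    """Return the centre of the most populated histogram bin (the peak maximum)."""
    counts = Counter(round(v / bin_width) for v in values)
    # Ties are broken towards the lower bin so the result is deterministic.
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return best_bin * bin_width

def flag_misreferenced(entries, bin_width=0.5, tolerance=1.5):
    """Flag database entries whose shifts are offset from the pooled peak.

    entries: dict mapping entry id -> list of chemical shifts (ppm) for one
    atom type in one secondary structure.
    Returns {entry id: offset in ppm} for entries beyond `tolerance`.
    """
    pooled = [shift for shifts in entries.values() for shift in shifts]
    peak = histogram_mode(pooled, bin_width)
    flagged = {}
    for entry_id, shifts in entries.items():
        offset = median(shifts) - peak
        if abs(offset) > tolerance:
            flagged[entry_id] = offset
    return flagged
```

For example, with made-up shifts clustered near 55.0 ppm in two entries and near 52.6 ppm in a third, only the third entry is flagged, with an offset of about −2.4 ppm. A real analysis would work per atom type and secondary structure across the whole database rather than on one pooled list.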

  9. CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies.

    Science.gov (United States)

    Nakamura, Yoji; Komiyama, Tomoyoshi; Furue, Motoki; Gojobori, Takashi; Akiyama, Yasuto

    2010-07-27

    Immunoglobulin (IG or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]). This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts by MeSH terms, titles, and abstracts containing keywords related to cancer. After we performed a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabularized. To effectively classify future entries, we developed a computational method based on text mining and canonical discriminant analysis by parsing MeSH/title/abstract words. We performed a leave-one-out cross validation for the method, which showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound with IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable. The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting annotation on IG, TR, and their epitopes. This database

  10. CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies

    Directory of Open Access Journals (Sweden)

    Furue Motoki

    2010-07-01

    Background: Immunoglobulin (IG or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]). Description: This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts by MeSH terms, titles, and abstracts containing keywords related to cancer. After we performed a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabularized. To effectively classify future entries, we developed a computational method based on text mining and canonical discriminant analysis by parsing MeSH/title/abstract words. We performed a leave-one-out cross validation for the method, which showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound with IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable. Conclusions: The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting

  11. Transporter Classification Database (TCDB)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Transporter Classification Database details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC)...

  12. A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

    Science.gov (United States)

    Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

    2017-04-03

    The unique physicochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can be unreliable, and proteomic methods offer an alternative. However, proteomic methods require comprehensive and well annotated sequence databases, which are lacking for gluten. A manually curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full-length protein sequences, has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten.

  13. Application of open-access databases to determine functional connectivity between resveratrol-binding protein QR2 and colorectal carcinoma.

    Science.gov (United States)

    Doonan, Barbara B; Schaafsma, Evelien; Pinto, John T; Wu, Joseph M; Hsieh, Tze-Chen

    2017-08-01

    Colorectal cancer (CRC) is a major cause of cancer-associated deaths worldwide. Recently, oral administration of resveratrol (trans-3,5,4'-trihydroxystilbene) has been reported to significantly reduce tumor proliferation in colorectal cancer patients, however, with little specific information on functional connections. The pathogenesis and development of colorectal cancer is a multistep process that can be categorized using three phenotypic pathways, respectively, chromosome instability (CIN), microsatellite instability (MSI), and CpG island methylator (CIMP). Targets of resveratrol, including a high-affinity binding protein, quinone reductase 2 (QR2), have been identified with little information on disease association. We hypothesize that the relationship between resveratrol and different CRC etiologies might be gleaned using publicly available databases. A web-based microarray gene expression data-mining platform, Oncomine, was selected and used to determine whether QR2 may serve as a mechanistic and functional biotarget within the various CRC etiologies. We found that QR2 messenger RNA (mRNA) is overexpressed in CRC characterized by CIN, particularly in cells showing a positive KRAS (Kirsten rat sarcoma viral oncogene homolog) mutation, as well as by the MSI but not the CIMP phenotype. Mining of Oncomine revealed an excellent correlation between QR2 mRNA expression and certain CRC etiologies. Two resveratrol-associated genes, adenomatous polyposis coli (APC) and TP53, found in CRC were further mined, using cBio portal and Colorectal Cancer Atlas which predicted a mechanistic link to exist between resveratrol→QR2/TP53→CIN. Multiple web-based data mining can provide valuable insights which may lead to hypotheses serving to guide clinical trials and design of therapies for enhanced disease prognosis and patient survival. This approach resembles a BioGPS, a capability for mining web-based databases that can elucidate the potential links between compounds to

  14. GPCR-SSFE: A comprehensive database of G-protein-coupled receptor template predictions and homology models

    Directory of Open Access Journals (Sweden)

    Kreuchwig Annika

    2011-05-01

    Background: G protein-coupled receptors (GPCRs) transduce a wide variety of extracellular signals to within the cell and therefore have a key role in regulating cell activity and physiological function. GPCR malfunction is responsible for a wide range of diseases including cancer, diabetes and hyperthyroidism and a large proportion of drugs on the market target these receptors. The three dimensional structure of GPCRs is important for elucidating the molecular mechanisms underlying these diseases and for performing structure-based drug design. Although structural data are restricted to only a handful of GPCRs, homology models can be used as a proxy for those receptors not having crystal structures. However, many researchers working on GPCRs are not experienced homology modellers and are therefore unable to benefit from the information that can be gleaned from such three-dimensional models. Here, we present a comprehensive database called the GPCR-SSFE, which provides initial homology models of the transmembrane helices for a large variety of family A GPCRs. Description: Extending on our previous theoretical work, we have developed an automated pipeline for GPCR homology modelling and applied it to a large set of family A GPCR sequences. Our pipeline is a fragment-based approach that exploits available family A crystal structures. The GPCR-SSFE database stores the template predictions, sequence alignments, identified sequence and structure motifs and homology models for 5025 family A GPCRs. Users are able to browse the GPCR dataset according to their pharmacological classification or search for results using a UniProt entry name. It is also possible for a user to submit a GPCR sequence that is not contained in the database for analysis and homology model building. The models can be viewed using a Jmol applet and are also available for download along with the alignments. Conclusions: The data provided by GPCR-SSFE are useful for investigating

  15. GPCR-SSFE: a comprehensive database of G-protein-coupled receptor template predictions and homology models.

    Science.gov (United States)

    Worth, Catherine L; Kreuchwig, Annika; Kleinau, Gunnar; Krause, Gerd

    2011-05-23

    G protein-coupled receptors (GPCRs) transduce a wide variety of extracellular signals to within the cell and therefore have a key role in regulating cell activity and physiological function. GPCR malfunction is responsible for a wide range of diseases including cancer, diabetes and hyperthyroidism and a large proportion of drugs on the market target these receptors. The three dimensional structure of GPCRs is important for elucidating the molecular mechanisms underlying these diseases and for performing structure-based drug design. Although structural data are restricted to only a handful of GPCRs, homology models can be used as a proxy for those receptors not having crystal structures. However, many researchers working on GPCRs are not experienced homology modellers and are therefore unable to benefit from the information that can be gleaned from such three-dimensional models. Here, we present a comprehensive database called the GPCR-SSFE, which provides initial homology models of the transmembrane helices for a large variety of family A GPCRs. Extending on our previous theoretical work, we have developed an automated pipeline for GPCR homology modelling and applied it to a large set of family A GPCR sequences. Our pipeline is a fragment-based approach that exploits available family A crystal structures. The GPCR-SSFE database stores the template predictions, sequence alignments, identified sequence and structure motifs and homology models for 5025 family A GPCRs. Users are able to browse the GPCR dataset according to their pharmacological classification or search for results using a UniProt entry name. It is also possible for a user to submit a GPCR sequence that is not contained in the database for analysis and homology model building. The models can be viewed using a Jmol applet and are also available for download along with the alignments. 
The data provided by GPCR-SSFE are useful for investigating general and detailed sequence-structure-function relationships

  16. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation.

    Science.gov (United States)

    Tsujikawa, Hiroto; Sato, Kenta; Wei, Cao; Saad, Gul; Sumikoshi, Kazuya; Nakamura, Shugo; Terada, Tohru; Shimizu, Kentaro

    2016-09-01

    We present a new method for predicting protein-ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction.
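
The workflow in this abstract (keep low-energy probes, cluster them, then combine energy with conservation) can be sketched in Python. The abstract does not state the clustering algorithm or how the two terms are combined, so the single-linkage clustering, the cutoffs, and the weighted sum below are all assumptions made for illustration; `cluster_probes` and `rank_sites` are hypothetical names.

```python
import math

def cluster_probes(points, cutoff):
    """Single-linkage clustering: probes within `cutoff` share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                stack.append(j)
        clusters.append(sorted(cluster))
    return clusters

def rank_sites(probes, energy_cutoff=-1.0, distance_cutoff=2.0, weight=1.0):
    """Rank candidate binding sites from surface probes.

    probes: list of dicts with "xyz" (coordinates), "energy" (van der Waals
    interaction energy; lower is more favourable) and "conservation"
    (score of nearby residues; higher is more conserved).
    Assumed score: -(summed energy) + weight * (mean conservation).
    """
    low = [i for i, p in enumerate(probes) if p["energy"] <= energy_cutoff]
    clusters = cluster_probes([probes[i]["xyz"] for i in low], distance_cutoff)
    sites = []
    for members in clusters:
        ids = [low[m] for m in members]  # map back to original probe indices
        energy = sum(probes[i]["energy"] for i in ids)
        conservation = sum(probes[i]["conservation"] for i in ids) / len(ids)
        sites.append({"probes": ids, "score": -energy + weight * conservation})
    return sorted(sites, key=lambda s: s["score"], reverse=True)
```

Setting `weight=0` reduces the ranking to the energy term alone, which gives a simple way to examine how much the conservation term changes the ordering on a given structure.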

  17. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    Science.gov (United States)

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
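
A CPU-only Python sketch can show the filter-then-align idea behind this abstract. It does not reproduce CUDA-SWfr: the scoring scheme (match/mismatch with a linear gap penalty instead of a substitution matrix), the threshold `max_fd`, and the exact way the filter is applied are assumptions. The frequency distance used here is the usual residue-count comparison, which is cheap to compute compared with a full alignment.

```python
from collections import Counter

def frequency_distance(a, b):
    """Frequency distance between two sequences based on residue counts."""
    ca, cb = Counter(a), Counter(b)
    surplus = sum((ca - cb).values())  # residues over-represented in a
    deficit = sum((cb - ca).values())  # residues over-represented in b
    return max(surplus, deficit)

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def search(query, database, max_fd):
    """Align the query only against sequences that pass the frequency filter."""
    hits = []
    for name, seq in database.items():
        if frequency_distance(query, seq) > max_fd:
            continue  # filtered out: no alignment is computed
        hits.append((smith_waterman(query, seq), name))
    return sorted(hits, reverse=True)
```

In this sketch the filter runs in time linear in the sequence lengths, whereas each alignment is quadratic, so every sequence rejected by the filter saves one full dynamic-programming pass. In the described system the surviving alignments would be dispatched to the GPU.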

  18. ChiTaRS-3.1—the enhanced chimeric transcripts and RNA-seq database matched with protein–protein interactions

    Science.gov (United States)

    Gorohovski, Alessandro; Tagore, Somnath; Palande, Vikrant; Malka, Assaf; Raviv-Shay, Dorith; Frenkel-Morgenstern, Milana

    2017-01-01

    Discovery of chimeric RNAs, which are produced by chromosomal translocations as well as the joining of exons from different genes by trans-splicing, has added a new level of complexity to our study and understanding of the transcriptome. The enhanced ChiTaRS-3.1 database (http://chitars.md.biu.ac.il) is designed to make widely accessible a wealth of mined data on chimeric RNAs, with easy-to-use analytical tools built-in. The database comprises 34 922 chimeric transcripts along with 11 714 cancer breakpoints. In this latest version, we have included multiple cross-references to GeneCards, iHop, PubMed, NCBI, Ensembl, OMIM, RefSeq and the Mitelman collection for every entry in the ‘Full Collection’. In addition, for every chimera, we have added a predicted chimeric protein–protein interaction (ChiPPI) network, which allows for easy visualization of protein partners of both parental and fusion proteins for all human chimeras. The database contains a comprehensive annotation for 34 922 chimeric transcripts from eight organisms, and includes the manual annotation of 200 sense-antiSense (SaS) chimeras. The current improvements in the content and functionality to the ChiTaRS database make it a central resource for the study of chimeric transcripts and fusion proteins. PMID:27899596

  19. Protein functional-group 3D motif and its applications

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Representing and recognizing the sequence motifs (1D motifs) and structural motifs (3D motifs) of protein active sites is an important topic for predicting and designing protein function. Prevalent methods for extracting and searching 3D motifs treat the residue as the minimal unit, which limits their sensitivity. Here we present a new spatial representation of protein active sites, called the "functional-group 3D motif", based on the fact that the functional groups inside a residue contribute most to its function. A relevant algorithm and computer program were developed, which could be widely used in function prediction and in the study of structure-function relationships of proteins. As a test, we defined a functional-group 3D motif of the catalytic triad and oxyanion hole with the structure of porcine trypsin (PDB code: 1mct) as the template. With our motif-searching program, we successfully found similar sub-structures in trypsins, subtilisins and α/β hydrolases, which show distinct folds but share a similar catalytic mechanism. Moreover, this motif can be used to elucidate the structural basis of other proteins with variant catalytic triads by comparing it to those proteins. Finally, we scanned this motif against a non-redundant protein structure database to find its matches, and the results demonstrated the potential application of the functional-group 3D motif in function prediction. Above all, compared with other 3D-motif representations based on residues, the functional-group 3D motif achieves a better representation of the protein active region and is more sensitive for protein function prediction.

  20. The Chloroplast Function Database II: a comprehensive collection of homozygous mutants and their phenotypic/genotypic traits for nuclear-encoded chloroplast proteins.

    Science.gov (United States)

    Myouga, Fumiyoshi; Akiyama, Kenji; Tomonaga, Yumi; Kato, Aya; Sato, Yuka; Kobayashi, Megumi; Nagata, Noriko; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-02-01

    The Chloroplast Function Database has so far offered phenotype information on mutants of the nuclear-encoded chloroplast proteins in Arabidopsis that pertains to >200 phenotypic data sets that were obtained from 1,722 transposon- or T-DNA-tagged lines. Here, we present the development of the second version of the database, which is named the Chloroplast Function Database II and was redesigned to increase the number of mutant characters and new user-friendly tools for data mining and integration. The upgraded database offers information on genome-wide mutant screens for any visible phenotype against 2,495 tagged lines to create a comprehensive homozygous mutant collection. The collection consists of 147 lines with seedling phenotypes and 185 lines for which we could not obtain homozygotes, as well as 1,740 homozygotes with wild-type phenotypes. Besides providing basic information about primer lists that were used for the PCR genotyping of T-DNA-tagged lines and explanations about the preparation of homozygous mutants and phenotype screening, the database includes access to a link between the gene locus and existing publicly available databases. This gives users access to a combined pool of data, enabling them to gain valuable insights into biological processes. In addition, high-resolution images of plastid morphologies of mutants with seedling-specific chloroplast defects as observed with transmission electron microscopy (TEM) are available in the current database. This database is used to compare the phenotypes of visually identifiable mutants with their plastid ultrastructures and to evaluate their potential significance from characteristic patterns of plastid morphology in vivo. Thus, the Chloroplast Function Database II is a useful and comprehensive information resource that can help researchers to connect individual Arabidopsis genes to plastid functions on the basis of phenotype analysis of our tagged mutant collection. It can be freely accessed at http://rarge.psc.riken.jp/chloroplast/.

  1. Biological Macromolecule Crystallization Database

    Science.gov (United States)

    SRD 21 Biological Macromolecule Crystallization Database (Web, free access). The Biological Macromolecule Crystallization Database and NASA Archive for Protein Crystal Growth Data (BMCD) contains the conditions reported for the crystallization of proteins and nucleic acids used in X-ray structure determinations and archives the results of microgravity macromolecule crystallization studies.

  2. Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database.

    Science.gov (United States)

    Hawkins, Paul C D; Skillman, A Geoffrey; Warren, Gregory L; Ellingson, Benjamin A; Stahl, Matthew T

    2010-04-26

    Here, we present the algorithm and validation for OMEGA, a systematic, knowledge-based conformer generator. The algorithm consists of three phases: assembly of an initial 3D structure from a library of fragments; exhaustive enumeration of all rotatable torsions using values drawn from a knowledge-based list of angles, thereby generating a large set of conformations; and sampling of this set by geometric and energy criteria. Validation of conformer generators like OMEGA has often been undertaken by comparing computed conformer sets to experimental molecular conformations from crystallography, usually from the Protein Databank (PDB). Such an approach is fraught with difficulty due to the systematic problems with small molecule structures in the PDB. Methods are presented to identify a diverse set of small molecule structures from cocomplexes in the PDB that has maximal reliability. A challenging set of 197 high quality, carefully selected ligand structures from well-solved models was obtained using these methods. This set will provide a sound basis for comparison and validation of conformer generators in the future. Validation results from this set are compared to the results using structures of a set of druglike molecules extracted from the Cambridge Structural Database (CSD). OMEGA is found to perform very well in reproducing the crystallographic conformations from both these data sets using two complementary metrics of success.

  3. FastBLAST: homology relationships for millions of proteins.

    Directory of Open Access Journals (Sweden)

    Morgan N Price

    BACKGROUND: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. METHODOLOGY/PRINCIPAL FINDINGS: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. CONCLUSIONS/SIGNIFICANCE: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.

  4. Generation of a predicted protein database from EST data and application to iTRAQ analyses in grape (Vitis vinifera cv. Cabernet Sauvignon) berries at ripening initiation

    Directory of Open Access Journals (Sweden)

    Smith Derek

    2009-01-01

    Background: iTRAQ is a proteomics technique that uses isobaric tags for relative and absolute quantitation of tryptic peptides. In proteomics experiments, the detection and high confidence annotation of proteins and the significance of corresponding expression differences can depend on the quality and the species specificity of the tryptic peptide map database used for analysis of the data. For species for which finished genome sequence data are not available, identification of proteins relies on similarity to proteins from other species using comprehensive peptide map databases such as the MSDB. Results: We were interested in characterizing ripening initiation ('veraison') in grape berries at the protein level in order to better define the molecular control of this important process for grape growers and wine makers. We developed a bioinformatic pipeline for processing EST data in order to produce a predicted tryptic peptide database specifically targeted to the wine grape cultivar, Vitis vinifera cv. Cabernet Sauvignon, and lacking truncated N- and C-terminal fragments. By searching iTRAQ MS/MS data generated from berry exocarp and mesocarp samples at ripening initiation, we determined that implementation of the custom database afforded a large improvement in high confidence peptide annotation in comparison to the MSDB. We used iTRAQ MS/MS in conjunction with custom peptide database searches to quantitatively characterize several important pathway components for berry ripening previously described at the transcriptional level and confirmed expression patterns for these at the protein level. Conclusion: We determined that a predicted peptide database for MS/MS applications can be derived from EST data using advanced clustering and trimming approaches and successfully implemented for quantitative proteome profiling. Quantitative shotgun proteome profiling holds great promise for characterizing biological processes such as fruit ripening.

  5. Protein: CAD [Trypanosomes Database]

    Lifescience Database Archive (English)

    CAD carbamoyl-phosphate synthetase 2, aspartate transcarbamylase, and dihydroorotaseCAD trifunctional prot...eincarbamoylphosphate synthetase 2/aspartate transcarbamylase/dihydroorotasemultifunctional prot

  6. Database Description - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    TMBETA-GENOME Database Description General information of database Database name TMBETA-GENOME Alternative n...oinfo/Gromiha/ Database classification Protein sequence databases - Protein prope...: Eukaryota Taxonomy ID: 2759 Database description TMBETA-GENOME is a database for transmembrane β-barrel pr...lgorithms and statistical methods have been performed and the annotation results are accumulated in the database.... Features and manner of utilization of database Users can download lists of sequences predicted as β-bar

  7. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    Science.gov (United States)

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
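    The graph representation described above can be illustrated with a minimal sketch: residues become labeled vertices and an edge joins two residues whose representative points lie within a distance cutoff. The 8.0 Å cutoff, the one-point-per-residue input format and the function name are assumptions made for illustration; the abstract does not state how the authors define geometrical proximity.

```python
from itertools import combinations
from math import dist


def residue_graph(residues, cutoff=8.0):
    """Build a labeled proximity graph.

    residues: list of (residue_name, (x, y, z)) tuples, one point per residue.
    cutoff:   hypothetical distance threshold in angstroms.
    Returns (labels, edges) where labels maps vertex index -> residue name
    and edges is a set of (i, j) index pairs with i < j.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = {
        (i, j)
        for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2)
        if dist(p, q) <= cutoff
    }
    return labels, edges
```

    A subgraph miner would then look for labeled subgraphs (for example a SER-HIS-ASP triangle) that recur across the structures of one functional family.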

  8. Putative Vitis vinifera Rop- and Rab-GAP-, GEF-, and GDI-interacting proteins uncovered with novel methods for public genomic and EST database analysis.

    Science.gov (United States)

    Abbal, Philippe; Tesniere, Catherine

    2010-01-01

    To understand how grapevine Rop and Rab proteins achieve their functional versatility in signalling, identification of the putative VvRop- and VvRab-interacting proteins was performed using newly designed tools. In this study, sequences encoding eight full-length proteins for VvRop GTPase-activating proteins (GAPs), five for VvRabGAPs, six for VvRop guanine nucleotide exchange factors (GEFs), one for VvRabGEF, five for VvRop GDP dissociation inhibitors (GDIs), and three for VvRabGDIs were identified. These proteins had a CRIB motif or PH domain, a TBC domain, a PRONE domain, a DENN domain, or GDI signatures, respectively. By bootstrap analysis, an unrooted consensus phylogenetic tree was constructed which indicated that VvRopGDIs and VvRopGEFs--but not VvRopGAP--belonged to the same clade, and that VvRabGEF1 protein was more closely related to VvRopGAPs than to the other putative VvRab-interacting proteins. Twenty-two genes out of 28 encoding putative VvRop- and VvRab-interacting proteins could be located on identified grapevine chromosomes. Generally one gene was anchored on one chromosome, but in some cases up to four genes were located on the same chromosome. Expression patterns of the genes encoding putative VvRop- and VvRab-interacting proteins were also examined using a newly developed tool based on public expressed sequence tag (EST) database analysis. Expression patterns were sometimes found to be specific to an organ or a developmental stage. Although some limitations exist, the use of EST database analysis is stressed, in particular in the case of species where expression data are obtained at high costs in terms of time and effort.

  9. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions

    Science.gov (United States)

    Hume, Maxwell A.; Barrera, Luis A.; Gisselbrecht, Stephen S.; Bulyk, Martha L.

    2015-01-01

    The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k (‘k-mers’). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. PMID:25378322
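    As a small illustration of what "all possible sequence variants of a length k" means, the sketch below enumerates every DNA k-mer and counts the k-mers occurring in one sequence. The function names are hypothetical and the code is not part of UniPROBE; it only shows the combinatorics (there are 4^k k-mers over the alphabet A, C, G, T).

```python
from collections import Counter
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return every possible string of length k over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def count_kmers(sequence, k):
    """Count overlapping k-mers found in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
```

    For k = 8, a value often used with such arrays, `all_kmers(8)` would contain 4^8 = 65,536 entries.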

  10. 2P2I HUNTER: a tool for filtering orthosteric protein-protein interaction modulators via a dedicated support vector machine.

    Science.gov (United States)

    Hamon, Véronique; Bourgeas, Raphael; Ducrot, Pierre; Theret, Isabelle; Xuereb, Laura; Basse, Marie Jeanne; Brunel, Jean Michel; Combes, Sebastien; Morelli, Xavier; Roche, Philippe

    2014-01-06

    Over the last 10 years, protein-protein interactions (PPIs) have shown increasing potential as new therapeutic targets. As a consequence, PPIs are today the most screened target class in high-throughput screening (HTS). The development of broad chemical libraries dedicated to these particular targets is essential; however, the chemical space associated with this 'high-hanging fruit' is still under debate. Here, we analyse the properties of 40 non-redundant small molecules present in the 2P2I database (http://2p2idb.cnrs-mrs.fr/) to define a general profile of orthosteric inhibitors and propose an original protocol to filter general screening libraries using a support vector machine (SVM) with 11 standard Dragon molecular descriptors. The filtering protocol has been validated using external datasets from PubChem BioAssay and results from in-house screening campaigns. This external blind validation demonstrated the ability of the SVM model to reduce the size of the filtered chemical library by eliminating up to 96% of the compounds as well as enhancing the proportion of active compounds by up to a factor of 8. We believe that the resulting chemical space identified in this paper will provide the scientific community with a concrete support to search for PPI inhibitors during HTS campaigns.

  11. The NCBI Taxonomy database.

    Science.gov (United States)

    Federhen, Scott

    2012-01-01

    The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.

  12. Isolation of cross-linked peptides by diagonal strong cation exchange chromatography for protein complex topology studies by peptide fragment fingerprinting from large sequence databases.

    Science.gov (United States)

    Buncherd, Hansuk; Roseboom, Winfried; Ghavim, Behrad; Du, Weina; de Koning, Leo J; de Koster, Chris G; de Jong, Luitzen

    2014-06-27

    Knowledge of spatial proximity of amino acid residues obtained by chemical cross-linking and mass spectrometric analysis provides information about protein folding, protein-protein interactions and topology of macromolecular assemblies. We show that the use of bis(succinimidyl)-3-azidomethyl glutarate as a cross-linker provides a solution for two major analytical problems of cross-link mapping by peptide fragment fingerprinting (PFF) from complex sequence databases, i.e., low abundance of protease-generated target peptides and lack of knowledge of the masses of linked peptides. Tris(carboxyethyl)phosphine (TCEP) reduces the azido group in cross-linked peptides to an amine group in competition with cleavage of an amide bond formed in the cross-link reaction. TCEP-induced reaction products were separated by diagonal strong cation exchange (SCX) from unmodified peptides. The relation between the sum of the masses of the cleavage products and the mass of the parent cross-linked peptide enables determination of the masses of candidate linked peptides. By reversed phase LC-MS/MS analysis of secondary SCX fractions, we identified several intraprotein and interprotein cross-links in a HeLa cell nuclear extract, aided by software tools supporting PFF from the entire human sequence database. The data provide new information about interacting protein domains, among others from assemblies involved in splicing.

  13. Establishing a protein expression profile database for the normal human pituitary gland using two-dimensional high-performance liquid chromatography combined with LTQ-Orbitrap mass spectrometry

    Institute of Scientific and Technical Information of China (English)

    Rong Xie; Wei Xu; Weimin Bao; Hang Liu; Luping Chen; Yiwen Shen; Jianhong Zhu

    2012-01-01

    In this study, we selected adult normal pituitary gland tissues from six patients during operations for pituitary microadenomas via the transsphenoidal approach for extended normal pituitary tissue resection around the tumor, and analyzed the protein expression of human normal pituitary using two-dimensional high-performance liquid chromatography combined with LTQ-Orbitrap mass spectrometry proteomics technology. The ten most highly expressed proteins in normal human pituitary were: alpha 3 type VI collagen isoform 5 precursor (abundance among all pituitary proteins, 1.30%), fibrinogen beta chain preproprotein (0.99%), vimentin (0.73%), prolactin (0.69%), ATP synthase, H+ transporting and mitochondrial F1 complex beta subunit precursor (0.52%), keratin I (0.49%), growth hormone (0.45%), carbonic anhydrase I (0.40%), heat shock protein 90 kDa I (0.31%), and annexin V (0.30%). Based on the biological function classifications of these proteins, the top three categories by content were neuroendocrine proteins (abundance among all pituitary proteins, 40.1%), catalytic and metabolic proteins (28.3%), and cell signal transduction proteins (9.8%). Based on cell positioning classification, the top three categories were cell organelle (24.5%), membrane (20.8%), and cytoplasm (13.0%). Based on biological process classification, the top three categories of proteins are involved in physiological processes (42.9%), cellular processes (40.4%), and regulation of biological processes (9.1%). Our experimental findings indicate that a protein expression profile database of normal human pituitary can be precisely and efficiently established by proteomics technology.

  14. Molecular Quantum Similarity, Chemical Reactivity and Database Screening of 3D Pharmacophores of the Protein Kinases A, B and G from Mycobacterium tuberculosis.

    Science.gov (United States)

    Morales-Bayuelo, Alejandro

    2017-06-21

    Mycobacterium tuberculosis remains one of the world's most devastating pathogens. For this reason, we developed a study involving 3D pharmacophore searching, selectivity analysis and database screening for a series of anti-tuberculosis compounds associated with the protein kinases A, B, and G. This theoretical study is expected to shed some light on molecular aspects that could contribute to knowledge of the molecular mechanics behind the interactions of these compounds with anti-tuberculosis activity. Using the Molecular Quantum Similarity field and reactivity descriptors supported by Density Functional Theory, it was possible to quantify the steric and electrostatic effects through the Overlap and Coulomb quantitative convergence (alpha and beta) scales. In addition, an analysis of reactivity indices using global and local descriptors was developed, identifying the binding sites and selectivity of these anti-tuberculosis compounds in the active sites. Finally, the reported pharmacophores for PKn A, B and G were used to carry out database screening, using a database of anti-tuberculosis drugs from the Kelly Chibale research group (http://www.kellychibaleresearch.uct.ac.za/), to find the compounds with affinity for the specific protein targets associated with PKn A, B and G. In this regard, this hybrid methodology (Molecular Mechanics/Quantum Chemistry) shows new insights into drug design that may be useful in tuberculosis treatment today.

  15. DMPD: G-protein-coupled receptor expression, function, and signaling in macrophages. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    17456803 G-protein-coupled receptor expression, function, and signaling in macropha...2007 Apr 24. (.png) (.svg) (.html) (.csml) Show G-protein-coupled receptor expression, function, and signali...ng in macrophages. PubmedID 17456803 Title G-protein-coupled receptor expression,

  16. Databases for Microbiologists

    Science.gov (United States)

    2015-01-01

    Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493

  17. Relational databases

    CERN Document Server

    Bell, D A

    1986-01-01

    Relational Databases explores the major advances in relational databases and provides a balanced analysis of the state of the art in relational databases. Topics covered include capture and analysis of data placement requirements; distributed relational database systems; data dependency manipulation in database schemata; and relational database support for computer graphics and computer aided design. This book is divided into three sections and begins with an overview of the theory and practice of distributed systems, using the example of INGRES from Relational Technology as illustration. The

  18. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

    Science.gov (United States)

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2010-04-06

    Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cells update per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
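    To make the quadratic cost and the GCUPS figure concrete, here is a minimal CPU sketch of Smith-Waterman scoring together with the usual cell-updates-per-second arithmetic. It assumes a simple match/mismatch score and a linear gap penalty purely for brevity; real protein searches, including the tools described here, typically use a substitution matrix and affine gaps, and none of the names or default values below come from CUDASW++.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (illustrative scoring scheme)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is the best of: restart, diagonal step, or a gap.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_length, database_residues, seconds):
    """Billions of matrix cells updated per second for one database search."""
    return query_length * database_residues / seconds / 1e9
```

    Every query residue is compared with every database residue, so the number of cells is the product of the two lengths; for example, a 1,000-residue query against 17 million database residues in one second would correspond to 17 GCUPS.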

  19. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

    Directory of Open Access Journals (Sweden)

    Schmidt Bertil

    2010-04-01

    Background: Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings: This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cells update per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. Conclusions: CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

  20. The B6 database: a tool for the description and classification of vitamin B6-dependent enzymatic activities and of the corresponding protein families

    Directory of Open Access Journals (Sweden)

    Peracchi Alessio

    2009-09-01

    Full Text Available. Background: Enzymes that depend on vitamin B6 (and in particular on its metabolically active form, pyridoxal 5'-phosphate, PLP) are of great relevance to biology and medicine, as they catalyze a wide variety of biochemical reactions mainly involving amino acid substrates. Although PLP-dependent enzymes belong to a small number of independent evolutionary lineages, they encompass more than 160 distinct catalytic functions, thus representing a striking example of divergent evolution. The importance and remarkable versatility of these enzymes, as well as the difficulties in their functional classification, create a need for an integrated source of information about them. Description: The B6 database (http://bioinformatics.unipr.it/B6db) contains documented B6-dependent activities and the relevant protein families, defined as monophyletic groups of sequences possessing the same enzymatic function. One or more families were associated with each of 121 PLP-dependent activities with known sequences. Hidden Markov models (HMMs) were built from family alignments and incorporated in the database. These HMMs can be used for the functional classification of PLP-dependent enzymes in genomic sets of predicted protein sequences. An example of such analyses (a census of human genes coding for PLP-dependent enzymes) is provided here, whereas many more are accessible through the database itself. Conclusion: The B6 database is a curated repository of biochemical and molecular information about an important group of enzymes. This information is logically organized and available for computational analyses, providing a key resource for the identification, classification and comparative analysis of B6-dependent enzymes.

  1. DMPD: Suppressor of cytokine signaling (SOCS) 2, a protein with multiple functions. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available. PubMed ID: 17070092. Title: Suppressor of cytokine signaling (SOCS) 2, a protein with multiple functions. Authors: Rico-Bautista E, Flores-Morales A, Fernandez-Perez L. Publication: ... Epub 2006 Oct 27.

  2. PhyloPro2.0: a database for the dynamic exploration of phylogenetically conserved proteins and their domain architectures across the Eukarya.

    Science.gov (United States)

    Cromar, Graham L; Zhao, Anthony; Xiong, Xuejian; Swapna, Lakshmipuram S; Loughran, Noeleen; Song, Hongyan; Parkinson, John

    2016-01-01

    PhyloPro is a database and accompanying web-based application for the construction and exploration of phylogenetic profiles across the Eukarya. In this update article, we present six major new developments in PhyloPro: (i) integration of Pfam-A domain predictions for all proteins; (ii) new summary heatmaps and detailed level views of domain conservation; (iii) an interactive, network-based visualization tool for exploration of domain architectures and their conservation; (iv) ability to browse based on protein functional categories (GOSlim); (v) improvements to the web interface to enhance drill down capability from the heatmap view; and (vi) improved coverage including 164 eukaryotes and 12 reference species. In addition, we provide improved support for downloading data and images in a variety of formats. Among the existing tools available for phylogenetic profiles, PhyloPro provides several innovative domain-based features including a novel domain adjacency visualization tool. These are designed to allow the user to identify and compare proteins with similar domain architectures across species and thus develop hypotheses about the evolution of lineage-specific trajectories. Database URL: http://www.compsysbio.org/phylopro/.
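
    As a rough illustration of what a phylogenetic profile is, the sketch below represents each protein by the set of species in which an ortholog is detected and compares two profiles by Jaccard similarity. This is an assumption for illustration only: the abstract does not state how PhyloPro stores or scores profiles, and the protein names, species sets and function name are hypothetical.

```python
def profile_similarity(species_a, species_b):
    """Jaccard similarity between two presence/absence profiles.

    Each profile is the collection of species in which the protein
    has a detected ortholog (a hypothetical representation).
    """
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical profiles across a handful of eukaryotes.
profiles = {
    "protein_X": {"H. sapiens", "M. musculus", "S. cerevisiae"},
    "protein_Y": {"H. sapiens", "M. musculus"},
    "protein_Z": {"A. thaliana"},
}
```

    Proteins with similar profiles, such as `protein_X` and `protein_Y` here, are candidates for shared function or co-evolution; dissimilar profiles, such as `protein_X` and `protein_Z`, point to lineage-specific distribution.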

  3. Biofuel Database

    Science.gov (United States)

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  4. Onzekere databases (Uncertain databases)

    NARCIS (Netherlands)

    van Keulen, Maurice

    A recent development in database research concerns so-called 'uncertain databases'. This article describes what uncertain databases are, how they can be used, and which applications in particular could benefit from this technology.

  5. Community Database

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This excel spreadsheet is the result of merging at the port level of several of the in-house fisheries databases in combination with other demographic databases such...

  6. SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

    Science.gov (United States)

    Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

    2015-07-29

    Spirulina (Arthrospira) platensis is the only cyanobacterium that, in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed and integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides a range of information, from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web

  7. Database Administrator

    Science.gov (United States)

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  9. DMPD: Protein kinase C epsilon: a new target to control inflammation and immune-mediated disorders. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available. PubMed ID: 14643884. Title: Protein kinase C epsilon: a new target to control inflammation and immune-mediated disorders. Authors: Aksoy E, Goldman M, Willems F. Publication: Int J Bioche...

  10. DMPD: Structure, function and regulation of the Toll/IL-1 receptor adaptor proteins. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available. PubMed ID: 17667936. Title: Structure, function and regulation of the Toll/IL-1 receptor adaptor proteins. Authors: Watters TM, Kenny EF, O'Neill LA. Publication: Immunol Cell Biol. 2007 Aug-Sep;...

  11. DMPD: Macrophage-stimulating protein and RON receptor tyrosine kinase: potential regulators of macrophage inflammatory activities. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available. PubMed ID: 12472665. Title: Macrophage-stimulating protein and RON receptor tyrosine kinase: potential regulators of macrophage inflammatory activities. Publication: ...:545-53.

  12. DMPD: The role of Toll-like receptors and Nod proteins in bacterial infection. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available. PubMed ID: 15476921. Title: The role of Toll-like receptors and Nod proteins in bacterial infection. Authors: Philpott DJ, Girardin SE. Publication: Mol Immunol. 2004 Nov;41(11):1099-108.

  13. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    Science.gov (United States)

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  14. The CATH database

    Directory of Open Access Journals (Sweden)

    Knudsen Michael

    2010-02-01

    Full Text Available. The CATH database provides hierarchical classification of protein domains based on their folding patterns. Domains are obtained from protein structures deposited in the Protein Data Bank, and both domain identification and subsequent classification use manual as well as automated procedures. The accompanying website (http://www.cathdb.info) provides an easy-to-use entry to the classification, allowing for both browsing and downloading of data. Here, we give a brief review of the database, its corresponding website and some related tools.

  15. The yeast two hybrid system in a screen for proteins interacting with axolotl (Ambystoma mexicanum) Msx1 during early limb regeneration.

    Science.gov (United States)

    Abuqarn, Mehtap; Allmeling, Christina; Amshoff, Inga; Menger, Bjoern; Nasser, Inas; Vogt, Peter M; Reimers, Kerstin

    2011-07-01

    Urodele amphibians are exceptional in their ability to regenerate complex body structures such as limbs. Limb regeneration depends on a process called dedifferentiation. Under an inductive wound epidermis, terminally differentiated cells transform into pluripotent progenitor cells that coordinately proliferate and eventually redifferentiate to form the new appendage. Recent studies have developed molecular models integrating a set of genes that might have important functions in the control of regenerative cellular plasticity. Among them is Msx1, which induced dedifferentiation in mammalian myotubes in vitro. Herein, we screened for interaction partners of axolotl Msx1 using a yeast two hybrid system. A two hybrid cDNA library of 5-day-old wound epidermis and underlying tissue containing more than 2×10⁶ cDNAs was constructed and used in the screen. Thirty-four resulting cDNA clones were isolated and sequenced. We then compared sequences of the isolated clones to annotated EST contigs of the Salamander EST database (BLASTn) to identify presumptive orthologs. We subsequently searched all no-hit clone sequences against non-redundant NCBI sequence databases using BLASTx. This is the first time that the yeast two hybrid system has been adapted to the axolotl animal model and successfully used in a screen for proteins interacting with Msx1 in the context of amphibian limb regeneration.

  16. Tandem application of cationic colloidal silica and Triton X-114 for plasma membrane protein isolation and purification: towards developing an MDCK protein database.

    Science.gov (United States)

    Mathias, Rommel A; Chen, Yuan-Shou; Goode, Robert J A; Kapp, Eugene A; Mathivanan, Suresh; Moritz, Robert L; Zhu, Hong-Jian; Simpson, Richard J

    2011-04-01

    Plasma membrane (PM) proteins are attractive therapeutic targets because of their accessibility to drugs. Although genes encoding PM proteins represent 20-30% of eukaryotic genomes, their encoded proteins remain poorly characterised, owing to their low copy number and the inherent difficulties in their isolation and purification as a consequence of their high hydrophobicity. We describe here a strategy that combines two orthogonal methods to isolate and purify PM proteins from Madin Darby canine kidney (MDCK) cells. In this two-step method, we first used cationic colloidal silica (CCS) to isolate adherent (Ad) and non-adherent (nAd) PM fractions, and then subjected each fraction to Triton X-114 (TX-114) phase partitioning to further enrich for hydrophobic proteins. While CCS alone identified 255/757 (34%) membrane proteins, CCS/TX-114 in combination yielded 453/745 (61%). Strikingly, of those proteins unique to CCS/TX-114, 277/393 (70%) had membrane annotation. Further characterisation of the CCS/TX-114 data set using UniProt and a transmembrane hidden Markov model revealed that 306/745 (41%) contained one or more transmembrane domains (TMDs), including proteins with 25 and 17 TMDs. Of the remaining proteins in the data set, 69/439 (16%) are known to contain lipid modifications. Of all membrane proteins identified, 93 had PM origin, including proteins that mediate cell adhesion, modulate transmembrane ion transport, and take part in cell-cell communication. These studies reveal that the application of CCS to first isolate Ad and nAd PM fractions, followed by their detergent-phase TX-114 partitioning, is a powerful method to isolate low-abundance PM proteins, and a useful adjunct for in-depth cell surface proteome analyses.

  17. Predicting RNA-Protein Interactions Using Only Sequence Information

    Directory of Open Access Journals (Sweden)

    Muppirala Usha K

    2011-12-01

    Full Text Available. Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (area under the receiver operating characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be
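
    The sequence encoding described above can be sketched as follows. This is an illustrative Python reconstruction, not the RPISeq code. Two points are assumptions: the abstract does not name the 7-letter reduced alphabet, so the widely used conjoint-triad grouping of amino acids is taken here, and "normalized" is interpreted as dividing each k-mer count by the total number of counted k-mers. All function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: conjoint-triad style grouping of the 20 amino acids into
# 7 classes; the abstract only says "7-letter reduced alphabet".
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_vector(seq, alphabet, k):
    """Normalized k-mer composition of seq over the given alphabet.

    ASSUMPTION: normalization divides by the number of counted k-mers.
    k-mers containing symbols outside the alphabet are skipped.
    """
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def encode_rna(rna):
    """4-mer composition over A, C, G, U (4**4 = 256 features)."""
    return kmer_vector(rna.upper().replace("T", "U"), "ACGU", 4)


def encode_protein(protein):
    """3-mer composition over the 7 group labels (7**3 = 343 features)."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_vector(reduced, "0123456", 3)


def encode_pair(rna, protein):
    """Concatenated feature vector for one RNA-protein pair."""
    return encode_rna(rna) + encode_protein(protein)
```

    Under these assumptions each RNA-protein pair becomes a fixed-length vector of 256 + 343 = 599 features, which is the kind of input an SVM or Random Forest classifier can consume regardless of sequence length.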

  18. Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

    Directory of Open Access Journals (Sweden)

    Samer Haidar

    2017-01-01

    Full Text Available. Protein kinase CK2, initially designated as casein kinase 2, is a ubiquitously expressed serine/threonine kinase. This enzyme, implicated in many cellular processes, is highly expressed and active in many tumor cells. A large number of compounds with different backbones have been developed as inhibitors. Among others, structures with an indeno[1,2-b]indole scaffold turned out to be potent new leads. With the aim of developing new inhibitors of human protein kinase CK2, we report here on the generation of a common-feature pharmacophore model to further explain the binding requirements for human CK2 inhibitors. Nine common chemical features of indeno[1,2-b]indole-type CK2 inhibitors were determined using MOE software (Chemical Computing Group, Montreal, Canada). This pharmacophore model was used for database mining with the aim of identifying novel scaffolds for developing new potent and selective CK2 inhibitors. Using this strategy, several structures were selected by searching the ZINC compound database. One of the selected compounds was bikaverin (6,11-dihydroxy-3,8-dimethoxy-1-methylbenzo[b]xanthene-7,10,12-trione), a natural compound produced by several kinds of fungi. This compound was tested on human recombinant CK2 and turned out to be an active inhibitor with an IC50 value of 1.24 µM.

  19. Differentiating rectal carcinoma by an immunohistological analysis of carcinomas of pelvic organs based on the NCBI Literature Survey and the Human Protein Atlas database.

    Science.gov (United States)

    Miura, Koh; Ishida, Kazuyuki; Fujibuchi, Wataru; Ito, Akihiro; Niikura, Hitoshi; Ogawa, Hitoshi; Sasaki, Iwao

    2012-06-01

    The treatments and prognoses of pelvic organ carcinomas differ, depending on whether the primary tumor originated in the rectum, urinary bladder, prostate, ovary, or uterus; therefore, it is essential to diagnose pathologically the primary origin and stages of these tumors. To establish the panels of immunohistochemical markers for differential diagnosis, we reviewed 91 of the NCBI articles on these topics and found that the results correlated closely with those of the public protein database, the Human Protein Atlas. The results revealed the panels of immunohistochemical markers for the differential diagnosis of rectal adenocarcinoma, in which [+] designates positivity in rectal adenocarcinoma and [-] designates negativity in rectal adenocarcinoma: from bladder adenocarcinoma, CDX2[+], VIL1[+], KRT7[-], THBD[-] and UPK3A[-]; from prostate adenocarcinoma, CDX2[+], VIL1[+], CEACAM5[+], KLK3(PSA)[-], ACPP(PAP)[-] and SLC45A3(prostein)[-]; and from ovarian mucinous adenocarcinoma, CEACAM5[+], VIL1[+], CDX2[+], KRT7[-] and MUC5AC[-]. The panels of markers distinguishing ovarian serous adenocarcinoma, cervical carcinoma, and endometrial adenocarcinoma were also represented. Such a comprehensive review on the differential diagnosis of carcinomas of pelvic organs has not been reported before. Thus, much information has been accumulated in public databases to provide an invaluable resource for clinicians and researchers.

  20. A resource for benchmarking the usefulness of protein structure models

    Directory of Open Access Journals (Sweden)

    Carbajo Daniel

    2012-08-01

    Full Text Available. Background: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. Results: This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. Comparing the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess the specific threshold of accuracy required to perform the task effectively. Conclusions: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. Implementation, availability and requirements: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free.
Any