visualizing protein database: Topics by WorldWideScience.org

Sample records for visualizing protein database

SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces

Directory of Open Access Journals (Sweden)

Schroeder Michael

2006-03-01

Full Text Available Abstract Background Currently there is a strong need for methods that help to obtain an accurate description of protein interfaces in order to be able to understand the principles that govern molecular recognition and protein function. Many of the recent efforts to computationally identify and characterize protein networks extract protein interaction information at atomic resolution from the PDB. However, they pay none or little attention to small protein ligands and solvent. They are key components and mediators of protein interactions and fundamental for a complete description of protein interfaces. Interactome profiling requires the development of computational tools to extract and analyze protein-protein, protein-ligand and detailed solvent interaction information from the PDB in an automatic and comparative fashion. Adding this information to the existing one on protein-protein interactions will allow us to better understand protein interaction networks and protein function. Description SCOWLP (Structural Characterization Of Water, Ligands and Proteins is a user-friendly and publicly accessible web-based relational database for detailed characterization and visualization of the PDB protein interfaces. The SCOWLP database includes proteins, peptidic-ligands and interface water molecules as descriptors of protein interfaces. It contains currently 74,907 protein interfaces and 2,093,976 residue-residue interactions formed by 60,664 structural units (protein domains and peptidic-ligands and their interacting solvent. The SCOWLP web-server allows detailed structural analysis and comparisons of protein interfaces at atomic level by text query of PDB codes and/or by navigating a SCOP-based tree. It includes a visualization tool to interactively display the interfaces and label interacting residues and interface solvent by atomic physicochemical properties. SCOWLP is automatically updated with every SCOP release. Conclusion SCOWLP enriches
CellMap visualizes protein-protein interactions and subcellular localization

Science.gov (United States)

Dallago, Christian; Goldberg, Tatyana; Andrade-Navarro, Miguel Angel; Alanis-Lobato, Gregorio; Rost, Burkhard

2018-01-01

Many tools visualize protein-protein interaction (PPI) networks. The tool introduced here, CellMap, adds one crucial novelty by visualizing PPI networks in the context of subcellular localization, i.e. the location in the cell or cellular component in which a PPI happens. Users can upload images of cells and define areas of interest against which PPIs for selected proteins are displayed (by default on a cartoon of a cell). Annotations of localization are provided by the user or through our in-house database. The visualizer and server are written in JavaScript, making CellMap easy to customize and to extend by researchers and developers. PMID:29497493
MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

Science.gov (United States)

Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

2018-05-08

Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on
PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

Science.gov (United States)

Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

2017-09-01

We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python, and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Full Data of Yeast Interacting Proteins Database (Original Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Yeast Interacting Proteins Database Full Data of Yeast Interacting Proteins Database (Origin...al Version) Data detail Data name Full Data of Yeast Interacting Proteins Database (Original Version) DOI 10....18908/lsdba.nbdc00742-004 Description of data contents The entire data in the Yeast Interacting Proteins Database...eir interactions are required. Several sources including YPD (Yeast Proteome Database, Costanzo, M. C., Hoga...ematic name in the SGD (Saccharomyces Genome Database; http://www.yeastgenome.org /). Bait gene name The gen
Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

Science.gov (United States)

Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

2016-01-01

Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.
Thermodynamic database for proteins: features and applications.

Science.gov (United States)

Gromiha, M Michael; Sarai, Akinori

2010-01-01

We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it can be accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html . ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. The list of online resources for predicting protein stability has also been provided.
Update History of This Database - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Yeast Interacting Proteins Database Update History of This Database Date Update contents 201...0/03/29 Yeast Interacting Proteins Database English archive site is opened. 2000/12/4 Yeast Interacting Proteins Database...( http://itolab.cb.k.u-tokyo.ac.jp/Y2H/ ) is released. About This Database Database Description... Download License Update History of This Database Site Policy | Contact Us Update History of This Database... - Yeast Interacting Proteins Database | LSDB Archive ...
The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists.

OpenAIRE

Sven Heinicke; Michael S Livstone; Charles Lu; Rose Oughtred; Fan Kang; Samuel V Angiuoli; Owen White; David Botstein; Kara Dolinski

2007-01-01

Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic r...
Database Description - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Yeast Interacting Proteins Database Database Description General information of database Database... name Yeast Interacting Proteins Database Alternative name - DOI 10.18908/lsdba.nbdc00742-000 Creator C...-ken 277-8561 Tel: +81-4-7136-3989 FAX: +81-4-7136-3979 E-mail : Database classif...s cerevisiae Taxonomy ID: 4932 Database description Information on interactions and related information obta...l Acad Sci U S A. 2001 Apr 10;98(8):4569-74. Epub 2001 Mar 13. External Links: Original website information Database
iPfam: a database of protein family and domain interactions found in the Protein Data Bank.

Science.gov (United States)

Finn, Robert D; Miller, Benjamin L; Clements, Jody; Bateman, Alex

2014-01-01

The database iPfam, available at http://ipfam.org, catalogues Pfam domain interactions based on known 3D structures that are found in the Protein Data Bank, providing interaction data at the molecular level. Previously, the iPfam domain-domain interaction data was integrated within the Pfam database and website, but it has now been migrated to a separate database. This allows for independent development, improving data access and giving clearer separation between the protein family and interactions datasets. In addition to domain-domain interactions, iPfam has been expanded to include interaction data for domain bound small molecule ligands. Functional annotations are provided from source databases, supplemented by the incorporation of Wikipedia articles where available. iPfam (version 1.0) contains >9500 domain-domain and 15 500 domain-ligand interactions. The new website provides access to this data in a variety of ways, including interactive visualizations of the interaction data.
Visualization of protein interaction networks: problems and solutions

Directory of Open Access Journals (Sweden)

Agapito Giuseppe

2013-01-01

Full Text Available Abstract Background Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI are discovered using either wet lab techniques, such mass spectrometry, or in silico predictions tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is more and more used to study the underlying graph structure of PINs. Although graphs are well known data structures, there are different open problems regarding PINs visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins and edges (interactions, the possibility to annotate proteins and interactions with biological information extracted by ontologies (e.g. Gene Ontology that enriches the PINs with semantic information, but complicates their visualization. Methods In these last years many software tools for the visualization of PINs have been developed. Initially thought for visualization only, some of them have been successively enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PINs visualization considering four main criteria: (i technology, i.e. availability/license of the software and supported OS (Operating System platforms; (ii interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the
The PMDB Protein Model Database

Science.gov (United States)

Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

2006-01-01

The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873
HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

Science.gov (United States)

Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

2015-04-01

The knowledge on protein-protein interactions (PPI) and their related pathways are equally important to understand the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualize their associated interactions, networks and pathways using two curated databases HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
Protein-Protein Interaction Databases

DEFF Research Database (Denmark)

Szklarczyk, Damian; Jensen, Lars Juhl

2015-01-01

Years of meticulous curation of scientific literature and increasingly reliable computational predictions have resulted in creation of vast databases of protein interaction data. Over the years, these repositories have become a basic framework in which experiments are analyzed and new directions...
Core Data of Yeast Interacting Proteins Database (Original Version) - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available y are in the reverse direction. *1 A comprehensive two-hybrid analysis to explore the yeast protein interact...s. 2000 Jan 1;28(1):73-6. *2 The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive...000 Jan 1;28(1):73-6. *3 A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisia
ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining.

Science.gov (United States)

Huan, Tianxiao; Sivachenko, Andrey Y; Harrison, Scott H; Chen, Jake Y

2008-08-12

according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: human disease association network, drug-target interaction network and protein-peptide mapping network. The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discoveries in future network and systems biology studies.
Visualizing information across multidimensional post-genomic structured and textual databases.

Science.gov (United States)

Tao, Ying; Friedman, Carol; Lussier, Yves A

2005-04-15

Visualizing relationships among biological information to facilitate understanding is crucial to biological research during the post-genomic era. Although different systems have been developed to view gene-phenotype relationships for specific databases, very few have been designed specifically as a general flexible tool for visualizing multidimensional genotypic and phenotypic information together. Our goal is to develop a method for visualizing multidimensional genotypic and phenotypic information and a model that unifies different biological databases in order to present the integrated knowledge using a uniform interface. We developed a novel, flexible and generalizable visualization tool, called PhenoGenesviewer (PGviewer), which in this paper was used to display gene-phenotype relationships from a human-curated database (OMIM) and from an automatic method using a Natural Language Processing tool called BioMedLEE. Data obtained from multiple databases were first integrated into a uniform structure and then organized by PGviewer. PGviewer provides a flexible query interface that allows dynamic selection and ordering of any desired dimension in the databases. Based on users' queries, results can be visualized using hierarchical expandable trees that present views specified by users according to their research interests. We believe that this method, which allows users to dynamically organize and visualize multiple dimensions, is a potentially powerful and promising tool that should substantially facilitate biological research. PhenogenesViewer as well as its support and tutorial are available at http://www.dbmi.columbia.edu/pgviewer/ Lussier@dbmi.columbia.edu.
PDTD: a web-accessible protein database for drug target identification

Directory of Open Access Journals (Sweden)

Gao Zhenting

2008-02-01

Full Text Available Abstract Background Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking http://www.dddc.ac.cn/tarfisdock, which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD, and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation. Description PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literatures and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information of >830 known or potential drug targets, including protein and active sites structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword search function, such as PDB ID, target name, and disease name. Data set generated by PDTD can be viewed with the plug-in of molecular visualization tools and also can be downloaded freely. Remarkably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores. Conclusion PDTD serves as a comprehensive and
A protein domain interaction interface database: InterPare

Directory of Open Access Journals (Sweden)

Lee Jungsul

2005-08-01

Full Text Available Abstract Background Most proteins function by interacting with other molecules. Their interaction interfaces are highly conserved throughout evolution to avoid undesirable interactions that lead to fatal disorders in cells. Rational drug discovery includes computational methods to identify the interaction sites of lead compounds to the target molecules. Identifying and classifying protein interaction interfaces on a large scale can help researchers discover drug targets more efficiently. Description We introduce a large-scale protein domain interaction interface database called InterPare http://interpare.net. It contains both inter-chain (between chains interfaces and intra-chain (within chain interfaces. InterPare uses three methods to detect interfaces: 1 the geometric distance method for checking the distance between atoms that belong to different domains, 2 Accessible Surface Area (ASA, a method for detecting the buried region of a protein that is detached from a solvent when forming multimers or complexes, and 3 the Voronoi diagram, a computational geometry method that uses a mathematical definition of interface regions. InterPare includes visualization tools to display protein interior, surface, and interaction interfaces. It also provides statistics such as the amino acid propensities of queried protein according to its interior, surface, and interface region. The atom coordinates that belong to interface, surface, and interior regions can be downloaded from the website. Conclusion InterPare is an open and public database server for protein interaction interface information. It contains the large-scale interface data for proteins whose 3D-structures are known. As of November 2004, there were 10,583 (Geometric distance, 10,431 (ASA, and 11,010 (Voronoi diagram entries in the Protein Data Bank (PDB containing interfaces, according to the above three methods. In the case of the geometric distance method, there are 31,620 inter-chain domain

Developing Visualization Support System for Teaching/Learning Database Normalization

Science.gov (United States)

Folorunso, Olusegun; Akinwale, AdioTaofeek

2010-01-01

Purpose: In tertiary institution, some students find it hard to learn database design theory, in particular, database normalization. The purpose of this paper is to develop a visualization tool to give students an interactive hands-on experience in database normalization process. Design/methodology/approach: The model-view-controller architecture…
MIPS: a database for genomes and protein sequences.

Science.gov (United States)

Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
Visual Querying in Chemical Databases using SMARTS Patterns

OpenAIRE

Šípek, Vojtěch

2014-01-01

The purpose of this thesis is to create framework for visual querying in chemical databases which will be implemented as a web application. By using graphical editor, which is a part of client side, the user creates queries which are translated into chemical query language SMARTS. This query is parsed on the application server which is connected to the chemical database. This framework also contains tooling for creating the database and index structure above it. 1
HCVpro: Hepatitis C virus protein interaction database

KAUST Repository

Kwofie, Samuel K.

2011-12-01

It is essential to catalog characterized hepatitis C virus (HCV) protein-protein interaction (PPI) data and the associated plethora of vital functional information to augment the search for therapies, vaccines and diagnostic biomarkers. In furtherance of these goals, we have developed the hepatitis C virus protein interaction database (HCVpro) by integrating manually verified hepatitis C virus-virus and virus-human protein interactions curated from literature and databases. HCVpro is a comprehensive and integrated HCV-specific knowledgebase housing consolidated information on PPIs, functional genomics and molecular data obtained from a variety of virus databases (VirHostNet, VirusMint, HCVdb and euHCVdb), and from BIND and other relevant biology repositories. HCVpro is further populated with information on hepatocellular carcinoma (HCC) related genes that are mapped onto their encoded cellular proteins. Incorporated proteins have been mapped onto Gene Ontologies, canonical pathways, Online Mendelian Inheritance in Man (OMIM) and extensively cross-referenced to other essential annotations. The database is enriched with exhaustive reviews on structure and functions of HCV proteins, current state of drug and vaccine development and links to recommended journal articles. Users can query the database using specific protein identifiers (IDs), chromosomal locations of a gene, interaction detection methods, indexed PubMed sources as well as HCVpro, BIND and VirusMint IDs. The use of HCVpro is free and the resource can be accessed via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. © 2011 Elsevier B.V.
UbiProt: a database of ubiquitylated proteins

Directory of Open Access Journals (Sweden)

Kondratieva Ekaterina V

2007-04-01

Full Text Available Abstract Background Post-translational protein modification with ubiquitin, or ubiquitylation, is one of the hottest topics in a modern biology due to a dramatic impact on diverse metabolic pathways and involvement in pathogenesis of severe human diseases. A great number of eukaryotic proteins was found to be ubiquitylated. However, data about particular ubiquitylated proteins are rather disembodied. Description To fill a general need for collecting and systematizing experimental data concerning ubiquitylation we have developed a new resource, UbiProt Database, a knowledgebase of ubiquitylated proteins. The database contains retrievable information about overall characteristics of a particular protein, ubiquitylation features, related ubiquitylation and de-ubiquitylation machinery and literature references reflecting experimental evidence of ubiquitylation. UbiProt is available at http://ubiprot.org.ru for free. Conclusion UbiProt Database is a public resource offering comprehensive information on ubiquitylated proteins. The resource can serve as a general reference source both for researchers in ubiquitin field and those who deal with particular ubiquitylated proteins which are of their interest. Further development of the UbiProt Database is expected to be of common interest for research groups involved in studies of the ubiquitin system.
Visualizing Data and the Online FRED Database

Science.gov (United States)

Méndez-Carbajo, Diego

2015-01-01

The author discusses a pedagogical strategy based on data visualization and analysis in the teaching of intermediate macroeconomics and financial economics. In these short projects, students collect and manipulate economic data from the online Federal Reserve Economic Database (FRED) in order to illustrate theoretical relationships discussed in…
Prototyping visual interface for maintenance and supply databases

OpenAIRE

Fore, Henry Ray

1989-01-01

Approved for public release; distribution is unlimited This research examined the feasibility of providing a visual interface to standard Army Management Information Systems at the unit level. The potential of improving the Human-Machine Interface of unit level maintenance and supply software, such as ULLS (Unit Level Logistics System), is very attractive. A prototype was implemented in GLAD (Graphics Language for Database). GLAD is a graphics object-oriented environment for databases t...
Visualizing the semantic content of large text databases using text maps

Science.gov (United States)

Combs, Nathan

1993-01-01

A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
Completion of autobuilt protein models using a database of protein fragments

International Nuclear Information System (INIS)

Cowtan, Kevin

2012-01-01

Two developments in the process of automated protein model building in the Buccaneer software are described: the use of a database of protein fragments in improving the model completeness and the assembly of disconnected chain fragments into complete molecules. Two developments in the process of automated protein model building in the Buccaneer software are presented. A general-purpose library for protein fragments of arbitrary size is described, with a highly optimized search method allowing the use of a larger database than in previous work. The problem of assembling an autobuilt model into complete chains is discussed. This involves the assembly of disconnected chain fragments into complete molecules and the use of the database of protein fragments in improving the model completeness. Assembly of fragments into molecules is a standard step in existing model-building software, but the methods have not received detailed discussion in the literature
Improving decoy databases for protein folding algorithms

KAUST Repository

Lindsey, Aaron

2014-01-01

Copyright © 2014 ACM. Predicting protein structures and simulating protein folding are two of the most important problems in computational biology today. Simulation methods rely on a scoring function to distinguish the native structure (the most energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing redundant structures. We test our approach on 17 different decoy databases of varying size and type and show significant improvement across a variety of metrics. We also test our improved databases on a popular modern scoring function and show that they contain a greater number of native-like structures than the original databases, thereby producing a more rigorous database for testing scoring functions.
License - Yeast Interacting Proteins Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Yeast Interacting Proteins Database License to Use This Database Last updated : 2010/02/15 You may use this database...nal License described below. The Standard License specifies the license terms regarding the use of this database... and the requirements you must follow in using this database. The Additional ...the Standard License. Standard License The Standard License for this database is the license specified in th...e Creative Commons Attribution-Share Alike 2.1 Japan . If you use data from this database
The Protein Identifier Cross-Referencing (PICR service: reconciling protein identifiers across multiple source databases

Directory of Open Access Journals (Sweden)

Leinonen Rasko

2007-10-01

Full Text Available Abstract Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR service, a web application that provides interactive and programmatic (SOAP and REST access to a mapping algorithm that uses the UniProt Archive (UniParc as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV or Microsoft Excel (XLS files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR
Kin-Driver: a database of driver mutations in protein kinases.

Science.gov (United States)

Simonetti, Franco L; Tornador, Cristian; Nabau-Moretó, Nuria; Molina-Vila, Miguel A; Marino-Buslje, Cristina

2014-01-01

Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-driver is a manual expert-curated database that pays special attention to activating mutations (AMs) and can serve as a validation set to develop new generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment to facilitate the visualization and analysis of mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogue positions between kinases can be identified and tentative new mutations can be proposed for studying by transferring annotation. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that can correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-driver is available at: http://kin-driver.leloir.org.ar/ © The Author(s) 2014. Published by Oxford University Press.
The ABC (Analysing Biomolecular Contacts-database

Directory of Open Access Journals (Sweden)

Walter Peter

2007-03-01

Full Text Available As protein-protein interactions are one of the basic mechanisms in most cellular processes, it is desirable to understand the molecular details of protein-protein contacts and ultimately be able to predict which proteins interact. Interface areas on a protein surface that are involved in protein interactions exhibit certain characteristics. Therefore, several attempts were made to distinguish protein interactions from each other and to categorize them. One way of classification are the groups of transient and permanent interactions. Previously two of the authors analysed several properties for transient complexes such as the amino acid and secondary structure element composition and pairing preferences. Certainly, interfaces can be characterized by many more possible attributes and this is a subject of intense ongoing research. Although several freely available online databases exist that illuminate various aspects of protein-protein interactions, we decided to construct a new database collecting all desired interface features allowing for facile selection of subsets of complexes. As database-server we applied MySQL and the program logic was written in JAVA. Furthermore several class extensions and tools such as JMOL were included to visualize the interfaces and JfreeChart for the representation of diagrams and statistics. The contact data is automatically generated from standard PDB files by a tcl/tk-script running through the molecular visualization package VMD. Currently the database contains 536 interfaces extracted from 479 PDB files and it can be queried by various types of parameters. Here, we describe the database design and demonstrate its usefulness with a number of selected features.
The DExH/D protein family database.

Science.gov (United States)

Jankowsky, E; Jankowsky, A

2000-01-01

DExH/D proteins are essential for all aspects of cellular RNA metabolism and processing, in the replication of many viruses and in DNA replication. DExH/D proteins are subject to current biological, biochemical and biophysical research which provides a continuous wealth of data. The DExH/D protein family database compiles this information and makes it available over the WWW (http://www.columbia.edu/ ej67/dbhome.htm ). The database can be fully searched by text based queries, facilitating fast access to specific information about this important class of enzymes.
Database of Interacting Proteins (DIP)

Data.gov (United States)

U.S. Department of Health & Human Services — The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent...
AMYPdb: A database dedicated to amyloid precursor proteins

Directory of Open Access Journals (Sweden)

Delamarche Christian

2008-06-01

Full Text Available Abstract Background Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows using of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases. Results We therefore created a free online knowledge database (AMYPdb dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation. Conclusion AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible 1.
MIPS: a database for protein sequences and complete genomes.

Science.gov (United States)

Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

1998-01-01

The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de ) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
Visualization portal for genetic variation (VizGVar): a tool for interactive visualization of SNPs and somatic mutations in exons, genes and protein domains.

Science.gov (United States)

Solano-Román, Antonio; Alfaro-Arias, Verónica; Cruz-Castillo, Carlos; Orozco-Solano, Allan

2018-03-15

VizGVar was designed to meet the growing need of the research community for improved genomic and proteomic data viewers that benefit from better information visualization. We implemented a new information architecture and applied user centered design principles to provide a new improved way of visualizing genetic information and protein data related to human disease. VizGVar connects the entire database of Ensembl protein motifs, domains, genes and exons with annotated SNPs and somatic variations from PharmGKB and COSMIC. VizGVar precisely represents genetic variations and their respective location by colored curves to designate different types of variations. The structured hierarchy of biological data is reflected in aggregated patterns through different levels, integrating several layers of information at once. VizGVar provides a new interactive, web-based JavaScript visualization of somatic mutations and protein variation, enabling fast and easy discovery of clinically relevant variation patterns. VizGVar is accessible at http://vizport.io/vizgvar; http://vizport.io/vizgvar/doc/. asolano@broadinstitute.org or allan.orozcosolano@ucr.ac.cr.
A Visual Language for Protein Design

KAUST Repository

Cox, Robert Sidney

2017-02-08

As protein engineering becomes more sophisticated, practitioners increasingly need to share diagrams for communicating protein designs. To this end, we present a draft visual language, Protein Language, that describes the high-level architecture of an engineered protein with easy-to draw glyphs, intended to be compatible with other biological diagram languages such as SBOL Visual and SBGN. Protein Language consists of glyphs for representing important features (e.g., globular domains, recognition and localization sequences, sites of covalent modification, cleavage and catalysis), rules for composing these glyphs to represent complex architectures, and rules constraining the scaling and styling of diagrams. To support Protein Language we have implemented an extensible web-based software diagram tool, Protein Designer, that uses Protein Language in a

A Visual Language for Protein Design

KAUST Repository

Cox, Robert Sidney; McLaughlin, James Alastair; Grunberg, Raik; Beal, Jacob; Wipat, Anil; Sauro, Herbert M.

2017-01-01

As protein engineering becomes more sophisticated, practitioners increasingly need to share diagrams for communicating protein designs. To this end, we present a draft visual language, Protein Language, that describes the high-level architecture of an engineered protein with easy-to draw glyphs, intended to be compatible with other biological diagram languages such as SBOL Visual and SBGN. Protein Language consists of glyphs for representing important features (e.g., globular domains, recognition and localization sequences, sites of covalent modification, cleavage and catalysis), rules for composing these glyphs to represent complex architectures, and rules constraining the scaling and styling of diagrams. To support Protein Language we have implemented an extensible web-based software diagram tool, Protein Designer, that uses Protein Language in a
TOPDOM: database of conservatively located domains and motifs in proteins.

Science.gov (United States)

Varga, Julia; Dobson, László; Tusnády, Gábor E

2016-09-01

The TOPDOM database-originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins-has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. TOPDOM database is available at http://topdom.enzim.hu The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. tusnady.gabor@ttk.mta.hu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
MultitaskProtDB: a database of multitasking proteins.

Science.gov (United States)

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a friendly user web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.
SpirPro: A Spirulina proteome database and web-based tools for the analysis of protein-protein interactions at the metabolic level in Spirulina (Arthrospira) platensis C1.

Science.gov (United States)

Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee

2015-07-29

Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web
A web-based data visualization tool for the MIMIC-II database.

Science.gov (United States)

Lee, Joon; Ribey, Evan; Wallace, James R

2016-02-04

Although MIMIC-II, a public intensive care database, has been recognized as an invaluable resource for many medical researchers worldwide, becoming a proficient MIMIC-II researcher requires knowledge of SQL programming and an understanding of the MIMIC-II database schema. These are challenging requirements especially for health researchers and clinicians who may have limited computer proficiency. In order to overcome this challenge, our objective was to create an interactive, web-based MIMIC-II data visualization tool that first-time MIMIC-II users can easily use to explore the database. The tool offers two main features: Explore and Compare. The Explore feature enables the user to select a patient cohort within MIMIC-II and visualize the distributions of various administrative, demographic, and clinical variables within the selected cohort. The Compare feature enables the user to select two patient cohorts and visually compare them with respect to a variety of variables. The tool is also helpful to experienced MIMIC-II researchers who can use it to substantially accelerate the cumbersome and time-consuming steps of writing SQL queries and manually visualizing extracted data. Any interested researcher can use the MIMIC-II data visualization tool for free to quickly and conveniently conduct a preliminary investigation on MIMIC-II with a few mouse clicks. Researchers can also use the tool to learn the characteristics of the MIMIC-II patients. Since it is still impossible to conduct multivariable regression inside the tool, future work includes adding analytics capabilities. Also, the next version of the tool will aim to utilize MIMIC-III which contains more data.
Proteomics: Protein Identification Using Online Databases

Science.gov (United States)

Eurich, Chris; Fields, Peter A.; Rice, Elizabeth

2012-01-01

Proteomics is an emerging area of systems biology that allows simultaneous study of thousands of proteins expressed in cells, tissues, or whole organisms. We have developed this activity to enable high school or college students to explore proteomic databases using mass spectrometry data files generated from yeast proteins in a college laboratory…
Protein structure database search and evolutionary classification.

Science.gov (United States)

Yang, Jinn-Moon; Tung, Chi-Hua

2006-01-01

As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
Protein - AT Atlas | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data ..._protein.zip File URL: ftp://ftp.biosciencedbc.jp/archive/at_atlas/LATEST/at_atla...About This Database Database Description Download License Update History of This Database Site Policy | Contact Us Protein - AT Atlas | LSDB Archive ...
cuticleDB: a relational database of Arthropod cuticular proteins

Directory of Open Access Journals (Sweden)

Willis Judith H

2004-09-01

Full Text Available Abstract Background The insect exoskeleton or cuticle is a bi-partite composite of proteins and chitin that provides protective, skeletal and structural functions. Little information is available about the molecular structure of this important complex that exhibits a helicoidal architecture. Scores of sequences of cuticular proteins have been obtained from direct protein sequencing, from cDNAs, and from genomic analyses. Most of these cuticular protein sequences contain motifs found only in arthropod proteins. Description cuticleDB is a relational database containing all structural proteins of Arthropod cuticle identified to date. Many come from direct sequencing of proteins isolated from cuticle and from sequences from cDNAs that share common features with these authentic cuticular proteins. It also includes proteins from the Drosophila melanogaster and the Anopheles gambiae genomes, that have been predicted to be cuticular proteins, based on a Pfam motif (PF00379 responsible for chitin binding in Arthropod cuticle. The total number of the database entries is 445: 370 derive from insects, 60 from Crustacea and 15 from Chelicerata. The database can be accessed from our web server at http://bioinformatics.biol.uoa.gr/cuticleDB. Conclusions CuticleDB was primarily designed to contain correct and full annotation of cuticular protein data. The database will be of help to future genome annotators. Users will be able to test hypotheses for the existence of known and also of yet unknown motifs in cuticular proteins. An analysis of motifs may contribute to understanding how proteins contribute to the physical properties of cuticle as well as to the precise nature of their interaction with chitin.
DB-PABP: a database of polyanion-binding proteins.

Science.gov (United States)

Fang, Jianwen; Dong, Yinghua; Salamat-Miller, Nazila; Middaugh, C Russell

2008-01-01

The interactions between polyanions (PAs) and polyanion-binding proteins (PABPs) have been found to play significant roles in many essential biological processes including intracellular organization, transport and protein folding. Furthermore, many neurodegenerative disease-related proteins are PABPs. Thus, a better understanding of PA/PABP interactions may not only enhance our understandings of biological systems but also provide new clues to these deadly diseases. The literature in this field is widely scattered, suggesting the need for a comprehensive and searchable database of PABPs. The DB-PABP is a comprehensive, manually curated and searchable database of experimentally characterized PABPs. It is freely available and can be accessed online at http://pabp.bcf.ku.edu/DB_PABP/. The DB-PABP was implemented as a MySQL relational database. An interactive web interface was created using Java Server Pages (JSP). The search page of the database is organized into a main search form and a section for utilities. The main search form enables custom searches via four menus: protein names, polyanion names, the source species of the proteins and the methods used to discover the interactions. Available utilities include a commonality matrix, a function of listing PABPs by the number of interacting polyanions and a string search for author surnames. The DB-PABP is maintained at the University of Kansas. We encourage users to provide feedback and submit new data and references.
A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

Science.gov (United States)

Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

2010-08-01

The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803

OpenAIRE

Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

2008-01-01

Background Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. Description We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactio...
CRAVE: a database, middleware and visualization system for phenotype ontologies.

Science.gov (United States)

Gkoutos, Georgios V; Green, Eain C J; Greenaway, Simon; Blake, Andrew; Mallon, Ann-Marie; Hancock, John M

2005-04-01

A major challenge in modern biology is to link genome sequence information to organismal function. In many organisms this is being done by characterizing phenotypes resulting from mutations. Efficiently expressing phenotypic information requires combinatorial use of ontologies. However tools are not currently available to visualize combinations of ontologies. Here we describe CRAVE (Concept Relation Assay Value Explorer), a package allowing storage, active updating and visualization of multiple ontologies. CRAVE is a web-accessible JAVA application that accesses an underlying MySQL database of ontologies via a JAVA persistent middleware layer (Chameleon). This maps the database tables into discrete JAVA classes and creates memory resident, interlinked objects corresponding to the ontology data. These JAVA objects are accessed via calls through the middleware's application programming interface. CRAVE allows simultaneous display and linking of multiple ontologies and searching using Boolean and advanced searches.
LoopX: A Graphical User Interface-Based Database for Comprehensive Analysis and Comparative Evaluation of Loops from Protein Structures.

Science.gov (United States)

Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna

2017-10-01

Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitate a thorough analysis and effectual search options for extracting and comparing loops for sequence and structural compatibility, we developed, LoopX a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database equipped with a graphical user interface is empowered with diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns, backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Protein (Cyanobacteria) - PGDBj - Ortholog DB | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available ut This Database Database Description Download License Update History of This Database Site Policy | Contact Us Protein (Cyanobacteria) - PGDBj - Ortholog DB | LSDB Archive ... ...List Contact us PGDBj - Ortholog DB Protein (Cyanobacteria) Data detail Data name Protein (Cyanobacteria) DO...switchLanguage; BLAST Search Image Search Home About Archive Update History Data
SHEETSPAIR: A Database of Amino Acid Pairs in Protein Sheet Structures

Directory of Open Access Journals (Sweden)

Ning Zhang

2007-10-01

Full Text Available Within folded strands of a protein, amino acids (AAs on every adjacent two strands form a pair of AAs. To explore the interactions between strands in a protein sheet structure, we have established an Internet-accessible relational database named SheetsPairs based on SQL Server 2000. The database has collected AAs pairs in proteins with detailed information. Furthermore, it utilizes a non-freetext database structure to store protein sequences and a specific database table with a unique number to store strands, which provides more searching options and rapid and accurate access to data queries. An IIS web server has been set up for data retrieval through a custom web interface, which enables complex data queries. Also searchable are parallel or anti-parallel folded strands and the list of strands in a specified protein.
MIPS: a database for protein sequences, homology data and yeast genome information.

Science.gov (United States)

Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

1997-01-01

The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498
ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval.

Science.gov (United States)

Wang, Jingyan; Gao, Xin; Wang, Quanquan; Li, Yongping

2012-05-08

The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measures suffer from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy of the retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database. In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is, for a pair of proteins (i, j), if their context N(i) and N(j) is similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the context N(i) and N(j).Moreover, we divide the context to hierarchial sub-context and get the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchial
A database and tool, IM Browser, for exploring and integrating emerging gene and protein interaction data for Drosophila

Directory of Open Access Journals (Sweden)

Parrish Jodi R

2006-04-01

Full Text Available Abstract Background Biological processes are mediated by networks of interacting genes and proteins. Efforts to map and understand these networks are resulting in the proliferation of interaction data derived from both experimental and computational techniques for a number of organisms. The volume of this data combined with the variety of specific forms it can take has created a need for comprehensive databases that include all of the available data sets, and for exploration tools to facilitate data integration and analysis. One powerful paradigm for the navigation and analysis of interaction data is an interaction graph or map that represents proteins or genes as nodes linked by interactions. Several programs have been developed for graphical representation and analysis of interaction data, yet there remains a need for alternative programs that can provide casual users with rapid easy access to many existing and emerging data sets. Description Here we describe a comprehensive database of Drosophila gene and protein interactions collected from a variety of sources, including low and high throughput screens, genetic interactions, and computational predictions. We also present a program for exploring multiple interaction data sets and for combining data from different sources. The program, referred to as the Interaction Map (IM Browser, is a web-based application for searching and visualizing interaction data stored in a relational database system. Use of the application requires no downloads and minimal user configuration or training, thereby enabling rapid initial access to interaction data. IM Browser was designed to readily accommodate and integrate new types of interaction data as it becomes available. Moreover, all information associated with interaction measurements or predictions and the genes or proteins involved are accessible to the user. This allows combined searches and analyses based on either common or technique-specific attributes
Filling and mining the reactive metabolite target protein database.

Science.gov (United States)

Hanzlik, Robert P; Fang, Jianwen; Koen, Yakov M

2009-04-15

The post-translational modification of proteins is a well-known endogenous mechanism for regulating protein function and activity. Cellular proteins are also susceptible to post-translational modification by xenobiotic agents that possess, or whose metabolites possess, significant electrophilic character. Such non-physiological modifications to endogenous proteins are sometimes benign, but in other cases they are strongly associated with, and are presumed to cause, lethal cytotoxic consequences via necrosis and/or apoptosis. The Reactive Metabolite Target Protein Database (TPDB) is a searchable, freely web-accessible (http://tpdb.medchem.ku.edu:8080/protein_database/) resource that attempts to provide a comprehensive, up-to-date listing of known reactive metabolite target proteins. In this report we characterize the TPDB by reviewing briefly how the information it contains came to be known. We also compare its information to that provided by other types of "-omics" studies relevant to toxicology, and we illustrate how bioinformatic analysis of target proteins may help to elucidate mechanisms of cytotoxic responses to reactive metabolites.

EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases.

Science.gov (United States)

Wang, Yongbo; Liu, Zexian; Cheng, Han; Gao, Tianshun; Pan, Zhicheng; Yang, Qing; Guo, Anyuan; Xue, Yu

2014-01-01

We present here EKPD (http://ekpd.biocuckoo.org), a hierarchical database of eukaryotic protein kinases (PKs) and protein phosphatases (PPs), the key molecules responsible for the reversible phosphorylation of proteins that are involved in almost all aspects of biological processes. As extensive experimental and computational efforts have been carried out to identify PKs and PPs, an integrative resource with detailed classification and annotation information would be of great value for both experimentalists and computational biologists. In this work, we first collected 1855 PKs and 347 PPs from the scientific literature and various public databases. Based on previously established rationales, we classified all of the known PKs and PPs into a hierarchical structure with three levels, i.e. group, family and individual PK/PP. There are 10 groups with 149 families for the PKs and 10 groups with 33 families for the PPs. We constructed 139 and 27 Hidden Markov Model profiles for PK and PP families, respectively. Then we systematically characterized ∼50,000 PKs and >10,000 PPs in eukaryotes. In addition, >500 PKs and >400 PPs were computationally identified by ortholog search. Finally, the online service of the EKPD database was implemented in PHP + MySQL + JavaScript.
Using the clustered circular layout as an informative method for visualizing protein-protein interaction networks.

Science.gov (United States)

Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee

2010-07-01

The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.
PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

Directory of Open Access Journals (Sweden)

Picard-Cloutier Aude

2007-12-01

Full Text Available Abstract Background In the "post-genome" era, mass spectrometry (MS has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5.
The reactive metabolite target protein database (TPDB)--a web-accessible resource.

Science.gov (United States)

Hanzlik, Robert P; Koen, Yakov M; Theertham, Bhargav; Dong, Yinghua; Fang, Jianwen

2007-03-16

The toxic effects of many simple organic compounds stem from their biotransformation to chemically reactive metabolites which bind covalently to cellular proteins. To understand the mechanisms of cytotoxic responses it may be important to know which proteins become adducted and whether some may be common targets of multiple toxins. The literature of this field is widely scattered but expanding rapidly, suggesting the need for a comprehensive, searchable database of reactive metabolite target proteins. The Reactive Metabolite Target Protein Database (TPDB) is a comprehensive, curated, searchable, documented compilation of publicly available information on the protein targets of reactive metabolites of 18 well-studied chemicals and drugs of known toxicity. TPDB software enables i) string searches for author names and proteins names/synonyms, ii) more complex searches by selecting chemical compound, animal species, target tissue and protein names/synonyms from pull-down menus, and iii) commonality searches over multiple chemicals. Tabulated search results provide information, references and links to other databases. The TPDB is a unique on-line compilation of information on the covalent modification of cellular proteins by reactive metabolites of chemicals and drugs. Its comprehensiveness and searchability should facilitate the elucidation of mechanisms of reactive metabolite toxicity. The database is freely available at http://tpdb.medchem.ku.edu/tpdb.html.
BoreholeAR: A mobile tablet application for effective borehole database visualization using an augmented reality technology

Science.gov (United States)

Lee, Sangho; Suh, Jangwon; Park, Hyeong-Dong

2015-03-01

Boring logs are widely used in geological field studies since the data describes various attributes of underground and surface environments. However, it is difficult to manage multiple boring logs in the field as the conventional management and visualization methods are not suitable for integrating and combining large data sets. We developed an iPad application to enable its user to search the boring log rapidly and visualize them using the augmented reality (AR) technique. For the development of the application, a standard borehole database appropriate for a mobile-based borehole database management system was designed. The application consists of three modules: an AR module, a map module, and a database module. The AR module superimposes borehole data on camera imagery as viewed by the user and provides intuitive visualization of borehole locations. The map module shows the locations of corresponding borehole data on a 2D map with additional map layers. The database module provides data management functions for large borehole databases for other modules. Field survey was also carried out using more than 100,000 borehole data.
ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

KAUST Repository

Wang, Jim Jing-Yan

2012-05-08

Background: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measures suffer from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy of the retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database.Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is, for a pair of proteins (i, j), if their context N (i) and N (j) is similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the context N (i) and N (j). Moreover, we divide the context to hierarchial sub-context and get the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update
The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families.

Science.gov (United States)

Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas

2018-04-01

The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins are used to predict correlated substitutions by the Mutual information-based CMAT approach, classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of the visualCMAT are organized for a convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, implemented at selecting hotspots and compensatory mutations for rational design and directed evolution experiments to produce novel enzymes with improved properties, and employed at studying the mechanism of selective ligand's binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https
Medicago PhosphoProtein Database: a repository for Medicago truncatula phosphoprotein data

Directory of Open Access Journals (Sweden)

Christopher M. Rose

2012-06-01

Full Text Available The ability of legume crops to fix atmospheric nitrogen via a symbiotic association with soil rhizobia makes them an essential component of many agricultural systems. Initiation of this symbiosis requires protein phosphorylation-mediated signaling in response to rhizobial signals named Nod factors. Medicago truncatula (Medicago is the model system for studying legume biology, making the study of its phosphoproteome essential. Here, we describe the Medicago Phosphoprotein Database (http://phospho.medicago.wisc.edu, a repository built to house phosphoprotein, phosphopeptide, and phosphosite data specific to Medicago. Currently, the Medicago Phosphoprotein Database holds 3,457 unique phosphopeptides that contain 3,404 non-redundant sites of phosphorylation on 829 proteins. Through the web-based interface, users are allowed to browse identified proteins or search for proteins of interest. Furthermore, we allow users to conduct BLAST searches of the database using both peptide sequences and phosphorylation motifs as queries. The data contained within the database are available for download to be investigated at the user’s discretion. The Medicago Phosphoprotein Database will be updated continually with novel phosphoprotein and phosphopeptide identifications, with the intent of constructing an unparalleled compendium of large-scale Medicago phosphorylation data.
Fragile X Mental Retardation Protein Is Required to Maintain Visual Conditioning-Induced Behavioral Plasticity by Limiting Local Protein Synthesis.

Science.gov (United States)

Liu, Han-Hsuan; Cline, Hollis T

2016-07-06

Fragile X mental retardation protein (FMRP) is thought to regulate neuronal plasticity by limiting dendritic protein synthesis, but direct demonstration of a requirement for FMRP control of local protein synthesis during behavioral plasticity is lacking. Here we tested whether FMRP knockdown in Xenopus optic tectum affects local protein synthesis in vivo and whether FMRP knockdown affects protein synthesis-dependent visual avoidance behavioral plasticity. We tagged newly synthesized proteins by incorporation of the noncanonical amino acid azidohomoalanine and visualized them with fluorescent noncanonical amino acid tagging (FUNCAT). Visual conditioning and FMRP knockdown produce similar increases in FUNCAT in tectal neuropil. Induction of visual conditioning-dependent behavioral plasticity occurs normally in FMRP knockdown animals, but plasticity degrades over 24 h. These results indicate that FMRP affects visual conditioning-induced local protein synthesis and is required to maintain the visual conditioning-induced behavioral plasticity. Fragile X syndrome (FXS) is the most common form of inherited intellectual disability. Exaggerated dendritic protein synthesis resulting from loss of fragile X mental retardation protein (FMRP) is thought to underlie cognitive deficits in FXS, but no direct evidence has demonstrated that FMRP-regulated dendritic protein synthesis affects behavioral plasticity in intact animals. Xenopus tadpoles exhibit a visual avoidance behavior that improves with visual conditioning in a protein synthesis-dependent manner. We showed that FMRP knockdown and visual conditioning dramatically increase protein synthesis in neuronal processes. Furthermore, induction of visual conditioning-dependent behavioral plasticity occurs normally after FMRP knockdown, but performance rapidly deteriorated in the absence of FMRP. These studies show that FMRP negatively regulates local protein synthesis and is required to maintain visual conditioning
Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

International Nuclear Information System (INIS)

Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.; Sasajima, K.; Matsugaki, N.; Suzuki, M.; Kosuge, T.; Wakatsuki, S.

2004-01-01

An integrated controlling system and a unified database for high throughput protein crystallography experiments have been developed. Main features of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments is stored (except raw X-ray data that are stored in a central data server) in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view, create, modify and delete user records in the database. Two search engines were realized: direct search of necessary information in the database and object oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with secure connection) and an indirect method (using the secure SSL connection using secure X11 support from any operating system with X-terminal and SSH support). A part of the system has been implemented on a new MAD beam line, NW12, at the Photon Factory Advanced Ring for general user experiments
Computer systems and methods for the query and visualization of multidimensional databases

Science.gov (United States)

Stolte, Chris [Palo Alto, CA; Tang, Diane L [Palo Alto, CA; Hanrahan, Patrick [Portola Valley, CA

2011-02-01

In response to a user request, a computer generates a graphical user interface on a computer display. A schema information region of the graphical user interface includes multiple operand names, each operand name associated with one or more fields of a multi-dimensional database. A data visualization region of the graphical user interface includes multiple shelves. Upon detecting a user selection of the operand names and a user request to associate each user-selected operand name with a respective shelf in the data visualization region, the computer generates a visual table in the data visualization region in accordance with the associations between the operand names and the corresponding shelves. The visual table includes a plurality of panes, each pane having at least one axis defined based on data for the fields associated with a respective operand name.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

Science.gov (United States)

Truong, Kevin; Ikura, Mitsuhiko

2003-05-06

Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
iview: an interactive WebGL visualizer for protein-ligand complex.

Science.gov (United States)

Li, Hongjian; Leung, Kwong-Sak; Nakane, Takanori; Wong, Man-Hon

2014-02-25

Visualization of protein-ligand complex plays an important role in elaborating protein-ligand interactions and aiding novel drug design. Most existing web visualizers either rely on slow software rendering, or lack virtual reality support. The vital feature of macromolecular surface construction is also unavailable. We have developed iview, an easy-to-use interactive WebGL visualizer of protein-ligand complex. It exploits hardware acceleration rather than software rendering. It features three special effects in virtual reality settings, namely anaglyph, parallax barrier and oculus rift, resulting in visually appealing identification of intermolecular interactions. It supports four surface representations including Van der Waals surface, solvent excluded surface, solvent accessible surface and molecular surface. Moreover, based on the feature-rich version of iview, we have also developed a neat and tailor-made version specifically for our istar web platform for protein-ligand docking purpose. This demonstrates the excellent portability of iview. Using innovative 3D techniques, we provide a user friendly visualizer that is not intended to compete with professional visualizers, but to enable easy accessibility and platform independence.
Toxicological relationships between proteins obtained from protein target predictions of large toxicity databases

International Nuclear Information System (INIS)

Nigsch, Florian; Mitchell, John B.O.

2008-01-01

The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of 'toxiclogical' profiles, i.e., to what extent are molecules of known toxicity predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we built a computational multiclass model using the Winnow algorithm based on a dataset of protein targets derived from the MDL Drug Data Report. A 15-fold Monte Carlo cross-validation using 50% of each class for training, and the remaining 50% for testing, provided an assessment of the accuracy of that model. We retained the 3 top-ranking predictions and found that in 82% of all cases the correct target was predicted within these three predictions. The first prediction was the correct one in almost 70% of cases. A model built on the whole protein target dataset was then used to predict the protein targets for 150 000 molecules from the MDL Toxicity Database. We analysed the frequency of the predictions across the panel of protein targets for experimentally determined toxicity classes of all molecules. This allowed us to identify clusters of proteins related by their toxicological profiles, as well as toxicities that are related. Literature-based evidence is provided for some specific clusters to show the relevance of the relationships identified
The reactive metabolite target protein database (TPDB – a web-accessible resource

Directory of Open Access Journals (Sweden)

Dong Yinghua

2007-03-01

Full Text Available Abstract Background The toxic effects of many simple organic compounds stem from their biotransformation to chemically reactive metabolites which bind covalently to cellular proteins. To understand the mechanisms of cytotoxic responses it may be important to know which proteins become adducted and whether some may be common targets of multiple toxins. The literature of this field is widely scattered but expanding rapidly, suggesting the need for a comprehensive, searchable database of reactive metabolite target proteins. Description The Reactive Metabolite Target Protein Database (TPDB is a comprehensive, curated, searchable, documented compilation of publicly available information on the protein targets of reactive metabolites of 18 well-studied chemicals and drugs of known toxicity. TPDB software enables i string searches for author names and proteins names/synonyms, ii more complex searches by selecting chemical compound, animal species, target tissue and protein names/synonyms from pull-down menus, and iii commonality searches over multiple chemicals. Tabulated search results provide information, references and links to other databases. Conclusion The TPDB is a unique on-line compilation of information on the covalent modification of cellular proteins by reactive metabolites of chemicals and drugs. Its comprehensiveness and searchability should facilitate the elucidation of mechanisms of reactive metabolite toxicity. The database is freely available at http://tpdb.medchem.ku.edu/tpdb.html
PACSY, a relational database management system for protein structure and chemical shift analysis.

Science.gov (United States)

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

2012-10-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
PACSY, a relational database management system for protein structure and chemical shift analysis

Energy Technology Data Exchange (ETDEWEB)

Lee, Woonghee, E-mail: whlee@nmrfam.wisc.edu [University of Wisconsin-Madison, National Magnetic Resonance Facility at Madison, and Biochemistry Department (United States); Yu, Wookyung [Center for Proteome Biophysics, Pusan National University, Department of Physics (Korea, Republic of); Kim, Suhkmann [Pusan National University, Department of Chemistry and Chemistry Institute for Functional Materials (Korea, Republic of); Chang, Iksoo [Center for Proteome Biophysics, Pusan National University, Department of Physics (Korea, Republic of); Lee, Weontae, E-mail: wlee@spin.yonsei.ac.kr [Yonsei University, Structural Biochemistry and Molecular Biophysics Laboratory, Department of Biochemistry (Korea, Republic of); Markley, John L., E-mail: markley@nmrfam.wisc.edu [University of Wisconsin-Madison, National Magnetic Resonance Facility at Madison, and Biochemistry Department (United States)

2012-10-15

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.eduhttp://pacsy.nmrfam.wisc.edu.
PACSY, a relational database management system for protein structure and chemical shift analysis

Science.gov (United States)

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo

2012-01-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636
PACSY, a relational database management system for protein structure and chemical shift analysis

International Nuclear Information System (INIS)

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L.

2012-01-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.eduhttp://pacsy.nmrfam.wisc.edu.
ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

Science.gov (United States)

Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

2017-01-01

Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea have largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety five sequenced archaea genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access for SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using MySQL relational database. Web interfaces were developed using Netbeans. Data were stored according to UniProt accession numbers, NCBI Ref Seq ID, PDB IDs and Entrez Databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicase from archaea in one platform. This database provides essential resource information for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.

Computer systems and methods for the query and visualization of multidimensional databases

Science.gov (United States)

Stolte, Chris; Tang, Diane L; Hanrahan, Patrick

2015-03-03

A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes multiple operand names, each operand corresponding to one or more fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first operands with the columns shelf and to associate one or more second operands with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first operands, and each pane has a y-axis defined based on data for the one or more second operands.
Efficiency of Database Search for Identification of Mutated and Modified Proteins via Mass Spectrometry

OpenAIRE

Pevzner, Pavel A.; Mulyukov, Zufar; Dancik, Vlado; Tang, Chris L

2001-01-01

Although protein identification by matching tandem mass spectra (MS/MS) against protein databases is a widespread tool in mass spectrometry, the question about reliability of such searches remains open. Absence of rigorous significance scores in MS/MS database search makes it difficult to discard random database hits and may lead to erroneous protein identification, particularly in the case of mutated or post-translationally modified peptides. This problem is especially important for high-thr...
FaceWarehouse: a 3D facial expression database for visual computing.

Science.gov (United States)

Cao, Chen; Weng, Yanlin; Zhou, Shun; Tong, Yiying; Zhou, Kun

2014-03-01

We present FaceWarehouse, a database of 3D facial expressions for visual computing applications. We use Kinect, an off-the-shelf RGBD camera, to capture 150 individuals aged 7-80 from various ethnic backgrounds. For each person, we captured the RGBD data of her different expressions, including the neutral expression and 19 other expressions such as mouth-opening, smile, kiss, etc. For every RGBD raw data record, a set of facial feature points on the color image such as eye corners, mouth contour, and the nose tip are automatically localized, and manually adjusted if better accuracy is required. We then deform a template facial mesh to fit the depth data as closely as possible while matching the feature points on the color image to their corresponding points on the mesh. Starting from these fitted face meshes, we construct a set of individual-specific expression blendshapes for each person. These meshes with consistent topology are assembled as a rank-3 tensor to build a bilinear face model with two attributes: identity and expression. Compared with previous 3D facial databases, for every person in our database, there is a much richer matching collection of expressions, enabling depiction of most human facial actions. We demonstrate the potential of FaceWarehouse for visual computing with four applications: facial image manipulation, face component transfer, real-time performance-based facial image animation, and facial animation retargeting from video to image.
HIP2: An online database of human plasma proteins from healthy individuals

Directory of Open Access Journals (Sweden)

Shen Changyu

2008-04-01

Full Text Available Abstract Background With the introduction of increasingly powerful mass spectrometry (MS techniques for clinical research, several recent large-scale MS proteomics studies have sought to characterize the entire human plasma proteome with a general objective for identifying thousands of proteins leaked from tissues in the circulating blood. Understanding the basic constituents, diversity, and variability of the human plasma proteome is essential to the development of sensitive molecular diagnosis and treatment monitoring solutions for future biomedical applications. Biomedical researchers today, however, do not have an integrated online resource in which they can search for plasma proteins collected from different mass spectrometry platforms, experimental protocols, and search software for healthy individuals. The lack of such a resource for comparisons has made it difficult to interpret proteomics profile changes in patients' plasma and to design protein biomarker discovery experiments. Description To aid future protein biomarker studies of disease and health from human plasma, we developed an online database, HIP2 (Healthy Human Individual's Integrated Plasma Proteome. The current version contains 12,787 protein entries linked to 86,831 peptide entries identified using different MS platforms. Conclusion This web-based database will be useful to biomedical researchers involved in biomarker discovery research. This database has been developed to be the comprehensive collection of healthy human plasma proteins, and has protein data captured in a relational database schema built to contain mappings of supporting peptide evidence from several high-quality and high-throughput mass-spectrometry (MS experimental data sets. Users can search for plasma protein/peptide annotations, peptide/protein alignments, and experimental/sample conditions with options for filter-based retrieval to achieve greater analytical power for discovery and validation.
CPLA 1.0: an integrated database of protein lysine acetylation.

Science.gov (United States)

Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu

2011-01-01

As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets broad substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we presented the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein-protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, there are 1862 triplet relationships of HAT-substrate-HDAC retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of CPLA database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org.
Protein - TP Atlas | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data ...p_atlas_protein.zip File URL: ftp://ftp.biosciencedbc.jp/archive/tp_atlas/LATEST/...story of This Database Site Policy | Contact Us Protein - TP Atlas | LSDB Archive ...
An update of the DEF database of protein fold class predictions

DEFF Research Database (Denmark)

Reczko, Martin; Karras, Dimitris; Bohr, Henrik

1997-01-01

An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure re...... related to the sequence with high accuracy. The updated predictions system is developed using data from the new version of the 3D-ALI database of aligned protein structures and thus is giving more reliable and more detailed predictions than the previous DEF system.......An update is given on the Database of Expected Fold classes (DEF) that contains a collection of fold-class predictions made from protein sequences and a mail server that provides new predictions for new sequences. To any given sequence one of 49 fold-classes is chosen to classify the structure...
VASCo: computation and visualization of annotated protein surface contacts

Directory of Open Access Journals (Sweden)

Thallinger Gerhard G

2009-01-01

Full Text Available Abstract Background Structural data from crystallographic analyses contain a vast amount of information on protein-protein contacts. Knowledge on protein-protein interactions is essential for understanding many processes in living cells. The methods to investigate these interactions range from genetics to biophysics, crystallography, bioinformatics and computer modeling. Also crystal contact information can be useful to understand biologically relevant protein oligomerisation as they rely in principle on the same physico-chemical interaction forces. Visualization of crystal and biological contact data including different surface properties can help to analyse protein-protein interactions. Results VASCo is a program package for the calculation of protein surface properties and the visualization of annotated surfaces. Special emphasis is laid on protein-protein interactions, which are calculated based on surface point distances. The same approach is used to compare surfaces of two aligned molecules. Molecular properties such as electrostatic potential or hydrophobicity are mapped onto these surface points. Molecular surfaces and the corresponding properties are calculated using well established programs integrated into the package, as well as using custom developed programs. The modular package can easily be extended to include new properties for annotation. The output of the program is most conveniently displayed in PyMOL using a custom-made plug-in. Conclusion VASCo supplements other available protein contact visualisation tools and provides additional information on biological interactions as well as on crystal contacts. The tool provides a unique feature to compare surfaces of two aligned molecules based on point distances and thereby facilitates the visualization and analysis of surface differences.
Morphing methods to visualize coarse-grained protein dynamics.

Science.gov (United States)

Weiss, Dahlia R; Koehl, Patrice

2014-01-01

Morphing was initially developed as a cinematic effect, where one image is seamlessly transformed into another image. The technique was widely adopted by biologists to visualize the transition between protein conformational states, generating an interpolated pathway from an initial to a final protein structure. Geometric morphing seeks to create visually suggestive movies that illustrate structural changes between conformations but do not necessarily represent a biologically relevant pathway, while minimum energy path (MEP) interpolations aim at describing the true transition state between the crystal structure minima in the energy landscape.
Exploring Protein Function Using the Saccharomyces Genome Database.

Science.gov (United States)

Wong, Edith D

2017-01-01

Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.
VaProS: a database-integration approach for protein/genome information retrieval

KAUST Repository

Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R.; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

2016-01-01

Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.
VaProS: a database-integration approach for protein/genome information retrieval

KAUST Repository

Gojobori, Takashi

2016-12-24

Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that facilitates life science researchers for retrieving experts’ knowledge stored in the databases and for building a new hypothesis of the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as structural effect of the gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.
muBLASTP: database-indexed protein sequence search on multicore CPUs.

Science.gov (United States)

Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun

2016-11-04

The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
Integrated remote sensing and visualization (IRSV) system for transportation infrastructure operations and management, phase two, volume 4 : web-based bridge information database--visualization analytics and distributed sensing.

Science.gov (United States)

2012-03-01

This report introduces the design and implementation of a Web-based bridge information visual analytics system. This : project integrates Internet, multiple databases, remote sensing, and other visualization technologies. The result : combines a GIS ...
Protein Structural Change Data - PSCDB | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us PSCDB Protein Structural Change Data Data detail Data name Protein Structural Change Data DO...History of This Database Site Policy | Contact Us Protein Structural Change Data - PSCDB | LSDB Archive ...
Development of human protein reference database as an initial platform for approaching systems biology in humans

DEFF Research Database (Denmark)

Peri, Suraj; Navarro, J Daniel; Amanchy, Ramars

2003-01-01

Human Protein Reference Database (HPRD) is an object database that integrates a wealth of information relevant to the function of human proteins in health and disease. Data pertaining to thousands of protein-protein interactions, posttranslational modifications, enzyme/substrate relationships...
Moving Observer Support for Databases

DEFF Research Database (Denmark)

Bukauskas, Linas

Interactive visual data explorations impose rigid requirements on database and visualization systems. Systems that visualize huge amounts of data tend to request large amounts of memory resources and heavily use the CPU to process and visualize data. Current systems employ a loosely coupled...... architecture to exchange data between database and visualization. Thus, the interaction of the visualizer and the database is kept to the minimum, which most often leads to superfluous data being passed from database to visualizer. This Ph.D. thesis presents a novel tight coupling of database and visualizer....... The thesis discusses the VR-tree, an extension of the R-tree that enables observer relative data extraction. To support incremental observer position relative data extraction the thesis proposes the Volatile Access Structure (VAST). VAST is a main memory structure that caches nodes of the VR-tree. VAST...
Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

Science.gov (United States)

Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

2018-01-16

Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.
Integrating protein structures and precomputed genealogies in the Magnum database: Examples with cellular retinoid binding proteins

Directory of Open Access Journals (Sweden)

Bradley Michael E

2006-02-01

Full Text Available Abstract Background When accurate models for the divergent evolution of protein sequences are integrated with complementary biological information, such as folded protein structures, analyses of the combined data often lead to new hypotheses about molecular physiology. This represents an excellent example of how bioinformatics can be used to guide experimental research. However, progress in this direction has been slowed by the lack of a publicly available resource suitable for general use. Results The precomputed Magnum database offers a solution to this problem for ca. 1,800 full-length protein families with at least one crystal structure. The Magnum deliverables include 1 multiple sequence alignments, 2 mapping of alignment sites to crystal structure sites, 3 phylogenetic trees, 4 inferred ancestral sequences at internal tree nodes, and 5 amino acid replacements along tree branches. Comprehensive evaluations revealed that the automated procedures used to construct Magnum produced accurate models of how proteins divergently evolve, or genealogies, and correctly integrated these with the structural data. To demonstrate Magnum's capabilities, we asked for amino acid replacements requiring three nucleotide substitutions, located at internal protein structure sites, and occurring on short phylogenetic tree branches. In the cellular retinoid binding protein family a site that potentially modulates ligand binding affinity was discovered. Recruitment of cellular retinol binding protein to function as a lens crystallin in the diurnal gecko afforded another opportunity to showcase the predictive value of a browsable database containing branch replacement patterns integrated with protein structures. Conclusion We integrated two areas of protein science, evolution and structure, on a large scale and created a precomputed database, known as Magnum, which is the first freely available resource of its kind. Magnum provides evolutionary and structural
CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins.

Science.gov (United States)

Khorshid, Mohsen; Rodak, Christoph; Zavolan, Mihaela

2011-01-01

The stability, localization and translation rate of mRNAs are regulated by a multitude of RNA-binding proteins (RBPs) that find their targets directly or with the help of guide RNAs. Among the experimental methods for mapping RBP binding sites, cross-linking and immunoprecipitation (CLIP) coupled with deep sequencing provides transcriptome-wide coverage as well as high resolution. However, partly due to their vast volume, the data that were so far generated in CLIP experiments have not been put in a form that enables fast and interactive exploration of binding sites. To address this need, we have developed the CLIPZ database and analysis environment. Binding site data for RBPs such as Argonaute 1-4, Insulin-like growth factor II mRNA-binding protein 1-3, TNRC6 proteins A-C, Pumilio 2, Quaking and Polypyrimidine tract binding protein can be visualized at the level of the genome and of individual transcripts. Individual users can upload their own sequence data sets while being able to limit the access to these data to specific users, and analyses of the public and private data sets can be performed interactively. CLIPZ, available at http://www.clipz.unibas.ch, aims to provide an open access repository of information for post-transcriptional regulatory elements.

The master two-dimensional gel database of human AMA cell proteins: towards linking protein and genome sequence and mapping information (update 1991)

DEFF Research Database (Denmark)

Celis, J E; Leffers, H; Rasmussen, H H

1991-01-01

autoantigens" and "cDNAs". For convenience we have included an alphabetical list of all known proteins recorded in this database. In the long run, the main goal of this database is to link protein and DNA sequencing and mapping information (Human Genome Program) and to provide an integrated picture......The master two-dimensional gel database of human AMA cells currently lists 3801 cellular and secreted proteins, of which 371 cellular polypeptides (306 IEF; 65 NEPHGE) were added to the master images during the last 10 months. These include: (i) very basic and acidic proteins that do not focus...
STITCH 2: an interaction network database for small molecules and proteins

DEFF Research Database (Denmark)

Kuhn, Michael; Szklarczyk, Damian; Franceschini, Andrea

2010-01-01

Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug......-target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other...... chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74,000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/....
Linking genotypes database with locus-specific database and genotype-phenotype correlation in phenylketonuria.

Science.gov (United States)

Wettstein, Sarah; Underhaug, Jarl; Perez, Belen; Marsden, Brian D; Yue, Wyatt W; Martinez, Aurora; Blau, Nenad

2015-03-01

The wide range of metabolic phenotypes in phenylketonuria is due to a large number of variants causing variable impairment in phenylalanine hydroxylase function. A total of 834 phenylalanine hydroxylase gene variants from the locus-specific database PAHvdb and genotypes of 4181 phenylketonuria patients from the BIOPKU database were characterized using FoldX, SIFT Blink, Polyphen-2 and SNPs3D algorithms. Obtained data was correlated with residual enzyme activity, patients' phenotype and tetrahydrobiopterin responsiveness. A descriptive analysis of both databases was compiled and an interactive viewer in PAHvdb database was implemented for structure visualization of missense variants. We found a quantitative relationship between phenylalanine hydroxylase protein stability and enzyme activity (r(s) = 0.479), between protein stability and allelic phenotype (r(s) = -0.458), as well as between enzyme activity and allelic phenotype (r(s) = 0.799). Enzyme stability algorithms (FoldX and SNPs3D), allelic phenotype and enzyme activity were most powerful to predict patients' phenotype and tetrahydrobiopterin response. Phenotype prediction was most accurate in deleterious genotypes (≈ 100%), followed by homozygous (92.9%), hemizygous (94.8%), and compound heterozygous genotypes (77.9%), while tetrahydrobiopterin response was correctly predicted in 71.0% of all cases. To our knowledge this is the largest study using algorithms for the prediction of patients' phenotype and tetrahydrobiopterin responsiveness in phenylketonuria patients, using data from the locus-specific and genotypes database.
3DProIN: Protein-Protein Interaction Networks and Structure Visualization.

Science.gov (United States)

Li, Hui; Liu, Chunmei

2014-06-14

3DProIN is a computational tool to visualize protein-protein interaction networks in both two dimensional (2D) and three dimensional (3D) view. It models protein-protein interactions in a graph and explores the biologically relevant features of the tertiary structures of each protein in the network. Properties such as color, shape and name of each node (protein) of the network can be edited in either 2D or 3D views. 3DProIN is implemented using 3D Java and C programming languages. The internet crawl technique is also used to parse dynamically grasped protein interactions from protein data bank (PDB). It is a java applet component that is embedded in the web page and it can be used on different platforms including Linux, Mac and Window using web browsers such as Firefox, Internet Explorer, Chrome and Safari. It also was converted into a mac app and submitted to the App store as a free app. Mac users can also download the app from our website. 3DProIN is available for academic research at http://bicompute.appspot.com.
Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database

KAUST Repository

Komatsu, Setsuko

2017-05-10

The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max ‘Enrei’). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. Biological significanceThe Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all
Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

Science.gov (United States)

Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

2017-06-23

The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all predicted proteins from
Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database

KAUST Repository

Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

2017-01-01

The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max ‘Enrei’). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. Biological significanceThe Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. It accounts for approximately 80% of all
O-GLYCOBASE version 4.0: a revised database of O-glycosylated proteins

DEFF Research Database (Denmark)

Gupta, Ramneek; Birch, Hanne; Rapacki, Krzysztof

1999-01-01

O-GLYCBASE is a database of glycoproteins with O-linked glycosylation sites. Entries with at least one experimentally verified O-glycosylation site have been complied from protein sequence databases and literature. Each entry contains information about the glycan involved, the species, sequence, ...
Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

Science.gov (United States)

Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

2009-04-21

To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease
Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

Directory of Open Access Journals (Sweden)

Mixon Mark

2009-04-01

Full Text Available Abstract Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene
Protein (Viridiplantae) - PGDBj - Ortholog DB | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available ase Description Download License Update History of This Database Site Policy | Contact Us Protein (Viridiplantae) - PGDBj - Ortholog DB | LSDB Archive ... ...List Contact us PGDBj - Ortholog DB Protein (Viridiplantae) Data detail Data name Protein (Viridiplantae) DO...switchLanguage; BLAST Search Image Search Home About Archive Update History Data
UNcleProt (Universal Nuclear Protein database of barley): The first nuclear protein database that distinguishes proteins from different phases of the cell cycle

Czech Academy of Sciences Publication Activity Database

Blavet, Nicolas; Uřinovská, J.; Jeřábková, Hana; Chamrád, I.; Vrána, Jan; Lenobel, R.; Beinhauer, D.; Šebela, M.; Doležel, Jaroslav; Petrovská, Beáta

2017-01-01

Roč. 8, č. 1 (2017), s. 70-80 ISSN 1949-1034 R&D Projects: GA ČR(CZ) GA14-28443S; GA MŠk(CZ) LO1204 Institutional support: RVO:61389030 Keywords : cicer-arietinum l. * rice oryza-sativa * chromatin-associated protein s * proteomic analysis * mitotic chromosomes * dehydration * localization * chickpea * network * phosphoproteome * barley * cell cycle * database * flow-cytometry * localization * mass spectrometry * nuclear proteome * nucleus Subject RIV: CE - Biochemistry OBOR OECD: Cell biology Impact factor: 2.387, year: 2016
An Interoperable Cartographic Database

OpenAIRE

Slobodanka Ključanin; Zdravko Galić

2007-01-01

The concept of producing a prototype of interoperable cartographic database is explored in this paper, including the possibilities of integration of different geospatial data into the database management system and their visualization on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relation database, spatial analysis, definition and visualization of the database content in the form of a map on t...
CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics

OpenAIRE

Verma, Mohit; Kumar, Vinay; Patel, Ravi K.; Garg, Rohini; Jain, Mukesh

2015-01-01

Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB), which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database fea...
Quality control in diagnostic radiology: software (Visual Basic 6) and database applications

International Nuclear Information System (INIS)

Md Saion Salikin; Muhammad Farid Abdul Khalid

2002-01-01

Quality Assurance programme in diagnostic Radiology is being implemented by the Ministry of Health (MoH) in Malaysia. Under this program the performance of an x-ray machine used for diagnostic purpose is tested by using the approved procedure which is commonly known as Quality Control in diagnostic radiology. The quality control or performance tests are carried out b a class H licence holder issued the Atomic Energy Licensing Act 1984. There are a few computer applications (software) that are available in the market which can be used for this purpose. A computer application (software) using Visual Basics 6 and Microsoft Access, is being developed to expedite data handling, analysis and storage as well as report writing of the quality control tests. In this paper important features of the software for quality control tests are explained in brief. A simple database is being established for this purpose which is linked to the software. Problems encountered in the preparation of database are discussed in this paper. A few examples of practical usage of the software and database applications are presented in brief. (Author)
An Interoperable Cartographic Database

Directory of Open Access Journals (Sweden)

Slobodanka Ključanin

2007-05-01

Full Text Available The concept of producing a prototype of interoperable cartographic database is explored in this paper, including the possibilities of integration of different geospatial data into the database management system and their visualization on the Internet. The implementation includes vectorization of the concept of a single map page, creation of the cartographic database in an object-relation database, spatial analysis, definition and visualization of the database content in the form of a map on the Internet.
Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities

DEFF Research Database (Denmark)

Haque, Mohammad Ahsanul; Nasrollahi, Kamal; Moeslund, Thomas B.

2018-01-01

, exploiting both spatial and temporal information of the face to assess pain level, and second, incorporating multiple visual modalities to capture complementary face information related to pain. Most works in the literature focus on merely exploiting spatial information on chromatic (RGB) video data......PAIN)' database, for RGBDT pain level recognition in sequences. We provide a first baseline results including 5 pain levels recognition by analyzing independent visual modalities and their fusion with CNN and LSTM models. From the experimental evaluation we observe that fusion of modalities helps to enhance...... recognition performance of pain levels in comparison to isolated ones. In particular, the combination of RGB, D, and T in an early fusion fashion achieved the best recognition rate....
BtoxDB: a comprehensive database of protein structural data on toxin-antitoxin systems.

Science.gov (United States)

Barbosa, Luiz Carlos Bertucci; Garrido, Saulo Santesso; Marchetto, Reinaldo

2015-03-01

Toxin-antitoxin (TA) systems are diverse and abundant genetic modules in prokaryotic cells that are typically formed by two genes encoding a stable toxin and a labile antitoxin. Because TA systems are able to repress growth or kill cells and are considered to be important actors in cell persistence (multidrug resistance without genetic change), these modules are considered potential targets for alternative drug design. In this scenario, structural information for the proteins in these systems is highly valuable. In this report, we describe the development of a web-based system, named BtoxDB, that stores all protein structural data on TA systems. The BtoxDB database was implemented as a MySQL relational database using PHP scripting language. Web interfaces were developed using HTML, CSS and JavaScript. The data were collected from the PDB, UniProt and Entrez databases. These data were appropriately filtered using specialized literature and our previous knowledge about toxin-antitoxin systems. The database provides three modules ("Search", "Browse" and "Statistics") that enable searches, acquisition of contents and access to statistical data. Direct links to matching external databases are also available. The compilation of all protein structural data on TA systems in one platform is highly useful for researchers interested in this content. BtoxDB is publicly available at http://www.gurupi.uft.edu.br/btoxdb. Copyright © 2015 Elsevier Ltd. All rights reserved.
Integration and visualization of non-coding RNA and protein interaction networks

OpenAIRE

Junge, Alexander; Refsgaard, Jan Christian; Garde, Christian; Pan, Xiaoyong; Santos Delgado, Alberto; Anthon, Christian; Alkan, Ferhat; von Mering, Christian; Workman, Christopher; Jensen, Lars Juhl; Gorodkin, Jan

2015-01-01

Non-coding RNAs (ncRNAs) fulfill a diverse set of biological functions relying on interactions with other molecular entities. The advent of new experimental and computational approaches makes it possible to study ncRNAs and their associations on an unprecedented scale. We present RAIN (RNA Association and Interaction Networks) - a database that combines ncRNA-ncRNA, ncRNA-mRNA and ncRNA-protein interactions with large-scale protein association networks available in the STRING database. By int...
MannDB – A microbial database of automated protein sequence analyses and evidence integration for protein characterization

Directory of Open Access Journals (Sweden)

Kuczmarski Thomas A

2006-10-01

Full Text Available Abstract Background MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. Description MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. Conclusion MannDB comprises a large number of genomes and comprehensive protein

CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation.

Science.gov (United States)

Thangakani, A Mary; Nagarajan, R; Kumar, Sandeep; Sakthivel, R; Velmurugan, D; Gromiha, M Michael

2016-01-01

Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnological production of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from experimental studies performed by scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD has been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ etc. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed.
ChemProt-2.0: visual navigation in a disease chemical biology database

DEFF Research Database (Denmark)

Kjærulff, Sonny Kim; Wich, Louis; Kringelum, Jens Vindahl

2013-01-01

ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a public available compilation of multiple chemical-protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to > 1.15 million compounds with 5.32 millions bioactivity measu...
THPdb: Database of FDA-approved peptide and protein therapeutics.

Directory of Open Access Journals (Sweden)

Salman Sadullah Usmani

Full Text Available THPdb (http://crdd.osdd.net/raghava/thpdb/ is a manually curated repository of Food and Drug Administration (FDA approved therapeutic peptides and proteins. The information in THPdb has been compiled from 985 research publications, 70 patents and other resources like DrugBank. The current version of the database holds a total of 852 entries, providing comprehensive information on 239 US-FDA approved therapeutic peptides and proteins and their 380 drug variants. The information on each peptide and protein includes their sequences, chemical properties, composition, disease area, mode of activity, physical appearance, category or pharmacological class, pharmacodynamics, route of administration, toxicity, target of activity, etc. In addition, we have annotated the structure of most of the protein and peptides. A number of user-friendly tools have been integrated to facilitate easy browsing and data analysis. To assist scientific community, a web interface and mobile App have also been developed.
Visualization and targeted disruption of protein interactions in living cells

Science.gov (United States)

Herce, Henry D.; Deng, Wen; Helma, Jonas; Leonhardt, Heinrich; Cardoso, M. Cristina

2013-01-01

Protein–protein interactions are the basis of all processes in living cells, but most studies of these interactions rely on biochemical in vitro assays. Here we present a simple and versatile fluorescent-three-hybrid (F3H) strategy to visualize and target protein–protein interactions. A high-affinity nanobody anchors a GFP-fusion protein of interest at a defined cellular structure and the enrichment of red-labelled interacting proteins is measured at these sites. With this approach, we visualize the p53–HDM2 interaction in living cells and directly monitor the disruption of this interaction by Nutlin 3, a drug developed to boost p53 activity in cancer therapy. We further use this approach to develop a cell-permeable vector that releases a highly specific peptide disrupting the p53 and HDM2 interaction. The availability of multiple anchor sites and the simple optical readout of this nanobody-based capture assay enable systematic and versatile analyses of protein–protein interactions in practically any cell type and species. PMID:24154492
Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

Directory of Open Access Journals (Sweden)

Bányai László

2008-08-01

Full Text Available Abstract Background Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii co-occurrence of extracellular and nuclear domains; (iv violation of domain integrity; (v chimeras encoded by two or more genes located on different chromosomes. Results Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis and two protostome species (Caenorhabditis elegans and Drosophila melanogaster have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON
The drug-minded protein interaction database (DrumPID) for efficient target analysis and drug development.

Science.gov (United States)

Kunz, Meik; Liang, Chunguang; Nilla, Santosh; Cecil, Alexander; Dandekar, Thomas

2016-01-01

The drug-minded protein interaction database (DrumPID) has been designed to provide fast, tailored information on drugs and their protein networks including indications, protein targets and side-targets. Starting queries include compound, target and protein interactions and organism-specific protein families. Furthermore, drug name, chemical structures and their SMILES notation, affected proteins (potential drug targets), organisms as well as diseases can be queried including various combinations and refinement of searches. Drugs and protein interactions are analyzed in detail with reference to protein structures and catalytic domains, related compound structures as well as potential targets in other organisms. DrumPID considers drug functionality, compound similarity, target structure, interactome analysis and organismic range for a compound, useful for drug development, predicting drug side-effects and structure-activity relationships.Database URL:http://drumpid.bioapps.biozentrum.uni-wuerzburg.de. © The Author(s) 2016. Published by Oxford University Press.
Visualizing the principal component of 1H,15N-HSQC NMR spectral changes that reflect protein structural or functional properties: application to troponin C

International Nuclear Information System (INIS)

Robertson, Ian M.; Boyko, Robert F.; Sykes, Brian D.

2011-01-01

Laboratories often repeatedly determine the structure of a given protein under a variety of conditions, mutations, modifications, or in a number of states. This approach can be cumbersome and tedious. Given then a database of structures, identifiers, and corresponding 1 H, 15 N-HSQC NMR spectra for homologous proteins, we investigated whether structural information could be ascertained for a new homolog solely from its 1 H, 15 N-HSQC NMR spectrum. We addressed this question with two different approaches. First, we used a semi-automated approach with the program, ORBplus. ORBplus looks for patterns in the chemical shifts and correlates these commonalities to the explicit property of interest. ORBplus ranks resonances based on consistency of the magnitude and direction of the chemical shifts within the database, and the chemical shift correlation of the unknown protein with the database. ORBplus visualizes the results by a histogram and a vector diagram, and provides residue specific predictions on structural similarities with the database. The second method we used was partial least squares (PLS), which is a multivariate statistical technique used to correlate response and predictor variables. We investigated the ability of these methods to predict the tertiary structure of the contractile regulatory protein troponin C. Troponin C undergoes a closed-to-open conformational change, which is coupled to its function in muscle. We found that both ORBplus and PLS were able to identify patterns in the 1 H, 15 N-HSQC NMR data from different states of troponin C that correlated to its conformation.
Database Description - RPSD | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available base Description General information of database Database name RPSD Alternative nam...e Rice Protein Structure Database DOI 10.18908/lsdba.nbdc00749-000 Creator Creator Name: Toshimasa Yamazaki ... Ibaraki 305-8602, Japan National Institute of Agrobiological Sciences Toshimasa Yamazaki E-mail : Databas...e classification Structure Databases - Protein structure Organism Taxonomy Name: Or...or name(s): Journal: External Links: Original website information Database maintenance site National Institu
[Better performance of Western blotting: quick vs slow protein transfer, blotting membranes and the visualization methods].

Science.gov (United States)

Kong, Ling-Quan; Pu, Ying-Hui; Ma, Shi-Kun

2008-01-01

To study how the choices of the quick vs slow protein transfer, the blotting membranes and the visualization methods influence the performance of Western blotting. The cellular proteins were abstracted from human breast cell line MDA-MB-231 for analysis with Western blotting using quick (2 h) and slow (overnight) protein transfer, different blotting membranes (nitrocellulose, PVDF and nylon membranes) and different visualization methods (ECL and DAB). In Western blotting with slow and quick protein transfer, the prestained marker presented more distinct bands on nitrocellulose membrane than on the nylon and PVDF membranes, and the latter also showed clear bands on the back of the membrane to very likely cause confusion, which did not occur with nitrocellulose membrane. PVDF membrane allowed slightly clearer visualization of the proteins with DAB method as compared with nitrocellulose and nylon membranes, and on the latter two membranes, quick protein transfer was likely to result in somehow irregular bands in comparison with slow protein transfer. With slow protein transfer and chemiluminescence for visualization, all the 3 membranes showed clear background, while with quick protein transfer, nylon membrane gave rise to obvious background noise but the other two membranes did not. Different membranes should be selected for immunoblotting according to the actual needs of the experiment. Slow transfer of the proteins onto the membranes often has better effect than quick transfer, and enhanced chemiluminescence is superior to DAB for protein visualization and allows highly specific and sensitive analysis of the protein expressions.
Database Description - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Trypanosomes Database Database Description General information of database Database name Trypanosomes Database...stitute of Genetics Research Organization of Information and Systems Yata 1111, Mishima, Shizuoka 411-8540, JAPAN E mail: Database...y Name: Trypanosoma Taxonomy ID: 5690 Taxonomy Name: Homo sapiens Taxonomy ID: 9606 Database description The... Article title: Author name(s): Journal: External Links: Original website information Database maintenance s...DB (Protein Data Bank) KEGG PATHWAY Database DrugPort Entry list Available Query search Available Web servic
Phi-square Lexical Competition Database (Phi-Lex): an online tool for quantifying auditory and visual lexical competition.

Science.gov (United States)

Strand, Julia F

2014-03-01

A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex .
dbPAF: an integrative database of protein phosphorylation in animals and fungi.

Science.gov (United States)

Ullah, Shahid; Lin, Shaofeng; Xu, Yang; Deng, Wankun; Ma, Lili; Zhang, Ying; Liu, Zexian; Xue, Yu

2016-03-24

Protein phosphorylation is one of the most important post-translational modifications (PTMs) and regulates a broad spectrum of biological processes. Recent progresses in phosphoproteomic identifications have generated a flood of phosphorylation sites, while the integration of these sites is an urgent need. In this work, we developed a curated database of dbPAF, containing known phosphorylation sites in H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe and S. cerevisiae. From the scientific literature and public databases, we totally collected and integrated 54,148 phosphoproteins with 483,001 phosphorylation sites. Multiple options were provided for accessing the data, while original references and other annotations were also present for each phosphoprotein. Based on the new data set, we computationally detected significantly over-represented sequence motifs around phosphorylation sites, predicted potential kinases that are responsible for the modification of collected phospho-sites, and evolutionarily analyzed phosphorylation conservation states across different species. Besides to be largely consistent with previous reports, our results also proposed new features of phospho-regulation. Taken together, our database can be useful for further analyses of protein phosphorylation in human and other model organisms. The dbPAF database was implemented in PHP + MySQL and freely available at http://dbpaf.biocuckoo.org.
Two-dimensional gel human protein databases offer a systematic approach to the study of cell proliferation and differentiation

DEFF Research Database (Denmark)

Celis, julio E.; Gesser, Borbala; Dejgaard, Kurt

1989-01-01

Human cellular protein databases have been established using computer-analyzed 2D gel electrophoresis. These databases, which include information on various properties of proteins, offer a global approach to the study of regulation of cell proliferation and differentiation. Furthermore, thanks...
Two dimensional gel human protein databases offer a systematic approach to the study of cell proliferation and differentiation

DEFF Research Database (Denmark)

Celis, J E; Gesser, B; Dejgaard, K

1989-01-01

Human cellular protein databases have been established using computer-analyzed 2D gel electrophoresis. These databases, which include information on various properties of proteins, offer a global approach to the study of regulation of cell proliferation and differentiation. Furthermore, thanks to...
Reef-coral proteins as visual, non-destructive reporters for plant transformation.

Science.gov (United States)

Wenck, A; Pugieux, C; Turner, M; Dunn, M; Stacy, C; Tiozzo, A; Dunder, E; van Grinsven, E; Khan, R; Sigareva, M; Wang, W C; Reed, J; Drayton, P; Oliver, D; Trafford, H; Legris, G; Rushton, H; Tayab, S; Launis, K; Chang, Y-F; Chen, D-F; Melchers, L

2003-11-01

Recently, five novel fluorescent proteins have been isolated from non-bioluminescent species of reef-coral organisms and have been made available through ClonTech. They are AmCyan, AsRed, DsRed, ZsGreen and ZsYellow. These proteins are valuable as reporters for transformation because they do not require a substrate or external co-factor to emit fluorescence and can be tested in vivo without destruction of the tissue under study. We have evaluated them in a large range of plants, both monocots and dicots, and our results indicate that they are valuable reporting tools for transformation in a wide variety of crops. We report here their successful expression in wheat, maize, barley, rice, banana, onion, soybean, cotton, tobacco, potato and tomato. Transient expression could be observed as early as 24 h after DNA delivery in some cases, allowing for very clear visualization of individually transformed cells. Stable transgenic events were generated, using mannose, kanamycin or hygromycin selection. Transgenic plants were phenotypically normal, showing a wide range of fluorescence levels, and were fertile. Expression of AmCyan, ZsGreen and AsRed was visible in maize T1 seeds, allowing visual segregation to more than 99% accuracy. The excitation and emission wavelengths of some of these proteins are significantly different; the difference is enough for the simultaneous visualization of cells transformed with more than one of the fluorescent proteins. These proteins will become useful tools for transformation optimization and other studies. The wide variety of plants successfully tested demonstrates that these proteins will potentially find broad use in plant biology.
ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome

Directory of Open Access Journals (Sweden)

Zhang Zhenhai

2010-10-01

Full Text Available Abstract Background Maize (Zea mays ssp. mays L. is an important model for plant basic and applied research. In 2009, the B73 maize genome sequencing made a great step forward, using clone by clone strategy; however, functional annotation and gene classification of the maize genome are still limited. Thus, a well-annotated datasets and informative database will be important for further research discoveries. Signal transduction is a fundamental biological process in living cells, and many protein families participate in this process in sensing, amplifying and responding to various extracellular or internal stimuli. Therefore, it is a good starting point to integrate information on the maize functional genes involved in signal transduction. Results Here we introduce a comprehensive database 'ProFITS' (Protein Families Involved in the Transduction of Signalling, which endeavours to identify and classify protein kinases/phosphatases, transcription factors and ubiquitin-proteasome-system related genes in the B73 maize genome. Users can explore gene models, corresponding transcripts and FLcDNAs using the three abovementioned protein hierarchical categories, and visualize them using an AJAX-based genome browser (JBrowse or Generic Genome Browser (GBrowse. Functional annotations such as GO annotation, protein signatures, protein best-hits in the Arabidopsis and rice genome are provided. In addition, pre-calculated transcription factor binding sites of each gene are generated and mutant information is incorporated into ProFITS. In short, ProFITS provides a user-friendly web interface for studies in signal transduction process in maize. Conclusion ProFITS, which utilizes both the B73 maize genome and full length cDNA (FLcDNA datasets, provides users a comprehensive platform of maize annotation with specific focus on the categorization of families involved in the signal transduction process. ProFITS is designed as a user-friendly web interface and it is
YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research.

Science.gov (United States)

Colangelo, Christopher M; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L; Carriero, Nicholas J; Gulcicek, Erol E; Lam, TuKiet T; Wu, Terence; Bjornson, Robert D; Bruce, Can; Nairn, Angus C; Rinehart, Jesse; Miller, Perry L; Williams, Kenneth R

2015-02-01

We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Ebolavirus Database: Gene and Protein Information Resource for Ebolaviruses

Directory of Open Access Journals (Sweden)

Rayapadi G. Swetha

2016-01-01

Full Text Available Ebola Virus Disease (EVD is a life-threatening haemorrhagic fever in humans. Even though there are many reports on EVD, the protein precursor functions and virulent factors of ebolaviruses remain poorly understood. Comparative analyses of Ebolavirus genomes will help in the identification of these important features. This prompted us to develop the Ebolavirus Database (EDB and we have provided links to various tools that will aid researchers to locate important regions in both the genomes and proteomes of Ebolavirus. The genomic analyses of ebolaviruses will provide important clues for locating the essential and core functional genes. The aim of EDB is to act as an integrated resource for ebolaviruses and we strongly believe that the database will be a useful tool for clinicians, microbiologists, health care workers, and bioscience researchers.
HMMEditor: a visual editing tool for profile hidden Markov model

Directory of Open Access Journals (Sweden)

Cheng Jianlin

2008-03-01

Full Text Available Abstract Background Profile Hidden Markov Model (HMM is a powerful statistical model to represent a family of DNA, RNA, and protein sequences. Profile HMM has been widely used in bioinformatics research such as sequence alignment, gene structure prediction, motif identification, protein structure prediction, and biological database search. However, few comprehensive, visual editing tools for profile HMM are publicly available. Results We develop a visual editor for profile Hidden Markov Models (HMMEditor. HMMEditor can visualize the profile HMM architecture, transition probabilities, and emission probabilities. Moreover, it provides functions to edit and save HMM and parameters. Furthermore, HMMEditor allows users to align a sequence against the profile HMM and to visualize the corresponding Viterbi path. Conclusion HMMEditor provides a set of unique functions to visualize and edit a profile HMM. It is a useful tool for biological sequence analysis and modeling. Both HMMEditor software and web service are freely available.
Database Description - eSOL | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available base Description General information of database Database name eSOL Alternative nam...eator Affiliation: The Research and Development of Biological Databases Project, National Institute of Genet...nology 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8501 Japan Email: Tel.: +81-45-924-5785 Database... classification Protein sequence databases - Protein properties Organism Taxonomy Name: Escherichia coli Taxonomy ID: 562 Database...i U S A. 2009 Mar 17;106(11):4201-6. External Links: Original website information Database maintenance site

Neutron cross-sections database for amino acids and proteins analysis

Energy Technology Data Exchange (ETDEWEB)

Voi, Dante L.; Ferreira, Francisco de O.; Nunes, Rogerio Chaffin, E-mail: dante@ien.gov.br, E-mail: fferreira@ien.gov.br, E-mail: Chaffin@ien.gov.br [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Rocha, Helio F. da, E-mail: hrocha@gbl.com.br [Universidade Federal do Rio de Janeiro (IPPMG/UFRJ), Rio de Janeiro, RJ (Brazil). Instituto de Pediatria

2015-07-01

Biological materials may be studied using neutrons as an unconventional tool of analysis. Dynamics and structures data can be obtained for amino acids, protein and others cellular components by neutron cross sections determinations especially for applications in nuclear purity and conformation analysis. The instrument used for this is the crystal spectrometer of the Instituto de Engenharia Nuclear (IEN-CNEN-RJ), the only one in Latin America that uses neutrons for this type of analyzes and it is installed in one of the reactor Argonauta irradiation channels. The experimentally values obtained are compared with calculated values using literature data with a rigorous analysis of the chemical composition, conformation and molecular structure analysis of the materials. A neutron cross-section database was constructed to assist in determining molecular dynamic, structure and formulae of biological materials. The database contains neutron cross-sections values of all amino acids, chemical elements, molecular groups, auxiliary radicals, as well as values of constants and parameters necessary for the analysis. An unprecedented analytical procedure was developed using the neutron cross section parceling and grouping method for data manipulation. This database is a result of measurements obtained from twenty amino acids that were provided by different manufactories and are used in oral administration in hospital individuals for nutritional applications. It was also constructed a small data file of compounds with different molecular groups including carbon, nitrogen, sulfur and oxygen, all linked to hydrogen atoms. A review of global and national scene in the acquisition of neutron cross sections data, the formation of libraries and the application of neutrons for analyzing biological materials is presented. This database has further application in protein analysis and the neutron cross-section from the insulin was estimated. (author)
Neutron cross-sections database for amino acids and proteins analysis

International Nuclear Information System (INIS)

Voi, Dante L.; Ferreira, Francisco de O.; Nunes, Rogerio Chaffin; Rocha, Helio F. da

2015-01-01

Biological materials may be studied using neutrons as an unconventional tool of analysis. Dynamics and structures data can be obtained for amino acids, protein and others cellular components by neutron cross sections determinations especially for applications in nuclear purity and conformation analysis. The instrument used for this is the crystal spectrometer of the Instituto de Engenharia Nuclear (IEN-CNEN-RJ), the only one in Latin America that uses neutrons for this type of analyzes and it is installed in one of the reactor Argonauta irradiation channels. The experimentally values obtained are compared with calculated values using literature data with a rigorous analysis of the chemical composition, conformation and molecular structure analysis of the materials. A neutron cross-section database was constructed to assist in determining molecular dynamic, structure and formulae of biological materials. The database contains neutron cross-sections values of all amino acids, chemical elements, molecular groups, auxiliary radicals, as well as values of constants and parameters necessary for the analysis. An unprecedented analytical procedure was developed using the neutron cross section parceling and grouping method for data manipulation. This database is a result of measurements obtained from twenty amino acids that were provided by different manufactories and are used in oral administration in hospital individuals for nutritional applications. It was also constructed a small data file of compounds with different molecular groups including carbon, nitrogen, sulfur and oxygen, all linked to hydrogen atoms. A review of global and national scene in the acquisition of neutron cross sections data, the formation of libraries and the application of neutrons for analyzing biological materials is presented. This database has further application in protein analysis and the neutron cross-section from the insulin was estimated. (author)
Protein structure determination by exhaustive search of Protein Data Bank derived databases.

Science.gov (United States)

Stokes-Rees, Ian; Sliz, Piotr

2010-12-14

Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aide of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
ASAView: Database and tool for solvent accessibility representation in proteins

Directory of Open Access Journals (Sweden)

Fawareh Hamed

2004-05-01

Full Text Available Abstract Background Accessible surface area (ASA or solvent accessibility of amino acids in a protein has important implications. Knowledge of surface residues helps in locating potential candidates of active sites. Therefore, a method to quickly see the surface residues in a two dimensional model would help to immediately understand the population of amino acid residues on the surface and in the inner core of the proteins. Results ASAView is an algorithm, an application and a database of schematic representations of solvent accessibility of amino acid residues within proteins. A characteristic two-dimensional spiral plot of solvent accessibility provides a convenient graphical view of residues in terms of their exposed surface areas. In addition, sequential plots in the form of bar charts are also provided. Online plots of the proteins included in the entire Protein Data Bank (PDB, are provided for the entire protein as well as their chains separately. Conclusions These graphical plots of solvent accessibility are likely to provide a quick view of the overall topological distribution of residues in proteins. Chain-wise computation of solvent accessibility is also provided.
Sex pheromone receptor proteins. Visualization using a radiolabeled photoaffinity analog

International Nuclear Information System (INIS)

Vogt, R.G.; Prestwich, G.D.; Riddiford, L.M.

1988-01-01

A tritium-labeled photoaffinity analog of a moth pheromone was used to covalently modify pheromone-selective binding proteins in the antennal sensillum lymph and sensory dendritic membranes of the male silk moth, Antheraea polyphemus. This analog, (E,Z)-6,11-[ 3 H]hexadecadienyl diazoacetate, allowed visualization of a 15-kilodalton soluble protein and a 69-kilodalton membrane protein in fluorescence autoradiograms of electrophoretically separated antennal proteins. Covalent modification of these proteins was specifically reduced when incubation and UV irradiation were conducted in the presence of excess unlabeled pheromone, (E,Z)-6,11-hexadecadienyl acetate. These experiments constitute the first direct evidence for a membrane protein of a chemosensory neuron interacting in a specific fashion with a biologically relevant odorant
Protein backbone angle restraints from searching a database for chemical shift and sequence homology

Energy Technology Data Exchange (ETDEWEB)

Cornilescu, Gabriel; Delaglio, Frank; Bax, Ad [National Institutes of Health, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases (United States)

1999-03-15

Chemical shifts of backbone atoms in proteins are exquisitely sensitive to local conformation, and homologous proteins show quite similar patterns of secondary chemical shifts. The inverse of this relation is used to search a database for triplets of adjacent residues with secondary chemical shifts and sequence similarity which provide the best match to the query triplet of interest. The database contains 13C{alpha}, 13C{beta}, 13C', 1H{alpha} and 15N chemical shifts for 20 proteins for which a high resolution X-ray structure is available. The computer program TALOS was developed to search this database for strings of residues with chemical shift and residue type homology. The relative importance of the weighting factors attached to the secondary chemical shifts of the five types of resonances relative to that of sequence similarity was optimized empirically. TALOS yields the 10 triplets which have the closest similarity in secondary chemical shift and amino acid sequence to those of the query sequence. If the central residues in these 10 triplets exhibit similar {phi} and {psi} backbone angles, their averages can reliably be used as angular restraints for the protein whose structure is being studied. Tests carried out for proteins of known structure indicate that the root-mean-square difference (rmsd) between the output of TALOS and the X-ray derived backbone angles is about 15 deg. Approximately 3% of the predictions made by TALOS are found to be in error.
DSFL database: A hub of target proteins of Leishmania sp. to combat leishmaniasis

Directory of Open Access Journals (Sweden)

Ameer Khusro

2017-07-01

Full Text Available Leishmaniasis is a vector-borne chronic infectious tropical dermal disease caused by the protozoa parasite of the genus Leishmania that causes high mortality globally. Among three different clinical forms of leishmaniasis, visceral leishmaniasis (VL or kala-azar is a systemic public health disease with high morbidity and mortality in developing countries, caused by Leishmania donovani, Leishmania infantum or Leishmania chagasi. Unfortunately, there is no vaccine available till date for the treatment of leishmaniasis. On the other hand, the therapeutics approved to treat this fatal disease is expensive, toxic, and associated with serious side effects. Furthermore, the emergence of drug-resistant Leishmania parasites in most endemic countries due to the incessant utilization of existing drugs is a major concern at present. Drug Search for Leishmaniasis (DSFL is a unique database that involves 50 crystallized target proteins of varied Leishmania sp. in order to develop new drugs in future by interacting several antiparasitic compounds or molecules with specific protein through computational tools. The structure of target protein from different Leishmania sp. is available in this database. In this review, we spotlighted not only the current global status of leishmaniasis in brief but also detailed information about target proteins of various Leishmania sp. available in DSFL. DSFL has created a new expectation for mankind in order to combat leishmaniasis by targeting parasitic proteins and commence a new era to get rid of drug resistance parasites. The database will substantiate to be a worthwhile project for further development of new, non-toxic, and cost-effective antileishmanial drugs as targeted therapies using in vitro/in vivo assays.
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases

Directory of Open Access Journals (Sweden)

Ma'ayan Avi

2007-10-01

Full Text Available Abstract Background In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP, generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Results Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Conclusion Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

Directory of Open Access Journals (Sweden)

Adeel Malik

2010-01-01

Full Text Available Understanding of the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins as well as noncovalently (protein-carbohydrate complexes is essential to many biological processes and plays a significant role in normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by tools de novo which can predict carbohydrate-binding sites for proteins in the absence of structure of experimentally known binding site. PROCARB is an open-access database comprising three independently working components, namely, (i Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB, (ii Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds and literature reference, and so forth, is included. In addition, each protein in the database is mapped to Uniprot, Pfam, PDB, and so forth.
UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

Science.gov (United States)

Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

2016-01-04

The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Conformational dynamics data bank: a database for conformational dynamics of proteins and supramolecular protein assemblies.

Science.gov (United States)

Kim, Do-Nyun; Altschuler, Josiah; Strong, Campbell; McGill, Gaël; Bathe, Mark

2011-01-01

The conformational dynamics data bank (CDDB, http://www.cdyn.org) is a database that aims to provide comprehensive results on the conformational dynamics of high molecular weight proteins and protein assemblies. Analysis is performed using a recently introduced coarse-grained computational approach that is applied to the majority of structures present in the electron microscopy data bank (EMDB). Results include equilibrium thermal fluctuations and elastic strain energy distributions that identify rigid versus flexible protein domains generally, as well as those associated with specific functional transitions, and correlations in molecular motions that identify molecular regions that are highly coupled dynamically, with implications for allosteric mechanisms. A practical web-based search interface enables users to easily collect conformational dynamics data in various formats. The data bank is maintained and updated automatically to include conformational dynamics results for new structural entries as they become available in the EMDB. The CDDB complements static structural information to facilitate the investigation and interpretation of the biological function of proteins and protein assemblies essential to cell function.
Searching the protein structure database for ligand-binding site similarities using CPASS v.2

Directory of Open Access Journals (Sweden)

Caprez Adam

2011-01-01

Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2 database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~ 38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores
Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.

Science.gov (United States)

Hiscock, D; Upton, C

2000-05-01

The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins.

Science.gov (United States)

Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert

2010-06-01

Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We went a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed building the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective and precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data together with an in-depth investigation of the literature were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about plastidial or subplastidial localization of some proteins that were not suspected to be associated to this membrane system. The spectral counting-based strategy was further
Probabilistic Graph Layout for Uncertain Network Visualization.

Science.gov (United States)

Schulz, Christoph; Nocaj, Arlind; Goertler, Jochen; Deussen, Oliver; Brandes, Ulrik; Weiskopf, Daniel

2017-01-01

We present a novel uncertain network visualization technique based on node-link diagrams. Nodes expand spatially in our probabilistic graph layout, depending on the underlying probability distributions of edges. The visualization is created by computing a two-dimensional graph embedding that combines samples from the probabilistic graph. A Monte Carlo process is used to decompose a probabilistic graph into its possible instances and to continue with our graph layout technique. Splatting and edge bundling are used to visualize point clouds and network topology. The results provide insights into probability distributions for the entire network-not only for individual nodes and edges. We validate our approach using three data sets that represent a wide range of network types: synthetic data, protein-protein interactions from the STRING database, and travel times extracted from Google Maps. Our approach reveals general limitations of the force-directed layout and allows the user to recognize that some nodes of the graph are at a specific position just by chance.
ZifBASE: a database of zinc finger proteins and associated resources

Directory of Open Access Journals (Sweden)

Punetha Ankita

2009-09-01

databases like UniprotKB, PDB, ModBase and Protein Model Portal and PubMed for making it more informative. Conclusion A database is established to maintain the information of the sequence features, including the class, framework, number of fingers, residues, position, recognition site and physio-chemical properties (molecular weight, isoelectric point of both natural and engineered zinc finger proteins and dissociation constant of few. ZifBASE can provide more effective and efficient way of accessing the zinc finger protein sequences and their target binding sites with the links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.
cisPath: an R/Bioconductor package for cloud users for visualization and management of functional protein interaction networks.

Science.gov (United States)

Wang, Likun; Yang, Luhe; Peng, Zuohan; Lu, Dan; Jin, Yan; McNutt, Michael; Yin, Yuxin

2015-01-01

With the burgeoning development of cloud technology and services, there are an increasing number of users who prefer cloud to run their applications. All software and associated data are hosted on the cloud, allowing users to access them via a web browser from any computer, anywhere. This paper presents cisPath, an R/Bioconductor package deployed on cloud servers for client users to visualize, manage, and share functional protein interaction networks. With this R package, users can easily integrate downloaded protein-protein interaction information from different online databases with private data to construct new and personalized interaction networks. Additional functions allow users to generate specific networks based on private databases. Since the results produced with the use of this package are in the form of web pages, cloud users can easily view and edit the network graphs via the browser, using a mouse or touch screen, without the need to download them to a local computer. This package can also be installed and run on a local desktop computer. Depending on user preference, results can be publicized or shared by uploading to a web server or cloud driver, allowing other users to directly access results via a web browser. This package can be installed and run on a variety of platforms. Since all network views are shown in web pages, such package is particularly useful for cloud users. The easy installation and operation is an attractive quality for R beginners and users with no previous experience with cloud services.
IIS--Integrated Interactome System: a web-based platform for the annotation, analysis and visualization of protein-metabolite-gene-drug interactions by integrating a variety of data sources and tools.

Science.gov (United States)

Carazzolle, Marcelo Falsarella; de Carvalho, Lucas Miguel; Slepicka, Hugo Henrique; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Kobarg, Jörg; Meirelles, Gabriela Vaz

2014-01-01

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two
AllergenOnline: A peer-reviewed, curated allergen database to assess novel food proteins for potential cross-reactivity.

Science.gov (United States)

Goodman, Richard E; Ebisawa, Motohiro; Ferreira, Fatima; Sampson, Hugh A; van Ree, Ronald; Vieths, Stefan; Baumert, Joseph L; Bohle, Barbara; Lalithambika, Sreedevi; Wise, John; Taylor, Steve L

2016-05-01

Increasingly regulators are demanding evaluation of potential allergenicity of foods prior to marketing. Primary risks are the transfer of allergens or potentially cross-reactive proteins into new foods. AllergenOnline was developed in 2005 as a peer-reviewed bioinformatics platform to evaluate risks of new dietary proteins in genetically modified organisms (GMO) and novel foods. The process used to identify suspected allergens and evaluate the evidence of allergenicity was refined between 2010 and 2015. Candidate proteins are identified from the NCBI database using keyword searches, the WHO/IUIS nomenclature database and peer reviewed publications. Criteria to classify proteins as allergens are described. Characteristics of the protein, the source and human subjects, test methods and results are evaluated by our expert panel and archived. Food, inhalant, salivary, venom, and contact allergens are included. Users access allergen sequences through links to the NCBI database and relevant references are listed online. Version 16 includes 1956 sequences from 778 taxonomic-protein groups that are accepted with evidence of allergic serum IgE-binding and/or biological activity. AllergenOnline provides a useful peer-reviewed tool for identifying the primary potential risks of allergy for GMOs and novel foods based on criteria described by the Codex Alimentarius Commission (2003). © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

Science.gov (United States)

López, Yosvany; Nakai, Kenta; Patil, Ashwini

2015-01-01

HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.

Incremental Visualizer for Visible Objects

DEFF Research Database (Denmark)

Bukauskas, Linas; Bøhlen, Michael Hanspeter

This paper discusses the integration of database back-end and visualizer front-end into a one tightly coupled system. The main aim which we achieve is to reduce the data pipeline from database to visualization by using incremental data extraction of visible objects in a fly-through scenarios. We...... also argue that passing only relevant data from the database will substantially reduce the overall load of the visualization system. We propose the system Incremental Visualizer for Visible Objects (IVVO) which considers visible objects and enables incremental visualization along the observer movement...... path. IVVO is the novel solution which allows data to be visualized and loaded on the fly from the database and which regards visibilities of objects. We run a set of experiments to convince that IVVO is feasible in terms of I/O operations and CPU load. We consider the example of data which uses...
The human interactome knowledge base (hint-kb): An integrative human protein interaction database enriched with predicted protein–protein interaction scores using a novel hybrid technique

KAUST Repository

Theofilatos, Konstantinos A.

2013-07-12

Proteins are the functional components of many cellular processes and the identification of their physical protein–protein interactions (PPIs) is an area of mature academic research. Various databases have been developed containing information about experimentally and computationally detected human PPIs as well as their corresponding annotation data. However, these databases contain many false positive interactions, are partial and only a few of them incorporate data from various sources. To overcome these limitations, we have developed HINT-KB (http://biotools.ceid.upatras.gr/hint-kb/), a knowledge base that integrates data from various sources, provides a user-friendly interface for their retrieval, cal-culatesasetoffeaturesofinterest and computesaconfidence score for every candidate protein interaction. This confidence score is essential for filtering the false positive interactions which are present in existing databases, predicting new protein interactions and measuring the frequency of each true protein interaction. For this reason, a novel machine learning hybrid methodology, called (Evolutionary Kalman Mathematical Modelling—EvoKalMaModel), was used to achieve an accurate and interpretable scoring methodology. The experimental results indicated that the proposed scoring scheme outperforms existing computational methods for the prediction of PPIs.
HMM Logos for visualization of protein families

Directory of Open Access Journals (Sweden)

Schultz Jörg

2004-01-01

Full Text Available Abstract Background Profile Hidden Markov Models (pHMMs are a widely used tool for protein family research. Up to now, however, there exists no method to visualize all of their central aspects graphically in an intuitively understandable way. Results We present a visualization method that incorporates both emission and transition probabilities of the pHMM, thus extending sequence logos introduced by Schneider and Stephens. For each emitting state of the pHMM, we display a stack of letters. The stack height is determined by the deviation of the position's letter emission frequencies from the background frequencies. The stack width visualizes both the probability of reaching the state (the hitting probability and the expected number of letters the state emits during a pass through the model (the state's expected contribution. A web interface offering online creation of HMM Logos and the corresponding source code can be found at the Logos web server of the Max Planck Institute for Molecular Genetics http://logos.molgen.mpg.de. Conclusions We demonstrate that HMM Logos can be a useful tool for the biologist: We use them to highlight differences between two homologous subfamilies of GTPases, Rab and Ras, and we show that they are able to indicate structural elements of Ras.
Characterizing synaptic protein development in human visual cortex enables alignment of synaptic age with rat visual cortex

Directory of Open Access Journals (Sweden)

Joshua G.A Pinto

2015-02-01

Full Text Available Although many potential neuroplasticity based therapies have been developed in the lab, few have translated into established clinical treatments for human neurologic or neuropsychiatric diseases. Animal models, especially of the visual system, have shaped our understanding of neuroplasticity by characterizing the mechanisms that promote neural changes and defining timing of the sensitive period. The lack of knowledge about development of synaptic plasticity mechanisms in human cortex, and about alignment of synaptic age between animals and humans, has limited translation of neuroplasticity therapies. In this study, we quantified expression of a set of highly conserved pre- and post-synaptic proteins (Synapsin, Synaptophysin, PSD-95, Gephyrin and found that synaptic development in human primary visual cortex continues into late childhood. Indeed, this is many years longer than suggested by neuroanatomical studies and points to a prolonged sensitive period for plasticity in human sensory cortex. In addition, during childhood we found waves of inter-individual variability that are different for the 4 proteins and include a stage during early development (<1 year when only Gephyrin has high inter-individual variability. We also found that pre- and post-synaptic protein balances develop quickly, suggesting that maturation of certain synaptic functions happens within the first year or two of life. A multidimensional analysis (principle component analysis showed that most of the variance was captured by the sum of the 4 synaptic proteins. We used that sum to compare development of human and rat visual cortex and identified a simple linear equation that provides robust alignment of synaptic age between humans and rats. Alignment of synaptic ages is important for age-appropriate targeting and effective translation of neuroplasticity therapies from the lab to the clinic.
Vivaldi: Visualization and validation of biomacromolecular NMR structures from the PDB

Science.gov (United States)

Hendrickx, Pieter M S; Gutmanas, Aleksandras; Kleywegt, Gerard J

2013-01-01

We describe Vivaldi (VIsualization and VALidation DIsplay; http://pdbe.org/vivaldi), a web-based service for the analysis, visualization, and validation of NMR structures in the Protein Data Bank (PDB). Vivaldi provides access to model coordinates and several types of experimental NMR data using interactive visualization tools, augmented with structural annotations and model-validation information. The service presents information about the modeled NMR ensemble, validation of experimental chemical shifts, residual dipolar couplings, distance and dihedral angle constraints, as well as validation scores based on empirical knowledge and databases. Vivaldi was designed for both expert NMR spectroscopists and casual non-expert users who wish to obtain a better grasp of the information content and quality of NMR structures in the public archive. © Proteins 2013. © 2012 Wiley Periodicals, Inc. PMID:23180575
Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

Directory of Open Access Journals (Sweden)

Holly J Atkinson

Full Text Available The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

Science.gov (United States)

Atkinson, Holly J; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C

2009-01-01

The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
Database Description - ConfC | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available abase Description General information of database Database name ConfC Alternative name Database...amotsu Noguchi Tel: 042-495-8736 E-mail: Database classification Structure Database...s - Protein structure Structure Databases - Small molecules Structure Databases - Nucleic acid structure Database... services - Need for user registration - About This Database Database Description Download License Update History of This Database... Site Policy | Contact Us Database Description - ConfC | LSDB Archive ...
Database Description - TMFunction | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available sidue (or mutant) in a protein. The experimental data are collected from the literature both by searching th...the sequence database, UniProt, structural database, PDB, and literature database
A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

Science.gov (United States)

Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

2017-06-23

The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can unreliable and proteomic methods offer an alternative but require comprehensive and well annotated sequence databases which are lacking for gluten. A manually a curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass
The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan.

Science.gov (United States)

Schröter, Pauline; Schroeder, Sascha

2017-12-01

With the Developmental Lexicon Project (DeveL), we present a large-scale study that was conducted to collect data on visual word recognition in German across the lifespan. A total of 800 children from Grades 1 to 6, as well as two groups of younger and older adults, participated in the study and completed a lexical decision and a naming task. We provide a database for 1,152 German words, comprising behavioral data from seven different stages of reading development, along with sublexical and lexical characteristics for all stimuli. The present article describes our motivation for this project, explains the methods we used to collect the data, and reports analyses on the reliability of our results. In addition, we explored developmental changes in three marker effects in psycholinguistic research: word length, word frequency, and orthographic similarity. The database is available online.
Database Description - PSCDB | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available abase Description General information of database Database name PSCDB Alternative n...rial Science and Technology (AIST) Takayuki Amemiya E-mail: Database classification Structure Databases - Protein structure Database...554-D558. External Links: Original website information Database maintenance site Graduate School of Informat...available URL of Web services - Need for user registration Not available About This Database Database Descri...ption Download License Update History of This Database Site Policy | Contact Us Database Description - PSCDB | LSDB Archive ...
Database Description - SAHG | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available base Description General information of database Database name SAHG Alternative nam...h: Contact address Chie Motono Tel : +81-3-3599-8067 E-mail : Database classification Structure Databases - ...e databases - Protein properties Organism Taxonomy Name: Homo sapiens Taxonomy ID: 9606 Database description... Links: Original website information Database maintenance site The Molecular Profiling Research Center for D...stration Not available About This Database Database Description Download License Update History of This Database Site Policy | Contact Us Database Description - SAHG | LSDB Archive ...
High Performance Protein Sequence Database Scanning on the Cell Broadband Engine

Directory of Open Access Journals (Sweden)

Adrianto Wirawan

2009-01-01

Full Text Available The enormous growth of biological sequence databases has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of low cost parallel multicore accelerator technologies has made it possible to reduce execution times of many bioinformatics applications. In this paper, we demonstrate how the Cell Broadband Engine can be used as a computational platform to accelerate two approaches for protein sequence database scanning: exhaustive and heuristic. We present efficient parallelization techniques for two representative algorithms: the dynamic programming based Smith–Waterman algorithm and the popular BLASTP heuristic. Their implementation on a Playstation®3 leads to significant runtime savings compared to corresponding sequential implementations.
Earth History databases and visualization - the TimeScale Creator system

Science.gov (United States)

Ogg, James; Lugowski, Adam; Gradstein, Felix

2010-05-01

The "TimeScale Creator" team (www.tscreator.org) and the Subcommission on Stratigraphic Information (stratigraphy.science.purdue.edu) of the International Commission on Stratigraphy (www.stratigraphy.org) has worked with numerous geoscientists and geological surveys to prepare reference datasets for global and regional stratigraphy. All events are currently calibrated to Geologic Time Scale 2004 (Gradstein et al., 2004, Cambridge Univ. Press) and Concise Geologic Time Scale (Ogg et al., 2008, Cambridge Univ. Press); but the array of intercalibrations enable dynamic adjustment to future numerical age scales and interpolation methods. The main "global" database contains over 25,000 events/zones from paleontology, geomagnetics, sea-level and sequence stratigraphy, igneous provinces, bolide impacts, plus several stable isotope curves and image sets. Several regional datasets are provided in conjunction with geological surveys, with numerical ages interpolated using a similar flexible inter-calibration procedure. For example, a joint program with Geoscience Australia has compiled an extensive Australian regional biostratigraphy and a full array of basin lithologic columns with each formation linked to public lexicons of all Proterozoic through Phanerozoic basins - nearly 500 columns of over 9,000 data lines plus hot-curser links to oil-gas reference wells. Other datapacks include New Zealand biostratigraphy and basin transects (ca. 200 columns), Russian biostratigraphy, British Isles regional stratigraphy, Gulf of Mexico biostratigraphy and lithostratigraphy, high-resolution Neogene stable isotope curves and ice-core data, human cultural episodes, and Circum-Arctic stratigraphy sets. The growing library of datasets is designed for viewing and chart-making in the free "TimeScale Creator" JAVA package. This visualization system produces a screen display of the user-selected time-span and the selected columns of geologic time scale information. The user can change the
Characterizing synaptic protein development in human visual cortex enables alignment of synaptic age with rat visual cortex

Science.gov (United States)

Pinto, Joshua G. A.; Jones, David G.; Williams, C. Kate; Murphy, Kathryn M.

2015-01-01

Although many potential neuroplasticity based therapies have been developed in the lab, few have translated into established clinical treatments for human neurologic or neuropsychiatric diseases. Animal models, especially of the visual system, have shaped our understanding of neuroplasticity by characterizing the mechanisms that promote neural changes and defining timing of the sensitive period. The lack of knowledge about development of synaptic plasticity mechanisms in human cortex, and about alignment of synaptic age between animals and humans, has limited translation of neuroplasticity therapies. In this study, we quantified expression of a set of highly conserved pre- and post-synaptic proteins (Synapsin, Synaptophysin, PSD-95, Gephyrin) and found that synaptic development in human primary visual cortex (V1) continues into late childhood. Indeed, this is many years longer than suggested by neuroanatomical studies and points to a prolonged sensitive period for plasticity in human sensory cortex. In addition, during childhood we found waves of inter-individual variability that are different for the four proteins and include a stage during early development (visual cortex and identified a simple linear equation that provides robust alignment of synaptic age between humans and rats. Alignment of synaptic ages is important for age-appropriate targeting and effective translation of neuroplasticity therapies from the lab to the clinic. PMID:25729353
Visualizing data mining results with the Brede tools

Directory of Open Access Journals (Sweden)

Finn A Nielsen

2009-07-01

Full Text Available A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them --- the BrainMap database. Since then the Brede Toolbox has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: The Brede Database. With Brede Toolbox and Database combined we setup automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web presence of the Brede Database is established by a single script executing a workflow involving these steps together with a final generation of Web pages with embedded visualizations and links to interactive three-dimensional models in the Virtual Reality Modeling Language. Apart from the Brede tools I briefly review alternate visualization tools and methods for Internet-based visualization and information visualization as well as portals for visualization tools.
The human interactome knowledge base (hint-kb): An integrative human protein interaction database enriched with predicted protein–protein interaction scores using a novel hybrid technique

KAUST Repository

Theofilatos, Konstantinos A.; Dimitrakopoulos, Christos M.; Likothanassis, Spiridon D.; Kleftogiannis, Dimitrios A.; Moschopoulos, Charalampos N.; Alexakos, Christos; Papadimitriou, Stergios; Mavroudi, Seferina P.

2013-01-01

Proteins are the functional components of many cellular processes and the identification of their physical protein–protein interactions (PPIs) is an area of mature academic research. Various databases have been developed containing information about
PATtyFams: Protein families for the microbial genomes in the PATRIC database

Directory of Open Access Journals (Sweden)

James J Davis

2016-02-01

Full Text Available The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL. This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
Knitting Relational Documentary Networks: The Database Meta-Documentary Filming Revolution as a paradigm of bringing interactive audio-visual archives alive

NARCIS (Netherlands)

Wiehl, Anna

2016-01-01

abstractOne phenomenon in the emerging field of digital documentary are experiments with rhizomatic interfaces and database-logics to bring audio-visual archives 'alive'. A paradigm hereof is Filming Revolution (2015), an interactive platform which gathers and interlinks films of the uprisings in

Visualization of protein folding funnels in lattice models.

Directory of Open Access Journals (Sweden)

Antonio B Oliveira

Full Text Available Protein folding occurs in a very high dimensional phase space with an exponentially large number of states, and according to the energy landscape theory it exhibits a topology resembling a funnel. In this statistical approach, the folding mechanism is unveiled by describing the local minima in an effective one-dimensional representation. Other approaches based on potential energy landscapes address the hierarchical structure of local energy minima through disconnectivity graphs. In this paper, we introduce a metric to describe the distance between any two conformations, which also allows us to go beyond the one-dimensional representation and visualize the folding funnel in 2D and 3D. In this way it is possible to assess the folding process in detail, e.g., by identifying the connectivity between conformations and establishing the paths to reach the native state, in addition to regions where trapping may occur. Unlike the disconnectivity maps method, which is based on the kinetic connections between states, our methodology is based on structural similarities inferred from the new metric. The method was developed in a 27-mer protein lattice model, folded into a 3×3×3 cube. Five sequences were studied and distinct funnels were generated in an analysis restricted to conformations from the transition-state to the native configuration. Consistent with the expected results from the energy landscape theory, folding routes can be visualized to probe different regions of the phase space, as well as determine the difficulty in folding of the distinct sequences. Changes in the landscape due to mutations were visualized, with the comparison between wild and mutated local minima in a single map, which serves to identify different trapping regions. The extension of this approach to more realistic models and its use in combination with other approaches are discussed.
ProOpDB: Prokaryotic Operon DataBase.

Science.gov (United States)

Taboada, Blanca; Ciria, Ricardo; Martinez-Guerrero, Cristian E; Merino, Enrique

2012-01-01

The Prokaryotic Operon DataBase (ProOpDB, http://operons.ibt.unam.mx/OperonPredictor) constitutes one of the most precise and complete repositories of operon predictions now available. Using our novel and highly accurate operon identification algorithm, we have predicted the operon structures of more than 1200 prokaryotic genomes. ProOpDB offers diverse alternatives by which a set of operon predictions can be retrieved including: (i) organism name, (ii) metabolic pathways, as defined by the KEGG database, (iii) gene orthology, as defined by the COG database, (iv) conserved protein domains, as defined by the Pfam database, (v) reference gene and (vi) reference operon, among others. In order to limit the operon output to non-redundant organisms, ProOpDB offers an efficient method to select the most representative organisms based on a precompiled phylogenetic distances matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool to visualize their genomic context and retrieve the sequence of their corresponding 5' regulatory regions, as well as the nucleotide or amino acid sequences of their genes.
HotRegion: a database of predicted hot spot clusters.

Science.gov (United States)

Cukuroglu, Engin; Gursoy, Attila; Keskin, Ozlem

2012-01-01

Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided. HotRegion is accessible at http://prism.ccbb.ku.edu.tr/hotregion.
Analysis of high accuracy, quantitative proteomics data in the MaxQB database.

Science.gov (United States)

Schaab, Christoph; Geiger, Tamar; Stoehr, Gabriele; Cox, Juergen; Mann, Matthias

2012-03-01

MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database
Phenylglyoxal-Based Visualization of Citrullinated Proteins on Western Blots

Directory of Open Access Journals (Sweden)

Sanne M. M. Hensen

2015-04-01

Full Text Available Citrullination is the conversion of peptidylarginine to peptidylcitrulline, which is catalyzed by peptidylarginine deiminases. This conversion is involved in different physiological processes and is associated with several diseases, including cancer and rheumatoid arthritis. A common method to detect citrullinated proteins relies on anti-modified citrulline antibodies directed to a specific chemical modification of the citrulline side chain. Here, we describe a versatile, antibody-independent method for the detection of citrullinated proteins on a membrane, based on the selective reaction of phenylglyoxal with the ureido group of citrulline under highly acidic conditions. The method makes use of 4-azidophenylglyoxal, which, after reaction with citrullinated proteins, can be visualized with alkyne-conjugated probes. The sensitivity of this procedure, using an alkyne-biotin probe, appeared to be comparable to the antibody-based detection method and independent of the sequence surrounding the citrulline.
The human keratinocyte two-dimensional protein database (update 1994): towards an integrated approach to the study of cell proliferation, differentiation and skin diseases

DEFF Research Database (Denmark)

Celis, J E; Rasmussen, H H; Olsen, E

1994-01-01

The master two-dimensional (2-D) gel database of human keratinocytes currently lists 3087 cellular proteins (2168 isoelectric focusing, IEF; and 919 none-quilibrium pH gradient electrophoresis, NEPHGE), many of which correspond to posttranslational modifications, 890 polypeptides have been...... in the database. We also report a database of proteins recovered from the medium of noncultured, unfractionated keratinocytes. This database lists 398 polypeptides (309 IEF; 89 NEPHGE) of which 76 have been identified. The aim of the comprehensive databases is to gather, through a systematic study...
PharmDB-K: Integrated Bio-Pharmacological Network Database for Traditional Korean Medicine.

Directory of Open Access Journals (Sweden)

Ji-Hyun Lee

Full Text Available Despite the growing attention given to Traditional Medicine (TM worldwide, there is no well-known, publicly available, integrated bio-pharmacological Traditional Korean Medicine (TKM database for researchers in drug discovery. In this study, we have constructed PharmDB-K, which offers comprehensive information relating to TKM-associated drugs (compound, disease indication, and protein relationships. To explore the underlying molecular interaction of TKM, we integrated fourteen different databases, six Pharmacopoeias, and literature, and established a massive bio-pharmacological network for TKM and experimentally validated some cases predicted from the PharmDB-K analyses. Currently, PharmDB-K contains information about 262 TKMs, 7,815 drugs, 3,721 diseases, 32,373 proteins, and 1,887 side effects. One of the unique sets of information in PharmDB-K includes 400 indicator compounds used for standardization of herbal medicine. Furthermore, we are operating PharmDB-K via phExplorer (a network visualization software and BioMart (a data federation framework for convenient search and analysis of the TKM network. Database URL: http://pharmdb-k.org, http://biomart.i-pharm.org.
The impact of visual media to encourage low protein cooking in inherited metabolic disorders.

Science.gov (United States)

Evans, S; Daly, A; Hopkins, V; Davies, P; MacDonald, A

2009-10-01

The use of educational visual aids is one way to help children with inherited metabolic disorders (IMD) understand and develop a positive attitude towards their low protein diet. However, it is difficult to establish their effectiveness in the clinical setting. The present study aimed to evaluate the impact of a low protein recipe book and accompanying DVD for children with IMD. One hundred and five children (53% female; median age = 6-8 years) with IMD on low protein diets were each given a low protein recipe book and DVD. After 6 months, children and carers were posted a questionnaire asking whether they used these resources; identifying any change in frequency of low protein cooking; and the outcome when preparing recipes. One hundred and two questionnaires were returned, representing 105 patients. Seventy percent (n = 71) of questionnaires were from carers. Ninety-three percent (n = 66) of carers acknowledged receipt of the resource; one-third (n = 22) had not watched the DVD and 23% (n = 15) had not opened the recipe book; 55% (n = 36) had tried the recipes; and 71% (n = 47) said the recipe book and/or DVD motivated them to try new recipes. Children were more likely to have watched the DVD (75%; n = 21/28) and read the recipe book (86%; n = 24/28) than carers. Although a helpful educational tool, just over one-half of respondents had used the resource. Identifying visual media that, by itself, will motivate most families of children with IMD to prepare low protein recipes may be unrealistic. The combined approach of visual aids and 'hands-on' practical experience, such as low protein cooking workshops and individual counselling, may be more beneficial.
Visualization and characterization of individual type III protein secretion machines in live bacteria.

Science.gov (United States)

Zhang, Yongdeng; Lara-Tejero, María; Bewersdorf, Jörg; Galán, Jorge E

2017-06-06

Type III protein secretion machines have evolved to deliver bacterially encoded effector proteins into eukaryotic cells. Although electron microscopy has provided a detailed view of these machines in isolation or fixed samples, little is known about their organization in live bacteria. Here we report the visualization and characterization of the Salmonella type III secretion machine in live bacteria by 2D and 3D single-molecule switching superresolution microscopy. This approach provided access to transient components of this machine, which previously could not be analyzed. We determined the subcellular distribution of individual machines, the stoichiometry of the different components of this machine in situ, and the spatial distribution of the substrates of this machine before secretion. Furthermore, by visualizing this machine in Salmonella mutants we obtained major insights into the machine's assembly. This study bridges a major resolution gap in the visualization of this nanomachine and may serve as a paradigm for the examination of other bacterially encoded molecular machines.
ExtraTrain: a database of Extragenic regions and Transcriptional information in prokaryotic organisms

Science.gov (United States)

Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel

2006-01-01

Background Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and regulatory proteins from bacteria and archaea included in the UniProt database. Description ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition ExtraTrain supplies a tool to explore extragenic regions, named Palinsight, oriented to detect and search palindromic patterns. This interactive visual tool is totally integrated in the database, allowing the search for regulatory signals in user defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated to regulator entries. In order to achieve a sustainable and maintainable knowledge database ExtraTrain is a platform open to the contribution of knowledge by the scientific community providing a system for the incorporation of textual knowledge. Conclusion ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. ExtraTrain database is available at . PMID:16539733
The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

Science.gov (United States)

Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S. A.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.

2006-12-01

The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all measurements and the derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. The query result set is displayed in a digestible tabular format allowing the user to descend through hierarchical levels such as from locations to sites, samples, specimens, and measurements. At each stage, the result set can be saved and, if supported by the data, can be visualized by plotting global location maps, equal area plots, or typical Zijderveld, hysteresis, and various magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (Version 2.1) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload and takes only a few minutes to process several thousand data records. The standardized MagIC template files are stored in the digital archives of EarthRef.org where they
Yeast Interacting Proteins Database: YNR006W, YHL002W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ling Golgi proteins, forming lumenal membranes and sorting ubiquitinated proteins destined for degradation; ..., as well as for recycling of Golgi proteins and formation of lumenal membranes Rows with this prey as prey ...1p; required for recycling Golgi proteins, forming lumenal membranes and sorting ubiquitinated proteins dest...degradation, as well as for recycling of Golgi proteins and formation of lumenal membranes
SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

Science.gov (United States)

Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

2018-04-20

Bioactive peptides, including biological sources-derived peptides with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To quicken the processes, several bioinformatics tools are recently used to facilitate screening of the potential peptides prior their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes from single, multiple, or genome-wide scaling, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme, and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. Obviously, the tool identified more bioactive peptides than that of the benchmarked tool; an example of validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated. The
Development of Glutamatergic Proteins in Human Visual Cortex across the Lifespan.

Science.gov (United States)

Siu, Caitlin R; Beshara, Simon P; Jones, David G; Murphy, Kathryn M

2017-06-21

Traditionally, human primary visual cortex (V1) has been thought to mature within the first few years of life, based on anatomical studies of synapse formation, and establishment of intracortical and intercortical connections. Human vision, however, develops well beyond the first few years. Previously, we found prolonged development of some GABAergic proteins in human V1 (Pinto et al., 2010). Yet as >80% of synapses in V1 are excitatory, it remains unanswered whether the majority of synapses regulating experience-dependent plasticity and receptive field properties develop late, like their inhibitory counterparts. To address this question, we used Western blotting of postmortem tissue from human V1 (12 female, 18 male) covering a range of ages. Then we quantified a set of postsynaptic glutamatergic proteins (PSD-95, GluA2, GluN1, GluN2A, GluN2B), calculated indices for functional pairs that are developmentally regulated (GluA2:GluN1; GluN2A:GluN2B), and determined interindividual variability. We found early loss of GluN1, prolonged development of PSD-95 and GluA2 into late childhood, protracted development of GluN2A until ∼40 years, and dramatic loss of GluN2A in aging. The GluA2:GluN1 index switched at ∼1 year, but the GluN2A:GluN2B index continued to shift until ∼40 year before changing back to GluN2B in aging. We also identified young childhood as a stage of heightened interindividual variability. The changes show that human V1 develops gradually through a series of five orchestrated stages, making it likely that V1 participates in visual development and plasticity across the lifespan. SIGNIFICANCE STATEMENT Anatomical structure of human V1 appears to mature early, but vision changes across the lifespan. This discrepancy has fostered two hypotheses: either other aspects of V1 continue changing, or later changes in visual perception depend on extrastriate areas. Previously, we showed that some GABAergic synaptic proteins change across the lifespan, but most
The Molecule Cloud - compact visualization of large collections of molecules

Directory of Open Access Journals (Sweden)

Ertl Peter

2012-07-01

Full Text Available Abstract Background Analysis and visualization of large collections of molecules is one of the most frequent challenges cheminformatics experts in pharmaceutical industry are facing. Various sophisticated methods are available to perform this task, including clustering, dimensionality reduction or scaffold frequency analysis. In any case, however, viewing and analyzing large tables with molecular structures is necessary. We present a new visualization technique, providing basic information about the composition of molecular data sets at a single glance. Summary A method is presented here allowing visual representation of the most common structural features of chemical databases in a form of a cloud diagram. The frequency of molecules containing particular substructure is indicated by the size of respective structural image. The method is useful to quickly perceive the most prominent structural features present in the data set. This approach was inspired by popular word cloud diagrams that are used to visualize textual information in a compact form. Therefore we call this approach “Molecule Cloud”. The method also supports visualization of additional information, for example biological activity of molecules containing this scaffold or the protein target class typical for particular scaffolds, by color coding. Detailed description of the algorithm is provided, allowing easy implementation of the method by any cheminformatics toolkit. The layout algorithm is available as open source Java code. Conclusions Visualization of large molecular data sets using the Molecule Cloud approach allows scientists to get information about the composition of molecular databases and their most frequent structural features easily. The method may be used in the areas where analysis of large molecular collections is needed, for example processing of high throughput screening results, virtual screening or compound purchasing. Several example visualizations of large
The CATH database

Directory of Open Access Journals (Sweden)

Knudsen Michael

2010-02-01

Full Text Available Abstract The CATH database provides hierarchical classification of protein domains based on their folding patterns. Domains are obtained from protein structures deposited in the Protein Data Bank and both domain identification and subsequent classification use manual as well as automated procedures. The accompanying website http://www.cathdb.info provides an easy-to-use entry to the classification, allowing for both browsing and downloading of data. Here, we give a brief review of the database, its corresponding website and some related tools.
Yeast Interacting Proteins Database: YHL002W, YNR006W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ycling of Golgi proteins and formation of lumenal membranes Rows with this bait as bait (1) Rows with this b...required for recycling Golgi proteins, forming lumenal membranes and sorting ubiquitinated proteins destined...on, as well as for recycling of Golgi proteins and formation of lumenal membranes...ith Hse1p; required for recycling Golgi proteins, forming lumenal membranes and sorting ubiquitinated protei
Yeast Interacting Proteins Database: YOR047C, YKL038W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available racts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a...Bait description Protein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose senso...rs Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regulator of the tra
Yeast Interacting Proteins Database: YFR049W, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regulator... (0) YOR047C STD1 Protein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sens...ors Snf3p and Rgt2p, and TATA-binding protein Spt15p; ac
RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database.

Science.gov (United States)

Field, Helen I; Fenyö, David; Beavis, Ronald C

2002-01-01

RADARS, a rapid, automated, data archiving and retrieval software system for high-throughput proteomic mass spectral data processing and storage, is described. The majority of mass spectrometer data files are compatible with RADARS, for consistent processing. The system automatically takes unprocessed data files, identifies proteins via in silico database searching, then stores the processed data and search results in a relational database suitable for customized reporting. The system is robust, used in 24/7 operation, accessible to multiple users of an intranet through a web browser, may be monitored by Virtual Private Network, and is secure. RADARS is scalable for use on one or many computers, and is suited to multiple processor systems. It can incorporate any local database in FASTA format, and can search protein and DNA databases online. A key feature is a suite of visualisation tools (many available gratis), allowing facile manipulation of spectra, by hand annotation, reanalysis, and access to all procedures. We also described the use of Sonar MS/MS, a novel, rapid search engine requiring 40 MB RAM per process for searches against a genomic or EST database translated in all six reading frames. RADARS reduces the cost of analysis by its efficient algorithms: Sonar MS/MS can identifiy proteins without accurate knowledge of the parent ion mass and without protein tags. Statistical scoring methods provide close-to-expert accuracy and brings robust data analysis to the non-expert user.

Visualization of protein sequence features using JavaScript and SVG with pViz.js.

Science.gov (United States)

Mukhyala, Kiran; Masselot, Alexandre

2014-12-01

pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Yeast Interacting Proteins Database: YLR447C, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available xpression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Sp...; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; act
The STRING database in 2017

DEFF Research Database (Denmark)

Szklarczyk, Damian; Morris, John H; Cook, Helen

2017-01-01

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organi......A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number...... of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known...... pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer...
MutHTP: Mutations in Human Transmembrane Proteins.

Science.gov (United States)

A, Kulandaisamy; S, Binny Priya; R, Sakthivel; Tarnovskaya, Svetlana; Bizin, Ilya; Hönigschmid, Peter; Frishman, Dmitrij; Gromiha, M Michael

2018-02-01

We have developed a novel database, MutHTP, which contains information on 183395 disease-associated and 17827 neutral mutations in human transmembrane proteins. For each mutation site MutHTP provides a description of its location with respect to the membrane protein topology, structural environment (if available) and functional features. Comprehensive visualization, search, display and download options are available. The database is publicly available at http://www.iitm.ac.in/bioinfo/MutHTP/. The website is implemented using HTML, PHP and javascript and supports recent versions of all major browsers, such as Firefox, Chrome and Opera. gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Interleukin-1beta induced changes in the protein expression of rat islets: a computerized database

DEFF Research Database (Denmark)

Andersen, H U; Fey, S J; Larsen, Peter Mose

1997-01-01

as well as the intracellular mechanisms of action of interleukin 1-mediated beta-cell cytotoxicity are unknown. However, previous studies have found an association of beta-cell destruction with alterations in protein synthesis. Thus, two-dimensional (2-D) gel electrophoresis of pancreatic islet proteins...... may be an important tool facilitating studies of the molecular pathogenesis of insulin-dependent diabetes mellitus. 2-D gel electrophoresis of islet proteins may lead to (i) the determination of qualitative and quantitative changes in specific islet proteins induced by cytokines, (ii......) the determination of the effects of agents modulating cytokine action, and (iii) the identification of primary islet protein antigen(s) initiating the immune destruction of the beta-cells. Therefore, the aim of this study was to create databases (DB) of all reproducibly detectable protein spots on 10% and 15...
Fly-DPI: database of protein interactomes for D. melanogaster in the approach of systems biology

Directory of Open Access Journals (Sweden)

Lin Chieh-Hua

2006-12-01

Full Text Available Abstract Background Proteins control and mediate many biological activities of cells by interacting with other protein partners. This work presents a statistical model to predict protein interaction networks of Drosophila melanogaster based on insight into domain interactions. Results Three high-throughput yeast two-hybrid experiments and the collection in FlyBase were used as our starting datasets. The co-occurrences of domains in these interactive events are converted into a probability score of domain-domain interaction. These scores are used to infer putative interaction among all available open reading frames (ORFs of fruit fly. Additionally, the likelihood function is used to estimate all potential protein-protein interactions. All parameters are successfully iterated and MLE is obtained for each pair of domains. Additionally, the maximized likelihood reaches its converged criteria and maintains the probability stable. The hybrid model achieves a high specificity with a loss of sensitivity, suggesting that the model may possess major features of protein-protein interactions. Several putative interactions predicted by the proposed hybrid model are supported by literatures, while experimental data with a low probability score indicate an uncertain reliability and require further proof of interaction. Fly-DPI is the online database used to present this work. It is an integrated proteomics tool with comprehensive protein annotation information from major databases as well as an effective means of predicting protein-protein interactions. As a novel search strategy, the ping-pong search is a naïve path map between two chosen proteins based on pre-computed shortest paths. Adopting effective filtering strategies will facilitate researchers in depicting the bird's eye view of the network of interest. Fly-DPI can be accessed at http://flydpi.nhri.org.tw. Conclusion This work provides two reference systems, statistical and biological, to evaluate
Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Gclust Server Amino acid sequences of predicted proteins and their annotation for 95 organis...m species. Data detail Data name Amino acid sequences of predicted proteins and their annotation for 95 orga...nism species. DOI 10.18908/lsdba.nbdc00464-001 Description of data contents Amino acid sequences of predicted proteins...Database Description Download License Update History of This Database Site Policy | Contact Us Amino acid sequences of predicted prot...eins and their annotation for 95 organism species. - Gclust Server | LSDB Archive ...
RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database.

Science.gov (United States)

Miao, Zhichao; Westhof, Eric

2016-07-08

RBscore&NBench combines a web server, RBscore and a database, NBench. RBscore predicts RNA-/DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid dataset, binding site definition and assessment metric biases, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. A comprehensive comparison led us to develop a benchmark database named NBench. The web server is available on: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yeast Interacting Proteins Database: YGR013W, YKL012W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available tion U1 snRNP protein involved in splicing, interacts with the branchpoint-binding protein during the formation of the second commitm... PRP40 U1 snRNP protein involved in splicing, interacts with the branchpoint-binding protein during the form...ation of the second commitment complex Rows with this prey as prey (1) Rows with
Touching proteins with virtual bare hands - Visualizing protein-drug complexes and their dynamics in self-made virtual reality using gaming hardware

Science.gov (United States)

Ratamero, Erick Martins; Bellini, Dom; Dowson, Christopher G.; Römer, Rudolf A.

2018-06-01

The ability to precisely visualize the atomic geometry of the interactions between a drug and its protein target in structural models is critical in predicting the correct modifications in previously identified inhibitors to create more effective next generation drugs. It is currently common practice among medicinal chemists while attempting the above to access the information contained in three-dimensional structures by using two-dimensional projections, which can preclude disclosure of useful features. A more accessible and intuitive visualization of the three-dimensional configuration of the atomic geometry in the models can be achieved through the implementation of immersive virtual reality (VR). While bespoke commercial VR suites are available, in this work, we present a freely available software pipeline for visualising protein structures through VR. New consumer hardware, such as the uc(HTC Vive) and the uc(Oculus Rift) utilized in this study, are available at reasonable prices. As an instructive example, we have combined VR visualization with fast algorithms for simulating intramolecular motions of protein flexibility, in an effort to further improve structure-led drug design by exposing molecular interactions that might be hidden in the less informative static models. This is a paradigmatic test case scenario for many similar applications in computer-aided molecular studies and design.
KEGGtranslator: visualizing and converting the KEGG PATHWAY database to various formats.

Science.gov (United States)

Wrzodek, Clemens; Dräger, Andreas; Zell, Andreas

2011-08-15

The KEGG PATHWAY database provides a widely used service for metabolic and nonmetabolic pathways. It contains manually drawn pathway maps with information about the genes, reactions and relations contained therein. To store these pathways, KEGG uses KGML, a proprietary XML-format. Parsers and translators are needed to process the pathway maps for usage in other applications and algorithms. We have developed KEGGtranslator, an easy-to-use stand-alone application that can visualize and convert KGML formatted XML-files into multiple output formats. Unlike other translators, KEGGtranslator supports a plethora of output formats, is able to augment the information in translated documents (e.g. MIRIAM annotations) beyond the scope of the KGML document, and amends missing components to fragmentary reactions within the pathway to allow simulations on those. KEGGtranslator is freely available as a Java(™) Web Start application and for download at http://www.cogsys.cs.uni-tuebingen.de/software/KEGGtranslator/. KGML files can be downloaded from within the application. clemens.wrzodek@uni-tuebingen.de Supplementary data are available at Bioinformatics online.
Utilizing Biotinylated Proteins Expressed in Yeast to Visualize DNA–Protein Interactions at the Single-Molecule Level

Directory of Open Access Journals (Sweden)

Huijun Xue

2017-10-01

Full Text Available Much of our knowledge in conventional biochemistry has derived from bulk assays. However, many stochastic processes and transient intermediates are hidden when averaged over the ensemble. The powerful technique of single-molecule fluorescence microscopy has made great contributions to the understanding of life processes that are inaccessible when using traditional approaches. In single-molecule studies, quantum dots (Qdots have several unique advantages over other fluorescent probes, such as high brightness, extremely high photostability, and large Stokes shift, thus allowing long-time observation and improved signal-to-noise ratios. So far, however, there is no convenient way to label proteins purified from budding yeast with Qdots. Based on BirA–Avi and biotin–streptavidin systems, we have established a simple method to acquire a Qdot-labeled protein and visualize its interaction with DNA using total internal reflection fluorescence microscopy. For proof-of-concept, we chose replication protein A (RPA and origin recognition complex (ORC as the proteins of interest. Proteins were purified from budding yeast with high biotinylation efficiency and rapidly labeled with streptavidin-coated Qdots. Interactions between proteins and DNA were observed successfully at the single-molecule level.
Tools and procedures for visualization of proteins and other biomolecules.

Science.gov (United States)

Pan, Lurong; Aller, Stephen G

2015-04-01

Protein, peptides, and nucleic acids are biomolecules that drive biological processes in living organisms. An enormous amount of structural data for a large number of these biomolecules has been described with atomic precision in the form of structural "snapshots" that are freely available in public repositories. These snapshots can help explain how the biomolecules function, the nature of interactions between multi-molecular complexes, and even how small-molecule drugs can modulate the biomolecules for clinical benefits. Furthermore, these structural snapshots serve as inputs for sophisticated computer simulations to turn the biomolecules into moving, "breathing" molecular machines for understanding their dynamic properties in real-time computer simulations. In order for the researcher to take advantage of such a wealth of structural data, it is necessary to gain competency in the use of computer molecular visualization tools for exploring the structures and visualizing three-dimensional spatial representations. Here, we present protocols for using two common visualization tools--the Web-based Jmol and the stand-alone PyMOL package--as well as a few examples of other popular tools. Copyright © 2015 John Wiley & Sons, Inc.
Uploading, Searching and Visualizing of Paleomagnetic and Rock Magnetic Data in the Online MagIC Database

Science.gov (United States)

Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Donadini, F.

2007-12-01

The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all available measurements and derived properties from paleomagnetic studies of directions and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and will soon implement two search nodes, one for paleomagnetism and one for rock magnetism. Currently the PMAG node is operational. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. Users can also browse the database by data type or by data compilation to view all contributions associated with well known earlier collections like PINT, GMPDB or PSVRL. The query result set is displayed in a digestible tabular format allowing the user to descend from locations to sites, samples, specimens and measurements. At each stage, the result set can be saved and, where appropriate, can be visualized by plotting global location maps, equal area, XY, age, and depth plots, or typical Zijderveld, hysteresis, magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 2.3) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload
Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology

International Nuclear Information System (INIS)

Shen Yang; Bax, Ad

2007-01-01

Chemical shifts of nuclei in or attached to a protein backbone are exquisitely sensitive to their local environment. A computer program, SPARTA, is described that uses this correlation with local structure to predict protein backbone chemical shifts, given an input three-dimensional structure, by searching a newly generated database for triplets of adjacent residues that provide the best match in φ/ψ/χ 1 torsion angles and sequence similarity to the query triplet of interest. The database contains 15 N, 1 H N , 1 H α , 13 C α , 13 C β and 13 C' chemical shifts for 200 proteins for which a high resolution X-ray (≤2.4 A) structure is available. The relative importance of the weighting factors for the φ/ψ/χ 1 angles and sequence similarity was optimized empirically. The weighted, average secondary shifts of the central residues in the 20 best-matching triplets, after inclusion of nearest neighbor, ring current, and hydrogen bonding effects, are used to predict chemical shifts for the protein of known structure. Validation shows good agreement between the SPARTA-predicted and experimental shifts, with standard deviations of 2.52, 0.51, 0.27, 0.98, 1.07 and 1.08 ppm for 15 N, 1 H N , 1 H α , 13 C α , 13 C β and 13 C', respectively, including outliers
Yeast Interacting Proteins Database: YGL237C, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding prote... expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein
Yeast Interacting Proteins Database: YKL002W, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding prote...xpression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Sp
Yeast Interacting Proteins Database: YGL127C, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ith protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regula...rotein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors
Visualization of red-ox proteins on the gold surface using enzymatic polypyrrole formation

International Nuclear Information System (INIS)

Ramanaviciene, A.; Kausaite-Minkstimiene, A.; Voronovic, J.; Ramanavicius, A.; Oztekin, Y.; Carac, G.; German, N.

2011-01-01

We describe a new method for the visualization of the activity of red-ox proteins on a gold interface. Glucose oxidase was selected as a model system. Surfaces were modified by adhesion of glucose oxidase on (a) electrochemically cleaned gold; (b) gold films modified with gold nanoparticles, (c) a gold surface modified with self-assembled monolayer, and (d) covalent immobilization of protein on the gold surface modified with a self-assembled monolayer. The simple optical method for the visualization of enzyme on the surfaces is based on the enzymatic formation of polypyrrole. The activity of the enzyme was quantified via enzymatic formation of polypyrrole, which was detected and investigated by quartz microbalance and amperometric techniques. The experimental data suggest that the enzymatic formation of the polymer may serve as a method to indicate the adhesion of active redox enzyme on such surfaces. (author)
In vivo conjugation of nasal lavage proteins by hexahydrophthalic anhydride

International Nuclear Information System (INIS)

Johannesson, Gunvor; Lindh, Christian; Nielsen, Joern; Bjoerk, Birgitta; Rosqvist, Seema; Joensson, Bo A.G.

2004-01-01

Hexahydrophthalic anhydride (HHPA), an industrially important chemical, is a highly allergenic compound. The aim of this work was to identify proteins in nasal lavage fluid (NLF) that form adducts with HHPA. Such bindings may induce production of specific immunoglobulin E (IgE) or affect physiological mechanisms of the proteins. NLF was obtained from HHPA-exposed volunteers, workers and exposed guinea pigs. HHPA-binding proteins were visualized with immunoblotting using a polyclonal antiserum against HHPA. The proteins were excised from sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) gels, digested with trypsin and identified by tandem mass spectrometry (MS/MS) and database searches. The antiserum was found to be specific for HHPA-bound proteins. In vivo formed HHPA-binding proteins in humans were identified as antileukoproteinase, immunoglobulin G (IgG), immunoglobulin A (IgA), serum albumin and lactoferrin. In addition, several proteins binding to HHPA were found in NLFs from guinea pigs but these could not be identified from database searches. Hypotheses for development of airways diseases by adduction of this allergenic compound to the NLF proteins in humans were established

Yeast Interacting Proteins Database: YOR358W, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; act...rotein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt15p; acts as a regulator o
Managing Rock and Paleomagnetic Data Flow with the MagIC Database: from Measurement and Analysis to Comprehensive Archive and Visualization

Science.gov (United States)

Koppers, A. A.; Minnett, R. C.; Tauxe, L.; Constable, C.; Donadini, F.

2008-12-01

The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by rock and paleomagnetic data. The goal of MagIC is to archive all measurements and derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). Organizing data for presentation in peer-reviewed publications or for ingestion into databases is a time-consuming task, and to facilitate these activities, three tightly integrated tools have been developed: MagIC-PY, the MagIC Console Software, and the MagIC Online Database. A suite of Python scripts is available to help users port their data into the MagIC data format. They allow the user to add important metadata, perform basic interpretations, and average results at the specimen, sample and site levels. These scripts have been validated for use as Open Source software under the UNIX, Linux, PC and Macintosh© operating systems. We have also developed the MagIC Console Software program to assist in collating rock and paleomagnetic data for upload to the MagIC database. The program runs in Microsoft Excel© on both Macintosh© computers and PCs. It performs routine consistency checks on data entries, and assists users in preparing data for uploading into the online MagIC database. The MagIC website is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual FlashMap interface to browse and select locations. Users can also browse the database by data type (inclination, intensity, VGP, hysteresis, susceptibility) or by data compilation to view all contributions associated with previous databases, such as PINT, GMPDB or TAFI or other user
LeishCyc: a biochemical pathways database for Leishmania major

Directory of Open Access Journals (Sweden)

Doyle Maria A

2009-06-01

Full Text Available Abstract Background Leishmania spp. are sandfly transmitted protozoan parasites that cause a spectrum of diseases in more than 12 million people worldwide. Much research is now focusing on how these parasites adapt to the distinct nutrient environments they encounter in the digestive tract of the sandfly vector and the phagolysosome compartment of mammalian macrophages. While data mining and annotation of the genomes of three Leishmania species has provided an initial inventory of predicted metabolic components and associated pathways, resources for integrating this information into metabolic networks and incorporating data from transcript, protein, and metabolite profiling studies is currently lacking. The development of a reliable, expertly curated, and widely available model of Leishmania metabolic networks is required to facilitate systems analysis, as well as discovery and prioritization of new drug targets for this important human pathogen. Description The LeishCyc database was initially built from the genome sequence of Leishmania major (v5.2, based on the annotation published by the Wellcome Trust Sanger Institute. LeishCyc was manually curated to remove errors, correct automated predictions, and add information from the literature. The ongoing curation is based on public sources, literature searches, and our own experimental and bioinformatics studies. In a number of instances we have improved on the original genome annotation, and, in some ambiguous cases, collected relevant information from the literature in order to help clarify gene or protein annotation in the future. All genes in LeishCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger Institute. Conclusion The LeishCyc database describes Leishmania major genes, gene products, metabolites, their relationships and biochemical organization into metabolic pathways. LeishCyc provides a systematic approach to organizing the evolving information about Leishmania
Sting_RDB: a relational database of structural parameters for protein analysis with support for data warehousing and data mining.

Science.gov (United States)

Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G

2007-10-05

An effective strategy for managing protein databases is to provide mechanisms to transform raw data into consistent, accurate and reliable information. Such mechanisms will greatly reduce operational inefficiencies and improve one's ability to better handle scientific objectives and interpret the research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_rdb evolved from the earlier, text (flat file)-based database, in which data consistency and integrity was not guaranteed. Second, we provide support for data warehousing and mining. Third, the data quality indicator was introduced. Finally and probably most importantly, complex queries that could not be posed on a text-based database, are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
Proximity probing assays for simultaneous visualization of protein complexes in situ

DEFF Research Database (Denmark)

Moreira, José; Thorsen, Stine Buch; Brünner, Nils

2013-01-01

EVALUATION OF: Leuchowius KJ, Clausson CM, Grannas K et al. Parallel visualization of multiple protein complexes in individual cells in tumor tissue. Mol. Cell Proteomics doi:10.1074/mcp.O112.023374 (2013) (Epub ahead of print). Techniques for in situ detection and quantification of proteins...... in fixed tissue remain an important element of both basic biological analyses and clinical biomarker research. The practical importance of such techniques can be exemplified by the everyday clinical use of immunohistochemical detection of the estrogen receptor and HER2 in tissues from breast cancer...
See me, feel me: methods to concurrently visualize and manipulate single DNA molecules and associated proteins

NARCIS (Netherlands)

van Mameren, J.; Peterman, E.J.G.; Wuite, G.J.L.

2008-01-01

Direct visualization of DNA and proteins allows researchers to investigate DNA-protein interactions with great detail. Much progress has been made in this area as a result of increasingly sensitive single-molecule fluorescence techniques. At the same time, methods that control the conformation of
ProBiS tools (algorithm, database, and web servers) for predicting and modeling of biologically interesting proteins.

Science.gov (United States)

Konc, Janez; Janežič, Dušanka

2017-09-01

ProBiS (Protein Binding Sites) Tools consist of algorithm, database, and web servers for prediction of binding sites and protein ligands based on the detection of structurally similar binding sites in the Protein Data Bank. In this article, we review the operations that ProBiS Tools perform, provide comments on the evolution of the tools, and give some implementation details. We review some of its applications to biologically interesting proteins. ProBiS Tools are freely available at http://probis.cmm.ki.si and http://probis.nih.gov. Copyright © 2017 Elsevier Ltd. All rights reserved.
ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

Science.gov (United States)

Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

2002-12-19

Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.
ProtaBank: A repository for protein design and engineering data.

Science.gov (United States)

Wang, Connie Y; Chang, Paul M; Ary, Marie L; Allen, Benjamin D; Chica, Roberto A; Mayo, Stephen L; Olafson, Barry D

2018-03-25

We present ProtaBank, a repository for storing, querying, analyzing, and sharing protein design and engineering data in an actively maintained and updated database. ProtaBank provides a format to describe and compare all types of protein mutational data, spanning a wide range of properties and techniques. It features a user-friendly web interface and programming layer that streamlines data deposition and allows for batch input and queries. The database schema design incorporates a standard format for reporting protein sequences and experimental data that facilitates comparison of results across different data sets. A suite of analysis and visualization tools are provided to facilitate discovery, to guide future designs, and to benchmark and train new predictive tools and algorithms. ProtaBank will provide a valuable resource to the protein engineering community by storing and safeguarding newly generated data, allowing for fast searching and identification of relevant data from the existing literature, and exploring correlations between disparate data sets. ProtaBank invites researchers to contribute data to the database to make it accessible for search and analysis. ProtaBank is available at https://protabank.org. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Improving decoy databases for protein folding algorithms

KAUST Repository

Lindsey, Aaron; Yeh, Hsin-Yi (Cindy); Wu, Chih-Peng; Thomas, Shawna; Amato, Nancy M.

2014-01-01

energetically stable) from non-native structures. Decoy databases are collections of non-native structures used to test and verify these functions. We present a method to evaluate and improve the quality of decoy databases by adding novel structures and removing
CLMSVault: A Software Suite for Protein Cross-Linking Mass-Spectrometry Data Analysis and Visualization.

Science.gov (United States)

Courcelles, Mathieu; Coulombe-Huntington, Jasmin; Cossette, Émilie; Gingras, Anne-Claude; Thibault, Pierre; Tyers, Mike

2017-07-07

Protein cross-linking mass spectrometry (CL-MS) enables the sensitive detection of protein interactions and the inference of protein complex topology. The detection of chemical cross-links between protein residues can identify intra- and interprotein contact sites or provide physical constraints for molecular modeling of protein structure. Recent innovations in cross-linker design, sample preparation, mass spectrometry, and software tools have significantly improved CL-MS approaches. Although a number of algorithms now exist for the identification of cross-linked peptides from mass spectral data, a dearth of user-friendly analysis tools represent a practical bottleneck to the broad adoption of the approach. To facilitate the analysis of CL-MS data, we developed CLMSVault, a software suite designed to leverage existing CL-MS algorithms and provide intuitive and flexible tools for cross-platform data interpretation. CLMSVault stores and combines complementary information obtained from different cross-linkers and search algorithms. CLMSVault provides filtering, comparison, and visualization tools to support CL-MS analyses and includes a workflow for label-free quantification of cross-linked peptides. An embedded 3D viewer enables the visualization of quantitative data and the mapping of cross-linked sites onto PDB structural models. We demonstrate the application of CLMSVault for the analysis of a noncovalent Cdc34-ubiquitin protein complex cross-linked under different conditions. CLMSVault is open-source software (available at https://gitlab.com/courcelm/clmsvault.git ), and a live demo is available at http://democlmsvault.tyerslab.com/ .
Yeast Interacting Proteins Database: YGL145W, YNL258C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available ripheral membrane protein required for Golgi-to-ER retrograde traffic; component ... membrane protein required for Golgi-to-ER retrograde traffic; component of the ER target site that interact
Yeast Interacting Proteins Database: YNL152W, YMR032W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YNL152W INN1 Essential protein that associates with the contractile actomyosin ring... Bait description Essential protein that associates with the contractile actomyosin ring, required for ingre
The design of distributed database system for HIRFL

International Nuclear Information System (INIS)

Wang Hong; Huang Xinmin

2004-01-01

This paper is focused on a kind of distributed database system used in HIRFL distributed control system. The database of this distributed database system is established by SQL Server 2000, and its application system adopts the Client/Server model. Visual C ++ is used to develop the applications, and the application uses ODBC to access the database. (authors)
The evolution of a Web resource: The Galactosemia Proteins Database 2.0.

Science.gov (United States)

d'Acierno, Antonio; Scafuri, Bernardina; Facchiano, Angelo; Marabotti, Anna

2018-01-01

Galactosemia Proteins Database 2.0 is a Web-accessible resource collecting information about the structural and functional effects of the known variations associated to the three different enzymes of the Leloir pathway encoded by the genes GALT, GALE, and GALK1 and involved in the different forms of the genetic disease globally called "galactosemia." It represents an evolution of two available online resources we previously developed, with new data deriving from new structures, new analysis tools, and new interfaces and filters in order to improve the quality and quantity of information available for different categories of users. We propose this new resource both as a landmark for the entire world community of galactosemia and as a model for the development of similar tools for other proteins object of variations and involved in human diseases. © 2017 Wiley Periodicals, Inc.
Yeast Interacting Proteins Database: YPR103W, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available tein involved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors...gulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf
Yeast Interacting Proteins Database: YCL046W, YGL115W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YCL046W - Dubious open reading frame unlikely to encode a protein, based on availab...ading frame unlikely to encode a protein, based on available experimental and comparative sequence data; par
Yeast Interacting Proteins Database: YNL258C, YGL145W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YNL258C DSL1 Peripheral membrane protein required for Golgi-to-ER retrograde traffi...t description Peripheral membrane protein required for Golgi-to-ER retrograde traffic; component of the ER t
Yeast Interacting Proteins Database: YOL006C, YMR233W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available fusion protein localizes to the cytoplasm, nucleus and nucleolus Rows with this prey as prey (1) Rows with t...on protein localizes to the cytoplasm, nucleus and nucleolus Rows with this prey
Yeast Interacting Proteins Database: YJR091C, YKL002W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available g of integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly sy... integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthe

Extended functions of the database machine FREND for interactive systems

International Nuclear Information System (INIS)

Hikita, S.; Kawakami, S.; Sano, K.

1984-01-01

Well-designed visual interfaces encourage non-expert users to use relational database systems. In those systems such as office automation systems or engineering database systems, non-expert users interactively access to database from visual terminals. Some users may want to occupy database or other users may share database according to various situations. Because, those jobs need a lot of time to be completed, concurrency control must be well designed to enhance the concurrency. The extended method of concurrency control of FREND is presented in this paper. The authors assume that systems are composed of workstations, a local area network and the database machine FREND. This paper also stresses that those workstations and FREND must cooperate to complete concurrency control for interactive applications
Yeast Interacting Proteins Database: YNL216W, YLR453C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YNL216W RAP1 DNA-binding protein involved in either activation or repression of transcription, depending...NA-binding protein involved in either activation or repression of transcription, depending on binding site c
Visualization and Dissemination of Multidimensional Proteomics Data Comparing Protein Abundance During Caenorhabditis elegans Development

Science.gov (United States)

Riffle, Michael; Merrihew, Gennifer E.; Jaschob, Daniel; Sharma, Vagisha; Davis, Trisha N.; Noble, William S.; MacCoss, Michael J.

2015-11-01

Regulation of protein abundance is a critical aspect of cellular function, organism development, and aging. Alternative splicing may give rise to multiple possible proteoforms of gene products where the abundance of each proteoform is independently regulated. Understanding how the abundances of these distinct gene products change is essential to understanding the underlying mechanisms of many biological processes. Bottom-up proteomics mass spectrometry techniques may be used to estimate protein abundance indirectly by sequencing and quantifying peptides that are later mapped to proteins based on sequence. However, quantifying the abundance of distinct gene products is routinely confounded by peptides that map to multiple possible proteoforms. In this work, we describe a technique that may be used to help mitigate the effects of confounding ambiguous peptides and multiple proteoforms when quantifying proteins. We have applied this technique to visualize the distribution of distinct gene products for the whole proteome across 11 developmental stages of the model organism Caenorhabditis elegans. The result is a large multidimensional dataset for which web-based tools were developed for visualizing how translated gene products change during development and identifying possible proteoforms. The underlying instrument raw files and tandem mass spectra may also be downloaded. The data resource is freely available on the web at http://www.yeastrc.org/wormpes/.
HCSD: the human cancer secretome database

DEFF Research Database (Denmark)

Feizi, Amir; Banaei-Esfahani, Amir; Nielsen, Jens

2015-01-01

The human cancer secretome database (HCSD) is a comprehensive database for human cancer secretome data. The cancer secretome describes proteins secreted by cancer cells and structuring information about the cancer secretome will enable further analysis of how this is related with tumor biology...... database is limiting the ability to query the increasing community knowledge. We therefore developed the Human Cancer Secretome Database (HCSD) to fulfil this gap. HCSD contains >80 000 measurements for about 7000 nonredundant human proteins collected from up to 35 high-throughput studies on 17 cancer...
The MAR databases: development and implementation of databases specific for marine metagenomics.

Science.gov (United States)

Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen; Willassen, Nils P

2018-01-04

We introduce the marine databases; MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represent a marine prokaryote reference genome database, MarDB includes all incomplete sequenced prokaryotic genomes regardless level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets the visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yeast Interacting Proteins Database: YMR280C, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available olved in control of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensor... glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt2p, an
Visualizing data mining results with the Brede tools

DEFF Research Database (Denmark)

Nielsen, Finn Årup

2009-01-01

has expanded and now includes its own database with coordinates along with ontologies for brain regions and functions: The Brede Database. With Brede Toolbox and Database combined we setup automated workflows for extraction of data, mass meta-analytic data mining and visualizations. Most of the Web......A few neuroinformatics databases now exist that record results from neuroimaging studies in the form of brain coordinates in stereotaxic space. The Brede Toolbox was originally developed to extract, analyze and visualize data from one of them --- the BrainMap database. Since then the Brede Toolbox...
Yeast Interacting Proteins Database: YOR302W, YOR047C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available rol of glucose-regulated gene expression; interacts with protein kinase Snf1p, glucose sensors Snf3p and Rgt...tein kinase Snf1p, glucose sensors Snf3p and Rgt2p, and TATA-binding protein Spt1
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.

Science.gov (United States)

Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki

2010-08-25

This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
Composition of Overlapping Protein-Protein and Protein-Ligand Interfaces.

Directory of Open Access Journals (Sweden)

Ruzianisra Mohamed

Full Text Available Protein-protein interactions (PPIs play a major role in many biological processes and they represent an important class of targets for therapeutic intervention. However, targeting PPIs is challenging because often no convenient natural substrates are available as starting point for small-molecule design. Here, we explored the characteristics of protein interfaces in five non-redundant datasets of 174 protein-protein (PP complexes, and 161 protein-ligand (PL complexes from the ABC database, 436 PP complexes, and 196 PL complexes from the PIBASE database and a dataset of 89 PL complexes from the Timbal database. In all cases, the small molecule ligands must bind at the respective PP interface. We observed similar amino acid frequencies in all three datasets. Remarkably, also the characteristics of PP contacts and overlapping PL contacts are highly similar.
Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

Directory of Open Access Journals (Sweden)

Yu Liu

2015-01-01

Full Text Available The Smith-Waterman (SW algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

Science.gov (United States)

Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

2015-01-01

The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
Yeast Interacting Proteins Database: YDL239C, YDR273W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...it as prey (1) YDR273W DON1 Meiosis-specific component of the spindle pole body, part of the leading... edge protein (LEP) coat, forms a ring-like structure at the leading edge of the prospore...ption Protein required for spore wall formation, thought to mediate assembly of a Don1p-containing structure at the leading...description Meiosis-specific component of the spindle pole body, part of the leading edge protein (LEP) coat
The Mitochondrial Protein Atlas: A Database of Experimentally Verified Information on the Human Mitochondrial Proteome.

Science.gov (United States)

Godin, Noa; Eichler, Jerry

2017-09-01

Given its central role in various biological systems, as well as its involvement in numerous pathologies, the mitochondrion is one of the best-studied organelles. However, although the mitochondrial genome has been extensively investigated, protein-level information remains partial, and in many cases, hypothetical. The Mitochondrial Protein Atlas (MPA; URL: lifeserv.bgu.ac.il/wb/jeichler/MPA ) is a database that provides a complete, manually curated inventory of only experimentally validated human mitochondrial proteins. The MPA presently contains 911 unique protein entries, each of which is associated with at least one experimentally validated and referenced mitochondrial localization. The MPA also contains experimentally validated and referenced information defining function, structure, involvement in pathologies, interactions with other MPA proteins, as well as the method(s) of analysis used in each instance. Connections to relevant external data sources are offered for each entry, including links to NCBI Gene, PubMed, and Protein Data Bank. The MPA offers a prototype for other information sources that allow for a distinction between what has been confirmed and what remains to be verified experimentally.
Classic and Golli Myelin Basic Protein have distinct developmental trajectories in human visual cortex.

Science.gov (United States)

Siu, Caitlin R; Balsor, Justin L; Jones, David G; Murphy, Kathryn M

2015-01-01

Traditionally, myelin is viewed as insulation around axons, however, more recent studies have shown it also plays an important role in plasticity, axonal metabolism, and neuroimmune signaling. Myelin is a complex multi-protein structure composed of hundreds of proteins, with Myelin Basic Protein (MBP) being the most studied. MBP has two families: Classic-MBP that is necessary for activity driven compaction of myelin around axons, and Golli-MBP that is found in neurons, oligodendrocytes, and T-cells. Furthermore, Golli-MBP has been called a "molecular link" between the nervous and immune systems. In visual cortex specifically, myelin proteins interact with immune processes to affect experience-dependent plasticity. We studied myelin in human visual cortex using Western blotting to quantify Classic- and Golli-MBP expression in post-mortem tissue samples ranging in age from 20 days to 80 years. We found that Classic- and Golli-MBP have different patterns of change across the lifespan. Classic-MBP gradually increases to 42 years and then declines into aging. Golli-MBP has early developmental changes that are coincident with milestones in visual system sensitive period, and gradually increases into aging. There are three stages in the balance between Classic- and Golli-MBP expression, with Golli-MBP dominating early, then shifting to Classic-MBP, and back to Golli-MBP in aging. Also Golli-MBP has a wave of high inter-individual variability during childhood. These results about cortical MBP expression are timely because they compliment recent advances in MRI techniques that produce high resolution maps of cortical myelin in normal and diseased brain. In addition, the unique pattern of Golli-MBP expression across the lifespan suggests that it supports high levels of neuroimmune interaction in cortical development and in aging.
TcoF-DB: dragon database for human transcription co-factors and transcription factor interacting proteins

KAUST Repository

Schaefer, Ulf; Schmeier, Sebastian; Bajic, Vladimir B.

2010-01-01

The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF-DNA binding, protein-protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. Proteins that interact with TFs in the context of transcription regulation but do not bind to the DNA themselves, we consider transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well-studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that is comprised of a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs and three others contain TcoFs which are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/. © The Author(s) 2010.
TcoF-DB: dragon database for human transcription co-factors and transcription factor interacting proteins

KAUST Repository

Schaefer, Ulf

2010-10-21

The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF-DNA binding, protein-protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. Proteins that interact with TFs in the context of transcription regulation but do not bind to the DNA themselves, we consider transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well-studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that is comprised of a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs and three others contain TcoFs which are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/. © The Author(s) 2010.
ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases.

Science.gov (United States)

Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric

2014-04-15

Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
Consistent two-dimensional visualization of protein-ligand complex series

Directory of Open Access Journals (Sweden)

Stierand Katrin

2011-06-01

Full Text Available Abstract Background The comparative two-dimensional graphical representation of protein-ligand complex series featuring different ligands bound to the same active site offers a quick insight in their binding mode differences. In comparison to arbitrary orientations of the residue molecules in the individual complex depictions a consistent placement improves the legibility and comparability within the series. The automatic generation of such consistent layouts offers the possibility to apply it to large data sets originating from computer-aided drug design methods. Results We developed a new approach, which automatically generates a consistent layout of interacting residues for a given series of complexes. Based on the structural three-dimensional input information, a global two-dimensional layout for all residues of the complex ensemble is computed. The algorithm incorporates the three-dimensional adjacencies of the active site residues in order to find an universally valid circular arrangement of the residues around the ligand. Subsequent to a two-dimensional ligand superimposition step, a global placement for each residue is derived from the set of already placed ligands. The method generates high-quality layouts, showing mostly overlap-free solutions with molecules which are displayed as structure diagrams providing interaction information in atomic detail. Application examples document an improved legibility compared to series of diagrams whose layouts are calculated independently from each other. Conclusions The presented method extends the field of complex series visualizations. A series of molecules binding to the same protein active site is drawn in a graphically consistent way. Compared to existing approaches these drawings substantially simplify the visual analysis of large compound series.
Visual feature extraction and establishment of visual tags in the intelligent visual internet of things

Science.gov (United States)

Zhao, Yiqun; Wang, Zhihui

2015-12-01

The Internet of things (IOT) is a kind of intelligent networks which can be used to locate, track, identify and supervise people and objects. One of important core technologies of intelligent visual internet of things ( IVIOT) is the intelligent visual tag system. In this paper, a research is done into visual feature extraction and establishment of visual tags of the human face based on ORL face database. Firstly, we use the principal component analysis (PCA) algorithm for face feature extraction, then adopt the support vector machine (SVM) for classifying and face recognition, finally establish a visual tag for face which is already classified. We conducted a experiment focused on a group of people face images, the result show that the proposed algorithm have good performance, and can show the visual tag of objects conveniently.

A method for fast energy estimation and visualization of protein-ligand interaction

Science.gov (United States)

Tomioka, Nobuo; Itai, Akiko; Iitaka, Yoichi

1987-10-01

A new computational and graphical method for facilitating ligand-protein docking studies is developed on a three-dimensional computer graphics display. Various physical and chemical properties inside the ligand binding pocket of a receptor protein, whose structure is elucidated by X-ray crystal analysis, are calculated on three-dimensional grid points and are stored in advance. By utilizing those tabulated data, it is possible to estimate the non-bonded and electrostatic interaction energy and the number of possible hydrogen bonds between protein and ligand molecules in real time during an interactive docking operation. The method also provides a comprehensive visualization of the local environment inside the binding pocket. With this method, it becomes easier to find a roughly stable geometry of ligand molecules, and one can therefore make a rapid survey of the binding capability of many drug candidates. The method will be useful for drug design as well as for the examination of protein-ligand interactions.
Yeast Interacting Proteins Database: YNL258C, YKR022C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YNL258C DSL1 Peripheral membrane protein required for Golgi-to-ER retrograde traffi...equired for Golgi-to-ER retrograde traffic; component of the ER target site that interacts with coatomer, th...it ORF YNL258C Bait gene name DSL1 Bait description Peripheral membrane protein r
Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

Science.gov (United States)

Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

2018-04-06

Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
Yucca Mountain digital database

International Nuclear Information System (INIS)

Daudt, C.R.; Hinze, W.J.

1992-01-01

This paper discusses the Yucca Mountain Digital Database (DDB) which is a digital, PC-based geographical database of geoscience-related characteristics of the proposed high-level waste (HLW) repository site of Yucca Mountain, Nevada. It was created to provide the US Nuclear Regulatory Commission's (NRC) Advisory Committee on Nuclear Waste (ACNW) and its staff with a visual perspective of geological, geophysical, and hydrological features at the Yucca Mountain site as discussed in the Department of Energy's (DOE) pre-licensing reports
Yeast Interacting Proteins Database: YKL002W, YFL034C-B [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthes...ntegral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthesi
DataSpread: Unifying Databases and Spreadsheets.

Science.gov (United States)

Bendre, Mangesh; Sun, Bofan; Zhang, Ding; Zhou, Xinyan; Chang, Kevin ChenChuan; Parameswaran, Aditya

2015-08-01

Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current "pane" (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of the DataSpread, and will give the attendees a sense for the enormous data exploration capabilities offered by unifying spreadsheets and databases.
Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases.

Science.gov (United States)

Lysenko, Artem; Lysenko, Atem; Hindle, Matthew Morritt; Taubert, Jan; Saqi, Mansoor; Rawlings, Christopher John

2009-11-01

The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.
SoyDB: a knowledge database of soybean transcription factors

Directory of Open Access Journals (Sweden)

Valliyodan Babu

2010-01-01

Full Text Available Abstract Background Transcription factors play the crucial rule of regulating gene expression and influence almost all biological processes. Systematically identifying and annotating transcription factors can greatly aid further understanding their functions and mechanisms. In this article, we present SoyDB, a user friendly database containing comprehensive knowledge of soybean transcription factors. Description The soybean genome was recently sequenced by the Department of Energy-Joint Genome Institute (DOE-JGI and is publicly available. Mining of this sequence identified 5,671 soybean genes as putative transcription factors. These genes were comprehensively annotated as an aid to the soybean research community. We developed SoyDB - a knowledge database for all the transcription factors in the soybean genome. The database contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank (PDB, protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database can be accessed via an interactive and convenient web server, which supports full-text search, PSI-BLAST sequence search, database browsing by protein family, and automatic classification of a new protein sequence into one of 64 annotated transcription factor families by hidden Markov models. Conclusions A comprehensive soybean transcription factor database was constructed and made publicly accessible at http://casp.rnet.missouri.edu/soydb/.
ContaMiner and ContaBase: a webserver and database for early identification of unwantedly crystallized protein contaminants

KAUST Repository

Hungler, Arnaud; Momin, Afaque Ahmad Imtiyaz; Diederichs, Kay; Arold, Stefan T.

2016-01-01

Solving the phase problem in protein X-ray crystallography relies heavily on the identity of the crystallized protein, especially when molecular replacement (MR) methods are used. Yet, it is not uncommon that a contaminant crystallizes instead of the protein of interest. Such contaminants may be proteins from the expression host organism, protein fusion tags or proteins added during the purification steps. Many contaminants co-purify easily, crystallize and give good diffraction data. Identification of contaminant crystals may take time, since the presence of the contaminant is unexpected and its identity unknown. A webserver (ContaMiner) and a contaminant database (ContaBase) have been established, to allow fast MR-based screening of crystallographic data against currently 62 known contaminants. The web-based ContaMiner (available at http://strube.cbrc.kaust.edu.sa/contaminer/) currently produces results in 5 min to 4 h. The program is also available in a github repository and can be installed locally. ContaMiner enables screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for 'crystallization and preliminary X-ray analysis' publications. Thus, in addition to potentially saving X-ray crystallographers much time and effort, ContaMiner might considerably lower the risk of publishing erroneous data. A web server, titled ContaMiner, has been established, which allows fast molecular-replacement-based screening of crystallographic data against a database (ContaBase) of currently 62 potential contaminants. ContaMiner enables systematic screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for 'crystallization and preliminary X-ray analysis' publications. © Arnaud Hungler et al. 2016.
ContaMiner and ContaBase: a webserver and database for early identification of unwantedly crystallized protein contaminants

KAUST Repository

Hungler, Arnaud

2016-11-02

Solving the phase problem in protein X-ray crystallography relies heavily on the identity of the crystallized protein, especially when molecular replacement (MR) methods are used. Yet, it is not uncommon that a contaminant crystallizes instead of the protein of interest. Such contaminants may be proteins from the expression host organism, protein fusion tags or proteins added during the purification steps. Many contaminants co-purify easily, crystallize and give good diffraction data. Identification of contaminant crystals may take time, since the presence of the contaminant is unexpected and its identity unknown. A webserver (ContaMiner) and a contaminant database (ContaBase) have been established, to allow fast MR-based screening of crystallographic data against currently 62 known contaminants. The web-based ContaMiner (available at http://strube.cbrc.kaust.edu.sa/contaminer/) currently produces results in 5 min to 4 h. The program is also available in a github repository and can be installed locally. ContaMiner enables screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for \\'crystallization and preliminary X-ray analysis\\' publications. Thus, in addition to potentially saving X-ray crystallographers much time and effort, ContaMiner might considerably lower the risk of publishing erroneous data. A web server, titled ContaMiner, has been established, which allows fast molecular-replacement-based screening of crystallographic data against a database (ContaBase) of currently 62 potential contaminants. ContaMiner enables systematic screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for \\'crystallization and preliminary X-ray analysis\\' publications. © Arnaud Hungler et al. 2016.
Multidimensional structured data visualization method and apparatus, text visualization method and apparatus, method and apparatus for visualizing and graphically navigating the world wide web, method and apparatus for visualizing hierarchies

Science.gov (United States)

Risch, John S [Kennewick, WA; Dowson, Scott T [West Richland, WA; Hart, Michelle L [Richland, WA; Hatley, Wes L [Kennewick, WA

2008-05-13

A method of displaying correlations among information objects comprises receiving a query against a database; obtaining a query result set; and generating a visualization representing the components of the result set, the visualization including one of a plane and line to represent a data field, nodes representing data values, and links showing correlations among fields and values. Other visualization methods and apparatus are disclosed.
Olfactory Receptor Database: a sensory chemoreceptor resource

OpenAIRE

Skoufos, Emmanouil; Marenco, Luis; Nadkarni, Prakash M.; Miller, Perry L.; Shepherd, Gordon M.

2000-01-01

The Olfactory Receptor Database (ORDB) is a WWW-accessible database that has been expanded from an olfactory receptor resource to a chemoreceptor resource. It stores data on six classes of G-protein-coupled sensory chemoreceptors: (i) olfactory receptor-like proteins, (ii) vomeronasal receptors, (iii) insect olfactory receptors, (iv) worm chemoreceptors, (v) taste papilla receptors and (vi) fungal pheromone receptors. A complementary database of the ligands of these receptors (OdorDB) has bee...
SwissPalm: Protein Palmitoylation database.

Science.gov (United States)

Blanc, Mathieu; David, Fabrice; Abrami, Laurence; Migliozzi, Daniel; Armand, Florence; Bürgi, Jérôme; van der Goot, Françoise Gisou

2015-01-01

Protein S-palmitoylation is a reversible post-translational modification that regulates many key biological processes, although the full extent and functions of protein S-palmitoylation remain largely unexplored. Recent developments of new chemical methods have allowed the establishment of palmitoyl-proteomes of a variety of cell lines and tissues from different species. As the amount of information generated by these high-throughput studies is increasing, the field requires centralization and comparison of this information. Here we present SwissPalm ( http://swisspalm.epfl.ch), our open, comprehensive, manually curated resource to study protein S-palmitoylation. It currently encompasses more than 5000 S-palmitoylated protein hits from seven species, and contains more than 500 specific sites of S-palmitoylation. SwissPalm also provides curated information and filters that increase the confidence in true positive hits, and integrates predictions of S-palmitoylated cysteine scores, orthologs and isoform multiple alignments. Systems analysis of the palmitoyl-proteome screens indicate that 10% or more of the human proteome is susceptible to S-palmitoylation. Moreover, ontology and pathway analyses of the human palmitoyl-proteome reveal that key biological functions involve this reversible lipid modification. Comparative analysis finally shows a strong crosstalk between S-palmitoylation and other post-translational modifications. Through the compilation of data and continuous updates, SwissPalm will provide a powerful tool to unravel the global importance of protein S-palmitoylation.
O-GLYCBASE: a revised database of O-glycosylated proteins

DEFF Research Database (Denmark)

Hansen, Jan; Lund, Ole; Nielsen, Jens O.

1996-01-01

O-GLYCBASE is a comprehensive database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the SWISS-PROT and PIR databases as well as directly from recently published reports. Nineteen percent of the entries extracted from the databases n...... of mucin type O-glycosylation sites in mammalian glycoproteins exclusively from the primary sequence is made available by E-mail or WWW. The O-GLYCBASE database is also available electronically through our WWW server or by anonymous FTP....
Exploring the Ligand-Protein Networks in Traditional Chinese Medicine: Current Databases, Methods, and Applications

Directory of Open Access Journals (Sweden)

Mingzhu Zhao

2013-01-01

Full Text Available The traditional Chinese medicine (TCM, which has thousands of years of clinical application among China and other Asian countries, is the pioneer of the “multicomponent-multitarget” and network pharmacology. Although there is no doubt of the efficacy, it is difficult to elucidate convincing underlying mechanism of TCM due to its complex composition and unclear pharmacology. The use of ligand-protein networks has been gaining significant value in the history of drug discovery while its application in TCM is still in its early stage. This paper firstly surveys TCM databases for virtual screening that have been greatly expanded in size and data diversity in recent years. On that basis, different screening methods and strategies for identifying active ingredients and targets of TCM are outlined based on the amount of network information available, both on sides of ligand bioactivity and the protein structures. Furthermore, applications of successful in silico target identification attempts are discussed in detail along with experiments in exploring the ligand-protein networks of TCM. Finally, it will be concluded that the prospective application of ligand-protein networks can be used not only to predict protein targets of a small molecule, but also to explore the mode of action of TCM.
'The surface management system' (SuMS) database: a surface-based database to aid cortical surface reconstruction, visualization and analysis

Science.gov (United States)

Dickson, J.; Drury, H.; Van Essen, D. C.

2001-01-01

Surface reconstructions of the cerebral cortex are increasingly widely used in the analysis and visualization of cortical structure, function and connectivity. From a neuroinformatics perspective, dealing with surface-related data poses a number of challenges. These include the multiplicity of configurations in which surfaces are routinely viewed (e.g. inflated maps, spheres and flat maps), plus the diversity of experimental data that can be represented on any given surface. To address these challenges, we have developed a surface management system (SuMS) that allows automated storage and retrieval of complex surface-related datasets. SuMS provides a systematic framework for the classification, storage and retrieval of many types of surface-related data and associated volume data. Within this classification framework, it serves as a version-control system capable of handling large numbers of surface and volume datasets. With built-in database management system support, SuMS provides rapid search and retrieval capabilities across all the datasets, while also incorporating multiple security levels to regulate access. SuMS is implemented in Java and can be accessed via a Web interface (WebSuMS) or using downloaded client software. Thus, SuMS is well positioned to act as a multiplatform, multi-user 'surface request broker' for the neuroscience community.
Development of operation management database for research reactors

International Nuclear Information System (INIS)

Zhang Xinjun; Chen Wei; Yang Jun

2005-01-01

An Operation Database for Pulsed Reactor has been developed on the platform for Microsoft visual C++ 6.0. This database includes four function modules, fuel elements management, incident management, experiment management and file management. It is essential for reactor security and information management. (authors)
BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures

Energy Technology Data Exchange (ETDEWEB)

Doreleijers, Jurgen F. [University of Wisconsin-Madison, BioMagResBank, Department of Biochemistry (United States); Nederveen, Aart J. [Utrecht University, Bijvoet Center for Biomolecular Research (Netherlands); Vranken, Wim [European Bioinformatics Institute, Macromolecular Structure Database group (United Kingdom); Lin Jundong [University of Wisconsin-Madison, BioMagResBank, Department of Biochemistry (United States); Bonvin, Alexandre M.J.J.; Kaptein, Robert [Utrecht University, Bijvoet Center for Biomolecular Research (Netherlands); Markley, John L.; Ulrich, Eldon L. [University of Wisconsin-Madison, BioMagResBank, Department of Biochemistry (United States)], E-mail: elu@bmrb.wisc.edu

2005-05-15

We present two new databases of NMR-derived distance and dihedral angle restraints: the Database Of Converted Restraints (DOCR) and the Filtered Restraints Database (FRED). These databases currently correspond to 545 proteins with NMR structures deposited in the Protein Databank (PDB). The criteria for inclusion were that these should be unique, monomeric proteins with author-provided experimental NMR data and coordinates available from the PDB capable of being parsed and prepared in a consistent manner. The Wattos program was used to parse the files, and the CcpNmr FormatConverter program was used to prepare them semi-automatically. New modules, including a new implementation of Aqua in the BioMagResBank (BMRB) software Wattos were used to analyze the sets of distance restraints (DRs) for inconsistencies, redundancies, NOE completeness, classification and violations with respect to the original coordinates. Restraints that could not be associated with a known nomenclature were flagged. The coordinates of hydrogen atoms were recalculated from the positions of heavy atoms to allow for a full restraint analysis. The DOCR database contains restraint and coordinate data that is made consistent with each other and with IUPAC conventions. The FRED database is based on the DOCR data but is filtered for use by test calculation protocols and longitudinal analyses and validations. These two databases are available from websites of the BMRB and the Macromolecular Structure Database (MSD) in various formats: NMR-STAR, CCPN XML, and in formats suitable for direct use in the software packages CNS and CYANA.
BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures

International Nuclear Information System (INIS)

Doreleijers, Jurgen F.; Nederveen, Aart J.; Vranken, Wim; Lin Jundong; Bonvin, Alexandre M.J.J.; Kaptein, Robert; Markley, John L.; Ulrich, Eldon L.

2005-01-01

We present two new databases of NMR-derived distance and dihedral angle restraints: the Database Of Converted Restraints (DOCR) and the Filtered Restraints Database (FRED). These databases currently correspond to 545 proteins with NMR structures deposited in the Protein Databank (PDB). The criteria for inclusion were that these should be unique, monomeric proteins with author-provided experimental NMR data and coordinates available from the PDB capable of being parsed and prepared in a consistent manner. The Wattos program was used to parse the files, and the CcpNmr FormatConverter program was used to prepare them semi-automatically. New modules, including a new implementation of Aqua in the BioMagResBank (BMRB) software Wattos were used to analyze the sets of distance restraints (DRs) for inconsistencies, redundancies, NOE completeness, classification and violations with respect to the original coordinates. Restraints that could not be associated with a known nomenclature were flagged. The coordinates of hydrogen atoms were recalculated from the positions of heavy atoms to allow for a full restraint analysis. The DOCR database contains restraint and coordinate data that is made consistent with each other and with IUPAC conventions. The FRED database is based on the DOCR data but is filtered for use by test calculation protocols and longitudinal analyses and validations. These two databases are available from websites of the BMRB and the Macromolecular Structure Database (MSD) in various formats: NMR-STAR, CCPN XML, and in formats suitable for direct use in the software packages CNS and CYANA
A collaborative visual analytics suite for protein folding research.

Science.gov (United States)

Harvey, William; Park, In-Hee; Rübel, Oliver; Pascucci, Valerio; Bremer, Peer-Timo; Li, Chenglong; Wang, Yusu

2014-09-01

Molecular dynamics (MD) simulation is a crucial tool for understanding principles behind important biochemical processes such as protein folding and molecular interaction. With the rapidly increasing power of modern computers, large-scale MD simulation experiments can be performed regularly, generating huge amounts of MD data. An important question is how to analyze and interpret such massive and complex data. One of the (many) challenges involved in analyzing MD simulation data computationally is the high-dimensionality of such data. Given a massive collection of molecular conformations, researchers typically need to rely on their expertise and prior domain knowledge in order to retrieve certain conformations of interest. It is not easy to make and test hypotheses as the data set as a whole is somewhat "invisible" due to its high dimensionality. In other words, it is hard to directly access and examine individual conformations from a sea of molecular structures, and to further explore the entire data set. There is also no easy and convenient way to obtain a global view of the data or its various modalities of biochemical information. To this end, we present an interactive, collaborative visual analytics tool for exploring massive, high-dimensional molecular dynamics simulation data sets. The most important utility of our tool is to provide a platform where researchers can easily and effectively navigate through the otherwise "invisible" simulation data sets, exploring and examining molecular conformations both as a whole and at individual levels. The visualization is based on the concept of a topological landscape, which is a 2D terrain metaphor preserving certain topological and geometric properties of the high dimensional protein energy landscape. In addition to facilitating easy exploration of conformations, this 2D terrain metaphor also provides a platform where researchers can visualize and analyze various properties (such as contact density) overlayed on the

Dynameomics: a multi-dimensional analysis-optimized database for dynamic protein data.

Science.gov (United States)

Kehl, Catherine; Simms, Andrew M; Toofanny, Rudesh D; Daggett, Valerie

2008-06-01

The Dynameomics project is our effort to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations, as described by Beck et al. (in Protein Eng. Des. Select., the first paper in this series). The data produced by these simulations are highly multidimensional in structure and multi-terabytes in size. Both of these features present significant challenges for storage, retrieval and analysis. For optimal data modeling and flexibility, we needed a platform that supported both multidimensional indices and hierarchical relationships between related types of data and that could be integrated within our data warehouse, as described in the accompanying paper directly preceding this one. For these reasons, we have chosen On-line Analytical Processing (OLAP), a multi-dimensional analysis optimized database, as an analytical platform for these data. OLAP is a mature technology in the financial sector, but it has not been used extensively for scientific analysis. Our project is further more unusual for its focus on the multidimensional and analytical capabilities of OLAP rather than its aggregation capacities. The dimensional data model and hierarchies are very flexible. The query language is concise for complex analysis and rapid data retrieval. OLAP shows great promise for the dynamic protein analysis for bioengineering and biomedical applications. In addition, OLAP may have similar potential for other scientific and engineering applications involving large and complex datasets.
EzMol: A Web Server Wizard for the Rapid Visualization and Image Production of Protein and Nucleic Acid Structures.

Science.gov (United States)

Reynolds, Christopher R; Islam, Suhail A; Sternberg, Michael J E

2018-01-31

EzMol is a molecular visualization Web server in the form of a software wizard, located at http://www.sbg.bio.ic.ac.uk/ezmol/. It is designed for easy and rapid image manipulation and display of protein molecules, and is intended for users who need to quickly produce high-resolution images of protein molecules but do not have the time or inclination to use a software molecular visualization system. EzMol allows the upload of molecular structure files in PDB format to generate a Web page including a representation of the structure that the user can manipulate. EzMol provides intuitive options for chain display, adjusting the color/transparency of residues, side chains and protein surfaces, and for adding labels to residues. The final adjusted protein image can then be downloaded as a high-resolution image. There are a range of applications for rapid protein display, including the illustration of specific areas of a protein structure and the rapid prototyping of images. Copyright © 2018. Published by Elsevier Ltd.
Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available List Contact us Gclust Server Cluster based on sequence comparison of homologous proteins of 95 organism spe...cies Data detail Data name Cluster based on sequence comparison of homologous proteins of 95 organism specie...istory of This Database Site Policy | Contact Us Cluster based on sequence compariso
Imaging the lipidome: omega-alkynyl fatty acids for detection and cellular visualization of lipid-modified proteins.

Science.gov (United States)

Hannoush, Rami N; Arenas-Ramirez, Natalia

2009-07-17

Fatty acylation or lipid modification of proteins controls their cellular activation and diverse roles in physiology. It mediates protein-protein and protein-membrane interactions and plays an important role in regulating cellular signaling pathways. Currently, there is need for visualizing lipid modifications of proteins in cells. Herein we report novel chemical probes based on omega-alkynyl fatty acids for biochemical detection and cellular imaging of lipid-modified proteins. Our study shows that omega-alkynyl fatty acids of varying chain length are metabolically incorporated onto cellular proteins. Using fluorescence imaging, we describe the subcellular distribution of lipid-modified proteins across a panel of different mammalian cell lines and during cell division. Our results demonstrate that this methodology is a useful diagnostic tool for analyzing the lipid content of cellular proteins and for studying the dynamic behavior of lipid-modified proteins in various disease or physiological states.
Effects of Fluoxetine and Visual Experience on Glutamatergic and GABAergic Synaptic Proteins in Adult Rat Visual Cortex123

Science.gov (United States)

Beshara, Simon; Beston, Brett R.; Pinto, Joshua G. A.

2015-01-01

Abstract Fluoxetine has emerged as a novel treatment for persistent amblyopia because in adult animals it reinstates critical period-like ocular dominance plasticity and promotes recovery of visual acuity. Translation of these results from animal models to the clinic, however, has been challenging because of the lack of understanding of how this selective serotonin reuptake inhibitor affects glutamatergic and GABAergic synaptic mechanisms that are essential for experience-dependent plasticity. An appealing hypothesis is that fluoxetine recreates a critical period (CP)-like state by shifting synaptic mechanisms to be more juvenile. To test this we studied the effect of fluoxetine treatment in adult rats, alone or in combination with visual deprivation [monocular deprivation (MD)], on a set of highly conserved presynaptic and postsynaptic proteins (synapsin, synaptophysin, VGLUT1, VGAT, PSD-95, gephyrin, GluN1, GluA2, GluN2B, GluN2A, GABAAα1, GABAAα3). We did not find evidence that fluoxetine shifted the protein amounts or balances to a CP-like state. Instead, it drove the balances in favor of the more mature subunits (GluN2A, GABAAα1). In addition, when fluoxetine was paired with MD it created a neuroprotective-like environment by normalizing the glutamatergic gain found in adult MDs. Together, our results suggest that fluoxetine treatment creates a novel synaptic environment dominated by GluN2A- and GABAAα1-dependent plasticity. PMID:26730408
Mycobacteriophage genome database.

Science.gov (United States)

Joseph, Jerrine; Rajendran, Vasanthi; Hassan, Sameer; Kumar, Vanaja

2011-01-01

Mycobacteriophage genome database (MGDB) is an exclusive repository of the 64 completely sequenced mycobacteriophages with annotated information. It is a comprehensive compilation of the various gene parameters captured from several databases pooled together to empower mycobacteriophage researchers. The MGDB (Version No.1.0) comprises of 6086 genes from 64 mycobacteriophages classified into 72 families based on ACLAME database. Manual curation was aided by information available from public databases which was enriched further by analysis. Its web interface allows browsing as well as querying the classification. The main objective is to collect and organize the complexity inherent to mycobacteriophage protein classification in a rational way. The other objective is to browse the existing and new genomes and describe their functional annotation. The database is available for free at http://mpgdb.ibioinformatics.org/mpgdb.php.
Methods for the visualization and analysis of extracellular matrix protein structure and degradation.

Science.gov (United States)

Leonard, Annemarie K; Loughran, Elizabeth A; Klymenko, Yuliya; Liu, Yueying; Kim, Oleg; Asem, Marwa; McAbee, Kevin; Ravosa, Matthew J; Stack, M Sharon

2018-01-01

This chapter highlights methods for visualization and analysis of extracellular matrix (ECM) proteins, with particular emphasis on collagen type I, the most abundant protein in mammals. Protocols described range from advanced imaging of complex in vivo matrices to simple biochemical analysis of individual ECM proteins. The first section of this chapter describes common methods to image ECM components and includes protocols for second harmonic generation, scanning electron microscopy, and several histological methods of ECM localization and degradation analysis, including immunohistochemistry, Trichrome staining, and in situ zymography. The second section of this chapter details both a common transwell invasion assay and a novel live imaging method to investigate cellular behavior with respect to collagen and other ECM proteins of interest. The final section consists of common electrophoresis-based biochemical methods that are used in analysis of ECM proteins. Use of the methods described herein will enable researchers to gain a greater understanding of the role of ECM structure and degradation in development and matrix-related diseases such as cancer and connective tissue disorders. © 2018 Elsevier Inc. All rights reserved.
Nuclear database management systems

International Nuclear Information System (INIS)

Stone, C.; Sutton, R.

1996-01-01

The authors are developing software tools for accessing and visualizing nuclear data. MacNuclide was the first software application produced by their group. This application incorporates novel database management and visualization tools into an intuitive interface. The nuclide chart is used to access properties and to display results of searches. Selecting a nuclide in the chart displays a level scheme with tables of basic, radioactive decay, and other properties. All level schemes are interactive, allowing the user to modify the display, move between nuclides, and display entire daughter decay chains
Windows on the brain: the emerging role of atlases and databases in neuroscience

Science.gov (United States)

Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

2002-01-01

Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.
Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes.

Science.gov (United States)

Santos, Alberto; Wernersson, Rasmus; Jensen, Lars Juhl

2015-01-01

The eukaryotic cell division cycle is a highly regulated process that consists of a complex series of events and involves thousands of proteins. Researchers have studied the regulation of the cell cycle in several organisms, employing a wide range of high-throughput technologies, such as microarray-based mRNA expression profiling and quantitative proteomics. Due to its complexity, the cell cycle can also fail or otherwise change in many different ways if important genes are knocked out, which has been studied in several microscopy-based knockdown screens. The data from these many large-scale efforts are not easily accessed, analyzed and combined due to their inherent heterogeneity. To address this, we have created Cyclebase--available at http://www.cyclebase.org--an online database that allows users to easily visualize and download results from genome-wide cell-cycle-related experiments. In Cyclebase version 3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface, designed around an overview figure that summarizes all the cell-cycle-related data for a gene. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Learning Visual Basic NET

CERN Document Server

Liberty, Jesse

2009-01-01

Learning Visual Basic .NET is a complete introduction to VB.NET and object-oriented programming. By using hundreds of examples, this book demonstrates how to develop various kinds of applications--including those that work with databases--and web services. Learning Visual Basic .NET will help you build a solid foundation in .NET.
Using SQL Databases for Sequence Similarity Searching and Analysis.

Science.gov (United States)

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Cancer3D: understanding cancer mutations through protein structures.

Science.gov (United States)

Porta-Pardo, Eduard; Hrabe, Thomas; Godzik, Adam

2015-01-01

The new era of cancer genomics is providing us with extensive knowledge of mutations and other alterations in cancer. The Cancer3D database at http://www.cancer3d.org gives an open and user-friendly way to analyze cancer missense mutations in the context of structures of proteins in which they are found. The database also helps users analyze the distribution patterns of the mutations as well as their relationship to changes in drug activity through two algorithms: e-Driver and e-Drug. These algorithms use knowledge of modular structure of genes and proteins to separately study each region. This approach allows users to find novel candidate driver regions or drug biomarkers that cannot be found when similar analyses are done on the whole-gene level. The Cancer3D database provides access to the results of such analyses based on data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE). In addition, it displays mutations from over 14,700 proteins mapped to more than 24,300 structures from PDB. This helps users visualize the distribution of mutations and identify novel three-dimensional patterns in their distribution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Database-independent, database-dependent, and extended interpretation of peptide mass spectra in VEMS V2.0

DEFF Research Database (Denmark)

Matthiesen, Rune; Bunkenborg, Jakob; Stensballe, Allan

2004-01-01

, and generation of protein and peptide databases. VEMS V2.0 has been developed into a fast tool for combining database-independent and -dependent protein assignments in an extended analysis of MS/MS-peptide data. MS or MS/MS data can be directly recalibrated after the first search by fitting the data to the best...... search result using polynomial equations. The score function is an improvement of known scoring algorithms and can be adapted for any MS instrument type. In addition, VEMS offers a novel statistical model for evaluating the significance of the protein assignment. The novel features are illustrated...
Deficient plasticity in the primary visual cortex of alpha-calcium/calmodulin-dependent protein kinase II mutant mice.

Science.gov (United States)

Gordon, J A; Cioffi, D; Silva, A J; Stryker, M P

1996-09-01

The recent characterization of plasticity in the mouse visual cortex permits the use of mutant mice to investigate the cellular mechanisms underlying activity-dependent development. As calcium-dependent signaling pathways have been implicated in neuronal plasticity, we examined visual cortical plasticity in mice lacking the alpha-isoform of calcium/calmodulin-dependent protein kinase II (alpha CaMKII). In wild-type mice, brief occlusion of vision in one eye during a critical period reduces responses in the visual cortex. In half of the alpha CaMKII-deficient mice, visual cortical responses developed normally, but visual cortical plasticity was greatly diminished. After intensive training, spatial learning in the Morris water maze was severely impaired in a similar fraction of mutant animals. These data indicate that loss of alpha CaMKII results in a severe but variable defect in neuronal plasticity.
PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

Science.gov (United States)

Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

2016-07-08

The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

Science.gov (United States)

Gerlt, John A

2017-08-22

The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
Identification of ace inhibitory cryptides in Tilapia protein hydrolysate by UPLC-MS/MS coupled to database analysis.

Science.gov (United States)

Yesmine, Ben Henda; Antoine, Bonnet; da Silva Ortência Leocádia, Nunes Gonzalez; Rogério, Boscolo Wilson; Ingrid, Arnaudin; Nicolas, Bridiau; Thierry, Maugard; Jean-Marie, Piot; Frédéric, Sannier; Stéphanie, Bordenave-Juchereau

2017-05-01

An ultra-performance liquid chromatography-quadrupole-time of flight mass spectrometry method was developed and applied to identify short angiotensin-I-converting enzyme (ACE) inhibitory cryptides in Tilapia (Oreochromis Niloticus) protein hydrolyzate. A database was created with previously identified ACE-inhibitory di- and tripeptides and the lowest molecular weight fraction of Tilapia hydrolysate was analysed for coincidences. Only VW and VY were identified. Further analysis of collected fractions conducted to the identification of 51 different peptides in major fractions. 19 peptides selected were synthesised and tested for their ACE inhibitory potential. TL, TI, IK, LR, LD, IQ, DI, AILE, ALLE, ALIE and AIIE were identified as new ACE inhibitors. The findings from this study point UPLC-MS/MS combined with the creation of a database as an efficient technique to identify specific short peptides within a complex hydrolysate, in addition with de novo sequencing. This efficient characterisation of bioactive factors like cryptides in protein hydrolysates will extend their use as functional foods. Copyright © 2017 Elsevier B.V. All rights reserved.
YMDB: the Yeast Metabolome Database

Science.gov (United States)

Jewison, Timothy; Knox, Craig; Neveu, Vanessa; Djoumbou, Yannick; Guo, An Chi; Lee, Jacqueline; Liu, Philip; Mandal, Rupasri; Krishnamurthy, Ram; Sinelnikov, Igor; Wilson, Michael; Wishart, David S.

2012-01-01

The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cervesiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry. PMID:22064855
WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research.

Science.gov (United States)

Slenter, Denise N; Kutmon, Martina; Hanspers, Kristina; Riutta, Anders; Windsor, Jacob; Nunes, Nuno; Mélius, Jonathan; Cirillo, Elisa; Coort, Susan L; Digles, Daniela; Ehrhart, Friederike; Giesbertz, Pieter; Kalafati, Marianthi; Martens, Marvin; Miller, Ryan; Nishida, Kozo; Rieswijk, Linda; Waagmeester, Andra; Eijssen, Lars M T; Evelo, Chris T; Pico, Alexander R; Willighagen, Egon L

2018-01-04

WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks.

Science.gov (United States)

Lepoivre, Cyrille; Bergon, Aurélie; Lopez, Fabrice; Perumal, Narayanan B; Nguyen, Catherine; Imbert, Jean; Puthier, Denis

2012-01-31

Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various evidences scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users drawing new hypotheses without being overwhelmed by the density of the subsequent graph. We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i) predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices), (ii) potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii) regulatory interactions curated from the literature, (iv) predicted post-transcriptional regulation by micro-RNA, (v) protein kinase-substrate interactions and (vi) physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed In-teractomeBrowser, a graph-based knowledge browser that comes as a plug-in for Transcriptome-Browser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration of heterogeneous biological information
Discerning molecular interactions: A comprehensive review on biomolecular interaction databases and network analysis tools.

Science.gov (United States)

Miryala, Sravan Kumar; Anbarasu, Anand; Ramaiah, Sudha

2018-02-05

Computational analysis of biomolecular interaction networks is now gaining a lot of importance to understand the functions of novel genes/proteins. Gene interaction (GI) network analysis and protein-protein interaction (PPI) network analysis play a major role in predicting the functionality of interacting genes or proteins and gives an insight into the functional relationships and evolutionary conservation of interactions among the genes. An interaction network is a graphical representation of gene/protein interactome, where each gene/protein is a node, and interaction between gene/protein is an edge. In this review, we discuss the popular open source databases that serve as data repositories to search and collect protein/gene interaction data, and also tools available for the generation of interaction network, visualization and network analysis. Also, various network analysis approaches like topological approach and clustering approach to study the network properties and functional enrichment server which illustrates the functions and pathway of the genes and proteins has been discussed. Hence the distinctive attribute mentioned in this review is not only to provide an overview of tools and web servers for gene and protein-protein interaction (PPI) network analysis but also to extract useful and meaningful information from the interaction networks. Copyright © 2017 Elsevier B.V. All rights reserved.
Reading wiring diagrams made easier for maintenance operators: contribution from research in visual attention and visual search

International Nuclear Information System (INIS)

Ponthieu, L.; Wolfe, J.M.

1994-07-01

This work has been carried out while the author was visiting the Visual Psychophysics lab at the Center for Ophthalmic Research, Harvard Medical School. The general framework is the design of a wiring diagrams visualization system for maintenance operators in electric plants. This study concentrates on how knowledge and experimental techniques from visual attention can help this goal. From this standpoint, the visualization system must best exploit the human visual system abilities. As electronic databases containing all the diagrams will soon be available, it is important to think in advance the display techniques. Presently, maintenance operators favor working with paper printouts even where such databases are already available. The study shows why such an approach is valuable for the design of a display that fits the operator's tasks. Beyond that, this work has been a mean to learn the experimental techniques of cognitive sciences in an applied frame. (authors). 9 figs., 5 annexes
Teach yourself visually Access 2013

CERN Document Server

McFedries, Paul

2013-01-01

The easy, visual way to learn this popular database program Part of the Office 2013 productivity suite, Access enables you to organize, present, analyze, and share data on a network or over the web. With this Visual guide to show you how, you'll master the fundamentals of this robust database application in no time. Clear, step-by-step instructions are illustrated with full-color screen shots that show exactly what you should see on your screen. Learn to enter new records; create, edit, and design tables and forms; develop queries that generate specific reports; add smart tags to y
Updates on resources, software tools, and databases for plant proteomics in 2016-2017.

Science.gov (United States)

Misra, Biswapriya B

2018-02-08

Proteomics data processing, annotation, and analysis can often lead to major hurdles in large-scale high-throughput bottom-up proteomics experiments. Given the recent rise in protein-based big datasets being generated, efforts in in silico tool development occurrences have had an unprecedented increase; so much so, that it has become increasingly difficult to keep track of all the advances in a particular academic year. However, these tools benefit the plant proteomics community in circumventing critical issues in data analysis and visualization, as these continually developing open-source and community-developed tools hold potential in future research efforts. This review will aim to introduce and summarize more than 50 software tools, databases, and resources developed and published during 2016-2017 under the following categories: tools for data pre-processing and analysis, statistical analysis tools, peptide identification tools, databases and spectral libraries, and data visualization and interpretation tools. Intended for a well-informed proteomics community, finally, efforts in data archiving and validation datasets for the community will be discussed as well. Additionally, the author delineates the current and most commonly used proteomics tools in order to introduce novice readers to this -omics discovery platform. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
ECMDB: The E. coli Metabolome Database

OpenAIRE

Guo, An Chi; Jewison, Timothy; Wilson, Michael; Liu, Yifeng; Knox, Craig; Djoumbou, Yannick; Lo, Patrick; Mandal, Rupasri; Krishnamurthy, Ram; Wishart, David S.

2012-01-01

The Escherichia coli Metabolome Database (ECMDB, http://www.ecmdb.ca) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ?1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. E...
PATRIC, the bacterial bioinformatics database and analysis resource.

Science.gov (United States)

Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W

2014-01-01

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.
CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

Directory of Open Access Journals (Sweden)

Mohit Verma

Full Text Available Chickpea is an important grain legume used as a rich source of protein in human diet. The narrow genetic diversity and limited availability of genomic resources are the major constraints in implementing breeding strategies and biotechnological interventions for genetic enhancement of chickpea. We developed an integrated Chickpea Transcriptome Database (CTDB, which provides the comprehensive web interface for visualization and easy retrieval of transcriptome data in chickpea. The database features many tools for similarity search, functional annotation (putative function, PFAM domain and gene ontology search and comparative gene expression analysis. The current release of CTDB (v2.0 hosts transcriptome datasets with high quality functional annotation from cultivated (desi and kabuli types and wild chickpea. A catalog of transcription factor families and their expression profiles in chickpea are available in the database. The gene expression data have been integrated to study the expression profiles of chickpea transcripts in major tissues/organs and various stages of flower development. The utilities, such as similarity search, ortholog identification and comparative gene expression have also been implemented in the database to facilitate comparative genomic studies among different legumes and Arabidopsis. Furthermore, the CTDB represents a resource for the discovery of functional molecular markers (microsatellites and single nucleotide polymorphisms between different chickpea types. We anticipate that integrated information content of this database will accelerate the functional and applied genomic research for improvement of chickpea. The CTDB web service is freely available at http://nipgr.res.in/ctdb.html.
Advanced SPARQL querying in small molecule databases.

Science.gov (United States)

Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

2016-01-01

In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
Database Description - TMBETA-GENOME | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available ENOME is a database for transmembrane β-barrel proteins in complete genomes. For each genome, calculations with machine learning algo...rithms and statistical methods have been perfumed and th
Comprehensive Reconstruction and Visualization of Non-Coding Regulatory Networks in Human

Science.gov (United States)

Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

2014-01-01

Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape. PMID:25540777
Comprehensive reconstruction and visualization of non-coding regulatory networks in human.

Science.gov (United States)

Bonnici, Vincenzo; Russo, Francesco; Bombieri, Nicola; Pulvirenti, Alfredo; Giugno, Rosalba

2014-01-01

Research attention has been powered to understand the functional roles of non-coding RNAs (ncRNAs). Many studies have demonstrated their deregulation in cancer and other human disorders. ncRNAs are also present in extracellular human body fluids such as serum and plasma, giving them a great potential as non-invasive biomarkers. However, non-coding RNAs have been relatively recently discovered and a comprehensive database including all of them is still missing. Reconstructing and visualizing the network of ncRNAs interactions are important steps to understand their regulatory mechanism in complex systems. This work presents ncRNA-DB, a NoSQL database that integrates ncRNAs data interactions from a large number of well established on-line repositories. The interactions involve RNA, DNA, proteins, and diseases. ncRNA-DB is available at http://ncrnadb.scienze.univr.it/ncrnadb/. It is equipped with three interfaces: web based, command-line, and a Cytoscape app called ncINetView. By accessing only one resource, users can search for ncRNAs and their interactions, build a network annotated with all known ncRNAs and associated diseases, and use all visual and mining features available in Cytoscape.
Visualization of the African swine fever virus infection in living cells by incorporation into the virus particle of green fluorescent protein-p54 membrane protein chimera

International Nuclear Information System (INIS)

Hernaez, Bruno; Escribano, Jose M.; Alonso, Covadonga

2006-01-01

Many stages of African swine fever virus infection have not yet been studied in detail. To track the behavior of African swine fever virus (ASFV) in the infected cells in real time, we produced an infectious recombinant ASFV (B54GFP-2) that expresses and incorporates into the virus particle a chimera of the p54 envelope protein fused to the enhanced green fluorescent protein (EGFP). The incorporation of the fusion protein into the virus particle was confirmed immunologically and it was determined that p54-EGFP was fully functional by confirmation that the recombinant virus made normal-sized plaques and presented similar growth curves to the wild-type virus. The tagged virus was visualized as individual fluorescent particles during the first stages of infection and allowed to visualize the infection progression in living cells through the viral life cycle by confocal microscopy. In this work, diverse potential applications of B54GFP-2 to study different aspects of ASFV infection are shown. By using this recombinant virus it was possible to determine the trajectory and speed of intracellular virus movement. Additionally, we have been able to visualize for first time the ASFV factory formation dynamics and the cytophatic effect of the virus in live infected cells. Finally, we have analyzed virus progression along the infection cycle and infected cell death as time-lapse animations
Review of Spatial-Database System Usability: Recommendations for the ADDNS Project

National Research Council Canada - National Science Library

Abdalla, R. M; Niall, K. K

2007-01-01

...) and three-dimensional (3D) visualizations. This report presents an overview of the basic concepts of GIS and spatial databases, provides an analytical usability evaluation and critically analyses different spatial- database applications...
BLAST-based structural annotation of protein residues using Protein Data Bank.

Science.gov (United States)

Singh, Harinder; Raghava, Gajendra P S

2016-01-25

In the era of next-generation sequencing where thousands of genomes have been already sequenced; size of protein databases is growing with exponential rate. Structural annotation of these proteins is one of the biggest challenges for the computational biologist. Although, it is easy to perform BLAST search against Protein Data Bank (PDB) but it is difficult for a biologist to annotate protein residues from BLAST search. A web-server StarPDB has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software for performing similarity search of a query protein against protein structures in PDB. This server integrates wide range modules for assigning different types of annotation that includes, Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. Secondary structure module allows users to predict regular secondary structure states to each residue in a protein. Accessible surface area predict the exposed or buried residues in a protein. Tight-turns module is designed to predict tight turns like beta-turns in a protein. DNA-RNA module developed for predicting DNA and RNA interacting residues in a protein. Similarly, Ligand module of server allows one to predicted ligands, metal and nucleotides ligand interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates number of visualization tools that facilitate users to understand structure and function of protein residues. This web server is available freely for scientific community from URL http://crdd.osdd.net/raghava/starpdb .
SNPversity: a web-based tool for visualizing diversity

Science.gov (United States)

Schott, David A; Vinnakota, Abhinav G; Portwood, John L; Andorf, Carson M

2018-01-01

Abstract Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of a HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will be soon available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto github, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL: https://www.maizegdb.org/snpversity PMID:29688387
The Protein Model Portal--a comprehensive resource for protein structure and model information.

Science.gov (United States)

Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

2013-01-01

The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org.
The Protein Model Portal—a comprehensive resource for protein structure and model information

Science.gov (United States)

Haas, Juergen; Roth, Steven; Arnold, Konstantin; Kiefer, Florian; Schmidt, Tobias; Bordoli, Lorenza; Schwede, Torsten

2013-01-01

The Protein Model Portal (PMP) has been developed to foster effective use of 3D molecular models in biomedical research by providing convenient and comprehensive access to structural information for proteins. Both experimental structures and theoretical models for a given protein can be searched simultaneously and analyzed for structural variability. By providing a comprehensive view on structural information, PMP offers the opportunity to apply consistent assessment and validation criteria to the complete set of structural models available for proteins. PMP is an open project so that new methods developed by the community can contribute to PMP, for example, new modeling servers for creating homology models and model quality estimation servers for model validation. The accuracy of participating modeling servers is continuously evaluated by the Continuous Automated Model EvaluatiOn (CAMEO) project. The PMP offers a unique interface to visualize structural coverage of a protein combining both theoretical models and experimental structures, allowing straightforward assessment of the model quality and hence their utility. The portal is updated regularly and actively developed to include latest methods in the field of computational structural biology. Database URL: http://www.proteinmodelportal.org PMID:23624946
Visualization of membrane protein crystals in lipid cubic phase using X-ray imaging

OpenAIRE

Warren, Anna J.; Armour, Wes; Axford, Danny; Basham, Mark; Connolley, Thomas; Hall, David R.; Horrell, Sam; McAuley, Katherine E.; Mykhaylyk, Vitaliy; Wagner, Armin; Evans, Gwyndaf

2013-01-01

The focus in macromolecular crystallography is moving towards even more challenging target proteins that often crystallize on much smaller scales and are frequently mounted in opaque or highly refractive materials. It is therefore essential that X-ray beamline technology develops in parallel to accommodate such difficult samples. In this paper, the use of X-ray microradiography and microtomography is reported as a tool for crystal visualization, location and characterization on the macromolec...
Biomine: predicting links between biological entities using network models of heterogeneous databases

Directory of Open Access Journals (Sweden)

Eronen Lauri

2012-06-01

Full Text Available Abstract Background Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Results Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. Conclusions The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable

An Animated Introduction to Relational Databases for Many Majors

Science.gov (United States)

Dietrich, Suzanne W.; Goelman, Don; Borror, Connie M.; Crook, Sharon M.

2015-01-01

Database technology affects many disciplines beyond computer science and business. This paper describes two animations developed with images and color that visually and dynamically introduce fundamental relational database concepts and querying to students of many majors. The goal is for educators in diverse academic disciplines to incorporate the…
Generation of a predicted protein database from EST data and application to iTRAQ analyses in grape (Vitis vinifera cv. Cabernet Sauvignon) berries at ripening initiation

Science.gov (United States)

Lücker, Joost; Laszczak, Mario; Smith, Derek; Lund, Steven T

2009-01-01

Background iTRAQ is a proteomics technique that uses isobaric tags for relative and absolute quantitation of tryptic peptides. In proteomics experiments, the detection and high confidence annotation of proteins and the significance of corresponding expression differences can depend on the quality and the species specificity of the tryptic peptide map database used for analysis of the data. For species for which finished genome sequence data are not available, identification of proteins relies on similarity to proteins from other species using comprehensive peptide map databases such as the MSDB. Results We were interested in characterizing ripening initiation ('veraison') in grape berries at the protein level in order to better define the molecular control of this important process for grape growers and wine makers. We developed a bioinformatic pipeline for processing EST data in order to produce a predicted tryptic peptide database specifically targeted to the wine grape cultivar, Vitis vinifera cv. Cabernet Sauvignon, and lacking truncated N- and C-terminal fragments. By searching iTRAQ MS/MS data generated from berry exocarp and mesocarp samples at ripening initiation, we determined that implementation of the custom database afforded a large improvement in high confidence peptide annotation in comparison to the MSDB. We used iTRAQ MS/MS in conjunction with custom peptide db searches to quantitatively characterize several important pathway components for berry ripening previously described at the transcriptional level and confirmed expression patterns for these at the protein level. Conclusion We determined that a predicted peptide database for MS/MS applications can be derived from EST data using advanced clustering and trimming approaches and successfully implemented for quantitative proteome profiling. Quantitative shotgun proteome profiling holds great promise for characterizing biological processes such as fruit ripening initiation and may be further
ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

Science.gov (United States)

Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

2017-01-04

The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
RaftProt: mammalian lipid raft proteome database.

Science.gov (United States)

Shah, Anup; Chen, David; Boda, Akash R; Foster, Leonard J; Davis, Melissa J; Hill, Michelle M

2015-01-01

RaftProt (http://lipid-raft-database.di.uq.edu.au/) is a database of mammalian lipid raft-associated proteins as reported in high-throughput mass spectrometry studies. Lipid rafts are specialized membrane microdomains enriched in cholesterol and sphingolipids thought to act as dynamic signalling and sorting platforms. Given their fundamental roles in cellular regulation, there is a plethora of information on the size, composition and regulation of these membrane microdomains, including a large number of proteomics studies. To facilitate the mining and analysis of published lipid raft proteomics studies, we have developed a searchable database RaftProt. In addition to browsing the studies, performing basic queries by protein and gene names, searching experiments by cell, tissue and organisms; we have implemented several advanced features to facilitate data mining. To address the issue of potential bias due to biochemical preparation procedures used, we have captured the lipid raft preparation methods and implemented advanced search option for methodology and sample treatment conditions, such as cholesterol depletion. Furthermore, we have identified a list of high confidence proteins, and enabled searching only from this list of likely bona fide lipid raft proteins. Given the apparent biological importance of lipid raft and their associated proteins, this database would constitute a key resource for the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Principles of Information Visualization for Business Research

OpenAIRE

Ioan I. ANDONE

2008-01-01

In the era of data-centric-science, a large number of visualization tools have been created to help researchers understand increasingly rich business databases. Information visualization is a process of constructing a visual presentation of business quantitative data, especially prepared for managerial use. Interactive information visualization provide researchers with remarkable tools for discovery and innovation. By combining powerful data mining methods with user-controlled interfaces, use...
Design and Development of a Linked Open Data-Based Health Information Representation and Visualization System: Potentials and Preliminary Evaluation

Science.gov (United States)

Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur

2014-01-01

Background Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)—a new Semantic Web set of best practice of standards to publish and link heterogeneous data—can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. Objective The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. Methods We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk—a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. Results We developed an LOD
Design and development of a linked open data-based health information representation and visualization system: potentials and preliminary evaluation.

Science.gov (United States)

Tilahun, Binyam; Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur

2014-10-25

Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcome, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data becomes increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD)-a new Semantic Web set of best practice of standards to publish and link heterogeneous data-can be applied to the representation and management of public level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. The objective of this study is to evaluate whether Linked Data technologies are potential options for health information representation, visualization, and retrieval systems development and to identify the available tools and methodologies to build Linked Data-based health information systems. We used the Resource Description Framework (RDF) for data representation, Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated SPARQL query interface for interacting with the data. We primarily use the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk-a link discovery framework for Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method. We developed an LOD-based health information representation, querying
Analysis of newly established EST databases reveals similarities between heart regeneration in newt and fish

Directory of Open Access Journals (Sweden)

Weis Patrick

2010-01-01

Full Text Available Abstract Background The newt Notophthalmus viridescens possesses the remarkable ability to respond to cardiac damage by formation of new myocardial tissue. Surprisingly little is known about changes in gene activities that occur during the course of regeneration. To begin to decipher the molecular processes, that underlie restoration of functional cardiac tissue, we generated an EST database from regenerating newt hearts and compared the transcriptional profile of selected candidates with genes deregulated during zebrafish heart regeneration. Results A cDNA library of 100,000 cDNA clones was generated from newt hearts 14 days after ventricular injury. Sequencing of 11520 cDNA clones resulted in 2894 assembled contigs. BLAST searches revealed 1695 sequences with potential homology to sequences from the NCBI database. BLAST searches to TrEMBL and Swiss-Prot databases assigned 1116 proteins to Gene Ontology terms. We also identified a relatively large set of 174 ORFs, which are likely to be unique for urodele amphibians. Expression analysis of newt-zebrafish homologues confirmed the deregulation of selected genes during heart regeneration. Sequences, BLAST results and GO annotations were visualized in a relational web based database followed by grouping of identified proteins into clusters of GO Terms. Comparison of data from regenerating zebrafish hearts identified biological processes, which were uniformly overrepresented during cardiac regeneration in newt and zebrafish. Conclusion We concluded that heart regeneration in newts and zebrafish led to the activation of similar sets of genes, which suggests that heart regeneration in both species might follow similar principles. The design of the newly established newt EST database allows identification of molecular pathways important for heart regeneration.
MCAW-DB: A glycan profile database capturing the ambiguity of glycan recognition patterns.

Science.gov (United States)

Hosoda, Masae; Takahashi, Yushi; Shiota, Masaaki; Shinmachi, Daisuke; Inomoto, Renji; Higashimoto, Shinichi; Aoki-Kinoshita, Kiyoko F

2018-05-11

Glycan-binding protein (GBP) interaction experiments, such as glycan microarrays, are often used to understand glycan recognition patterns. However, oftentimes the interpretation of glycan array experimental data makes it difficult to identify discrete GBP binding patterns due to their ambiguity. It is known that lectins, for example, are non-specific in their binding affinities; the same lectin can bind to different monosaccharides or even different glycan structures. In bioinformatics, several tools to mine the data generated from these sorts of experiments have been developed. These tools take a library of predefined motifs, which are commonly-found glycan patterns such as sialyl-Lewis X, and attempt to identify the motif(s) that are specific to the GBP being analyzed. In our previous work, as opposed to using predefined motifs, we developed the Multiple Carbohydrate Alignment with Weights (MCAW) tool to visualize the state of the glycans being recognized by the GBP under analysis. We previously reported on the effectiveness of our tool and algorithm by analyzing several glycan array datasets from the Consortium of Functional Glycomics (CFG). In this work, we report on our analysis of 1081 data sets which we collected from the CFG, the results of which we have made publicly and freely available as a database called MCAW-DB. We introduce this database, its usage and describe several analysis results. We show how MCAW-DB can be used to analyze glycan-binding patterns of GBPs amidst their ambiguity. For example, the visualization of glycan-binding patterns in MCAW-DB show how they correlate with the concentrations of the samples used in the array experiments. Using MCAW-DB, the patterns of glycans found to bind to various GBP-glycan binding proteins are visualized, indicating the binding "environment" of the glycans. Thus, the ambiguity of glycan recognition is numerically represented, along with the patterns of monosaccharides surrounding the binding region. The
Data mining and visualization of the Alabama accident database

Science.gov (United States)

2000-08-01

The Alabama Department of Public Safety has developed and maintains a centralized database that contain traffic accident data collected from crash report completed by local police officers and state troopers. The Critical Analysis Reporting Environme...
Transporter Classification Database (TCDB)

Data.gov (United States)

U.S. Department of Health & Human Services — The Transporter Classification Database details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC)...
E-MSD: the European Bioinformatics Institute Macromolecular Structure Database.

Science.gov (United States)

Boutselakis, H; Dimitropoulos, D; Fillon, J; Golovin, A; Henrick, K; Hussain, A; Ionides, J; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Oldfield, T; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, J; Tagari, M; Tate, J; Tromm, S; Velankar, S; Vranken, W

2003-01-01

The E-MSD macromolecular structure relational database (http://www.ebi.ac.uk/msd) is designed to be a single access point for protein and nucleic acid structures and related information. The database is derived from Protein Data Bank (PDB) entries. Relational database technologies are used in a comprehensive cleaning procedure to ensure data uniformity across the whole archive. The search database contains an extensive set of derived properties, goodness-of-fit indicators, and links to other EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM and PROSITE. A generic search interface is available, coupled with a fast secondary structure domain search tool.
Yeast Interacting Proteins Database: YKL002W, YLR423C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthes... into lumenal vesicles of multivesicular bodies, and for delivery of newly synthesized vacuolar enzymes to t
Yeast Interacting Proteins Database: YKL002W, YDL165W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available integral membrane proteins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthes...ins into lumenal vesicles of multivesicular bodies, and for delivery of newly synthesized vacuolar enzymes t
Yeast Interacting Proteins Database: YER081W, YDR105C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available YDR105C TMS1 Vacuolar membrane protein of unknown function that is conserved in mammals; predicted to contai...tion that is conserved in mammals; predicted to contain eleven transmembrane heli
The NCBI BioSystems database.

Science.gov (United States)

Geer, Lewis Y; Marchler-Bauer, Aron; Geer, Renata C; Han, Lianyi; He, Jane; He, Siqian; Liu, Chunlei; Shi, Wenyao; Bryant, Stephen H

2010-01-01

The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.
Yeast Interacting Proteins Database: YOR117W, YJL184W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available c stress response, telomere uncapping and elongation, transcription; component of the EKC/KEOPS protein comp...n proposed to be involved in the modification of N-linked oligosaccharides, osmotic stress response, telomere uncap
InSilico Proteomics System: Integration and Application of Protein and Protein-Protein Interaction Data using Microsoft .NET

Directory of Open Access Journals (Sweden)

Straßer Wolfgang

2006-12-01

Full Text Available In the last decades, biological databases became the major knowledge resource for researchers in the field of molecular biology. The distribution of information among these databases is one of the major problems. An overview about the subject area of data access and representation of protein and protein-protein interaction data within public biological databases is described. For a comprehensive and consistent way of searching and analysing integrated protein and protein-protein interaction data, the InSilico Proteomics (ISP project has been initiated. Its three main objectives are (1 to provide an integrated knowledge pool for data investigation and global network analysis functions for a better understanding of a cell’s interactome, (2 employment of public data for plausibility analysis and validation of in-house experimental data and (3 testing the applicability of Microsoft’s .NET architecture for bioinformatics applications. Data integrated into the ISP database can be queried through the Web portal PRIMOS (PRotein Interaction and MOlecule Search which is freely available at http://biomis.fh-hagenberg.at/isp/primos.
Using random forests for assistance in the curation of G-protein coupled receptor databases.

Science.gov (United States)

Shkurin, Aleksei; Vellido, Alfredo

2017-08-18

Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has
A visual detection of protein content based on titration of moving reaction boundary electrophoresis.

Science.gov (United States)

Wang, Hou-Yu; Guo, Cheng-Ye; Guo, Chen-Gang; Fan, Liu-Yin; Zhang, Lei; Cao, Cheng-Xi

2013-04-24

A visual electrophoretic titration method was firstly developed from the concept of moving reaction boundary (MRB) for protein content analysis. In the developed method, when the voltage was applied, the hydroxide ions in the cathodic vessel moved towards the anode, and neutralized the carboxyl groups of protein immobilized via highly cross-linked polyacrylamide gel (PAG), generating a MRB between the alkali and the immobilized protein. The boundary moving velocity (V(MRB)) was as a function of protein content, and an acid-base indicator was used to denote the boundary displacement. As a proof of concept, standard model proteins and biological samples were chosen for the experiments to study the feasibility of the developed method. The experiments revealed that good linear calibration functions between V(MRB) and protein content (correlation coefficients R>0.98). The experiments further demonstrated the following merits of developed method: (1) weak influence of non-protein nitrogen additives (e.g., melamine) adulterated in protein samples, (2) good agreement with the classic Kjeldahl method (R=0.9945), (3) fast measuring speed in total protein analysis of large samples from the same source, and (4) low limit of detection (0.02-0.15 mg mL(-1) for protein content), good precision (R.S.D. of intra-day less than 1.7% and inter-day less than 2.7%), and high recoveries (105-107%). Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.

PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

Science.gov (United States)

Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

2018-01-04

The Pseudomonas aeruginosaMetabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which are dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physiochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMBD and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Graphics Design Framework to Visualize Multi-Dimensional Economic Datasets

Science.gov (United States)

Chandramouli, Magesh; Narayanan, Badri; Bertoline, Gary R.

2013-01-01

This study implements a prototype graphics visualization framework to visualize multidimensional data. This graphics design framework serves as a "visual analytical database" for visualization and simulation of economic models. One of the primary goals of any kind of visualization is to extract useful information from colossal volumes of…
The establishment of a database of Italian feeds for the Cornell Net Carbohydrate and Protein System

Directory of Open Access Journals (Sweden)

Enzo Tartari

2010-01-01

Full Text Available A field application of the Cornell Net Carbohydrate and Protein System (CNCPS in Italy has been limited because thefeed bank is based on North American feedstuffs and still few laboratories are able to analyze feeds as requested by theCNCPS. Moreover, the standardization of analytical procedures is still not homogeneous among laboratories. This workwas carried out to establish a first database for feeds commonly used in Italy, providing nutritionists and producers anaccurate and current feed composition, also indicating methods and apparatus for analytical procedures potentially availablefor routine analysis. A total of 909 samples of hays, silages and raw materials (protein feeds, cereals and by-productswere analyzed through 1999 and 2002; analysis included protein solubility and degradability, protein fractions,structural carbohydrate fractions and the calculation of neutral detergent structural carbohydrates. When possible, averagedata were compared with those included in the feed bank of CNCPS ver. 3 and with those obtained by another Italianlaboratory. The main differences were observed in chemical composition of forages and silages, whose composition largelydepends on environmental conditions and physiological stage; protein feeds, cereals and by-products showed somedifferences in crude protein, soluble protein and protein fractions even in feeds of national origin.The intent to modify the feed bank values of CNCPS for establishing an Italian data base of feeds will require a collaborativestudy of many laboratories not only for forages, hays and silages samples - whose composition is greatly dependenton environmental factors and agronomic techniques - but also for protein fractions, whose values are largely influencedby even small changes in analytical techniques.
Series of agriculture in the statistical office of the Republic of Serbia database

Directory of Open Access Journals (Sweden)

Stojanović-Radović Jelena

2015-01-01

Full Text Available The objectives of this Paper have been to examine which data on agriculture can be found in the Statistical Office of the Republic of Serbia Database, and what are the possibilities for the use of the Database in the research and analysis of agriculture. The Statistical Office of the Republic of Serbia Database physically represents normalized database formed in DBMS SQL Server. The methodological approach to the Paper subject is primarily related to modelling and the way of using Database. The options of accession, filtering and downloading of data from the Database are explained. The technical characteristics of the Database were described, indicators of agriculture listed and the possibilities of using Database were analysed. We examined whether these possibilities could be improved. It was concluded that improvements were possible, first, by enriching Database with data that are now only available in printed publications of the Office, and then, through methodological and technical improvements by redesigning the Database modelled on cloud founded databases. Also, the application of the achievements of the new multidisciplinary scientific field - Visual Analytics would improve visualization, interactive data analysis and data management.
Lrit1, a Retinal Transmembrane Protein, Regulates Selective Synapse Formation in Cone Photoreceptor Cells and Visual Acuity

Directory of Open Access Journals (Sweden)

Akiko Ueno

2018-03-01

Full Text Available Summary: In the vertebrate retina, cone photoreceptors play crucial roles in photopic vision by transmitting light-evoked signals to ON- and/or OFF-bipolar cells. However, the mechanisms underlying selective synapse formation in the cone photoreceptor pathway remain poorly understood. Here, we found that Lrit1, a leucine-rich transmembrane protein, localizes to the photoreceptor synaptic terminal and regulates the synaptic connection between cone photoreceptors and cone ON-bipolar cells. Lrit1-deficient retinas exhibit an aberrant morphology of cone photoreceptor pedicles, as well as an impairment of signal transmission from cone photoreceptors to cone ON-bipolar cells. Furthermore, we demonstrated that Lrit1 interacts with Frmpd2, a photoreceptor scaffold protein, and with mGluR6, an ON-bipolar cell-specific glutamate receptor. Additionally, Lrit1-null mice showed visual acuity impairments in their optokinetic responses. These results suggest that the Frmpd2-Lrit1-mGluR6 axis regulates selective synapse formation in cone photoreceptors and is essential for normal visual function. : Ueno et al. finds that Lrit1 plays an important role in regulating the synaptic connection between cone photoreceptors and cone ON-bipolar cells. The Frmpd2-Lrit1-mGluR6 axis is crucial for selective synapse formation in cone photoreceptors and for development of normal visual function. Keywords: retina, circuit, synapse formation, cone photoreceptor cell, ON-bipolar cell, visual acuity
Glycosylation in secreted proteins from yeast Kluyveromyces lactis

Energy Technology Data Exchange (ETDEWEB)

Santos, A.V.; Passos, F.M.L. [Universidade Federal de Vicosa (UFV), MG (Brazil). Dept. de Microbiologia. Lab. de Fisiologia de Microrganismos; Azevedo, B.R.; Pimenta, A.M.C.; Santoro, M.M. [Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG (Brazil). Dept. de Bioquimica e Imunologia. Lab. de Enzimologia e Fisico-Quimica de Proteina

2008-07-01

Full text: The nutritional status of a cell culture affects either the expression or the traffic of a number of proteins. The identification of the physiological conditions which favor protein secretion has important biotechnological consequences in designing systems for recombinant extracellular protein industrial production. Yeast Kluyvromyces lactis has been cultured in a continuous stirring tank bioreactor (CSTR) under nitrogen limitation at growth rates (0.03 h{sup -1} and 0.09 h{sup -1}) close to either exponential or stationary batch growth phases, respectively the objective was to investigate the extracellular glycoproteins at these two level of nitrogen limitation. Proteins from free cell extracts were separated by gradient SDS-PAGE (5-15%) and two-dimensional chromatography, and were analyzed by mass spectrometry (MALDI-TOF-TOF-MS). In SDS-PAGE analysis, differences in extracellular proteome were visualized: different proteins profiles at these two growth rates. The 0.09 h-1 growth rate showed larger number of bands using colloidal Coma ssie Blue staining. Different bands were detected at these two growth rates when the PAS assay for glycoprotein detection in polyacrylamide gel was used. The two-dimensional chromatogram profiles were comparatively distinguished between the 0.03 h{sup -1} and 0.09 h{sup -1} growth rate samples. Protein peaks from the second dimension, were subjected to mass spectrometry. The mass spectrums visualized showed glycosylated proteins with N-acetylglucosamine molecules and 8, 9 or 15 hexoses molecules. Comparisons between the proteins averaged mass values with the deduced proteins masses from K. lactis secreted proteins database indicated possible post-translational modifications, such as post-translational proteolysis, acetylation, deamidation and myristoylation.
Glycosylation in secreted proteins from yeast Kluyveromyces lactis

International Nuclear Information System (INIS)

Santos, A.V.; Passos, F.M.L.; Azevedo, B.R.; Pimenta, A.M.C.; Santoro, M.M.

2008-01-01

Full text: The nutritional status of a cell culture affects either the expression or the traffic of a number of proteins. The identification of the physiological conditions which favor protein secretion has important biotechnological consequences in designing systems for recombinant extracellular protein industrial production. Yeast Kluyvromyces lactis has been cultured in a continuous stirring tank bioreactor (CSTR) under nitrogen limitation at growth rates (0.03 h -1 and 0.09 h -1 ) close to either exponential or stationary batch growth phases, respectively the objective was to investigate the extracellular glycoproteins at these two level of nitrogen limitation. Proteins from free cell extracts were separated by gradient SDS-PAGE (5-15%) and two-dimensional chromatography, and were analyzed by mass spectrometry (MALDI-TOF-TOF-MS). In SDS-PAGE analysis, differences in extracellular proteome were visualized: different proteins profiles at these two growth rates. The 0.09 h-1 growth rate showed larger number of bands using colloidal Coma ssie Blue staining. Different bands were detected at these two growth rates when the PAS assay for glycoprotein detection in polyacrylamide gel was used. The two-dimensional chromatogram profiles were comparatively distinguished between the 0.03 h -1 and 0.09 h -1 growth rate samples. Protein peaks from the second dimension, were subjected to mass spectrometry. The mass spectrums visualized showed glycosylated proteins with N-acetylglucosamine molecules and 8, 9 or 15 hexoses molecules. Comparisons between the proteins averaged mass values with the deduced proteins masses from K. lactis secreted proteins database indicated possible post-translational modifications, such as post-translational proteolysis, acetylation, deamidation and myristoylation
PDBTM: Protein Data Bank of transmembrane proteins after 8 years.

Science.gov (United States)

Kozma, Dániel; Simon, István; Tusnády, Gábor E

2013-01-01

The PDBTM database (available at http://pdbtm.enzim.hu), the first comprehensive and up-to-date transmembrane protein selection of the Protein Data Bank, was launched in 2004. The database was created and has been continuously updated by the TMDET algorithm that is able to distinguish between transmembrane and non-transmembrane proteins using their 3D atomic coordinates only. The TMDET algorithm can locate the spatial positions of transmembrane proteins in lipid bilayer as well. During the last 8 years not only the size of the PDBTM database has been steadily growing from ∼400 to 1700 entries but also new structural elements have been identified, in addition to the well-known α-helical bundle and β-barrel structures. Numerous 'exotic' transmembrane protein structures have been solved since the first release, which has made it necessary to define these new structural elements, such as membrane loops or interfacial helices in the database. This article reports the new features of the PDBTM database that have been added since its first release, and our current efforts to keep the database up-to-date and easy to use so that it may continue to serve as a fundamental resource for the scientific community.
TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks

Directory of Open Access Journals (Sweden)

Lepoivre Cyrille

2012-01-01

Full Text Available Abstract Background Deciphering gene regulatory networks by in silico approaches is a crucial step in the study of the molecular perturbations that occur in diseases. The development of regulatory maps is a tedious process requiring the comprehensive integration of various evidences scattered over biological databases. Thus, the research community would greatly benefit from having a unified database storing known and predicted molecular interactions. Furthermore, given the intrinsic complexity of the data, the development of new tools offering integrated and meaningful visualizations of molecular interactions is necessary to help users drawing new hypotheses without being overwhelmed by the density of the subsequent graph. Results We extend the previously developed TranscriptomeBrowser database with a set of tables containing 1,594,978 human and mouse molecular interactions. The database includes: (i predicted regulatory interactions (computed by scanning vertebrate alignments with a set of 1,213 position weight matrices, (ii potential regulatory interactions inferred from systematic analysis of ChIP-seq experiments, (iii regulatory interactions curated from the literature, (iv predicted post-transcriptional regulation by micro-RNA, (v protein kinase-substrate interactions and (vi physical protein-protein interactions. In order to easily retrieve and efficiently analyze these interactions, we developed In-teractomeBrowser, a graph-based knowledge browser that comes as a plug-in for Transcriptome-Browser. The first objective of InteractomeBrowser is to provide a user-friendly tool to get new insight into any gene list by providing a context-specific display of putative regulatory and physical interactions. To achieve this, InteractomeBrowser relies on a "cell compartments-based layout" that makes use of a subset of the Gene Ontology to map gene products onto relevant cell compartments. This layout is particularly powerful for visual integration
Yeast Interacting Proteins Database: YDL239C, YLR423C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...cription Protein required for spore wall formation, thought to mediate assembly of a Don1p-containing structure at the leading
Yeast Interacting Proteins Database: YDL239C, YPL070W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...cription Protein required for spore wall formation, thought to mediate assembly of a Don1p-containing structure at the leading
Yeast Interacting Proteins Database: YDL239C, YML042W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...iption Protein required for spore wall formation, thought to mediate assembly of a Don1p-containing structure at the leading
Yeast Interacting Proteins Database: YDL239C, YKL103C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available of a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle p...ait description Protein required for spore wall formation, thought to mediate assembly of a Don1p-containing structure at the leading
The Universal Protein Resource (UniProt): an expanding universe of protein information.

Science.gov (United States)

Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

2006-01-01

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
ContaMiner and ContaBase: a webserver and database for early identification of unwantedly crystallized protein contaminants

Science.gov (United States)

Hungler, Arnaud; Momin, Afaque; Diederichs, Kay; Arold, Stefan, T.

2016-01-01

Solving the phase problem in protein X-ray crystallography relies heavily on the identity of the crystallized protein, especially when molecular replacement (MR) methods are used. Yet, it is not uncommon that a contaminant crystallizes instead of the protein of interest. Such contaminants may be proteins from the expression host organism, protein fusion tags or proteins added during the purification steps. Many contaminants co-purify easily, crystallize and give good diffraction data. Identification of contaminant crystals may take time, since the presence of the contaminant is unexpected and its identity unknown. A webserver (ContaMiner) and a contaminant database (ContaBase) have been established, to allow fast MR-based screening of crystallographic data against currently 62 known contaminants. The web-based ContaMiner (available at http://strube.cbrc.kaust.edu.sa/contaminer/) currently produces results in 5 min to 4 h. The program is also available in a github repository and can be installed locally. ContaMiner enables screening of novel crystals at synchrotron beamlines, and it would be valuable as a routine safety check for ‘crystallization and preliminary X-ray analysis’ publications. Thus, in addition to potentially saving X-ray crystallographers much time and effort, ContaMiner might considerably lower the risk of publishing erroneous data. PMID:27980519
[Determination of total protein content in soya-bean milk via visual moving reaction boundary titration].

Science.gov (United States)

Guo, Chengye; Wang, Houyu; Zhang, Lei; Fan, Liuyin; Cao, Chengxi

2013-11-01

A visual, rapid and accurate moving reaction boundary titration (MRBT) method was used for the determination of the total protein in soya-bean milk. During the process, moving reaction boundary (MRB) was formed by hydroxyl ions in the catholyte and soya-bean milk proteins immobilized in polyacrylamide gel (PAG), and an acid-base indicator was used to denote the boundary motion. The velocity of MRB has a relationship with protein concentration, which was used to obtain a standard curve. By paired t-test, there was no significant difference of the protein content between MRBT and Kjeldahl method at 95% confidence interval. The procedure of MRBT method required about 10 min, and it had linearity in the range of 2.0-14.0 g/L, low limit of detection (0.05 g/L), good precision (RSD of intra-day < 1.90% and inter-day < 4.39%), and high recoveries (97.41%-99.91%). In addition, non-protein nitrogen (NPN) such as melamine added into the soya-bean milk had weak influence on MRBT results.
uVis: A Formula-Based Visualization Tool

DEFF Research Database (Denmark)

Pantazos, Kostas; Xu, Shangjin; Kuhail, Mohammad Amin

Several tools use programming approaches for developing advanced visualizations. Others can with a few steps create simple visualizations with built-in patterns, and users with limited IT experience can use them. However, it is programming and time demanding to create and customize...... these visualizations. We introduce uVis, a tool that allows users with advanced spreadsheet-like IT knowledge and basic database understanding to create simple as well as advanced visualizations. These users construct visualizations by combining building blocks (i.e. controls, shapes). They specify spreadsheet...
Generation of a predicted protein database from EST data and application to iTRAQ analyses in grape (Vitis vinifera cv. Cabernet Sauvignon berries at ripening initiation

Directory of Open Access Journals (Sweden)

Smith Derek

2009-01-01

Full Text Available Abstract Background iTRAQ is a proteomics technique that uses isobaric tags for relative and absolute quantitation of tryptic peptides. In proteomics experiments, the detection and high confidence annotation of proteins and the significance of corresponding expression differences can depend on the quality and the species specificity of the tryptic peptide map database used for analysis of the data. For species for which finished genome sequence data are not available, identification of proteins relies on similarity to proteins from other species using comprehensive peptide map databases such as the MSDB. Results We were interested in characterizing ripening initiation ('veraison' in grape berries at the protein level in order to better define the molecular control of this important process for grape growers and wine makers. We developed a bioinformatic pipeline for processing EST data in order to produce a predicted tryptic peptide database specifically targeted to the wine grape cultivar, Vitis vinifera cv. Cabernet Sauvignon, and lacking truncated N- and C-terminal fragments. By searching iTRAQ MS/MS data generated from berry exocarp and mesocarp samples at ripening initiation, we determined that implementation of the custom database afforded a large improvement in high confidence peptide annotation in comparison to the MSDB. We used iTRAQ MS/MS in conjunction with custom peptide db searches to quantitatively characterize several important pathway components for berry ripening previously described at the transcriptional level and confirmed expression patterns for these at the protein level. Conclusion We determined that a predicted peptide database for MS/MS applications can be derived from EST data using advanced clustering and trimming approaches and successfully implemented for quantitative proteome profiling. Quantitative shotgun proteome profiling holds great promise for characterizing biological processes such as fruit ripening
Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database.

Science.gov (United States)

Hariri, Nadjla; Ravandi, Somayyeh Nadi

2014-01-01

Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens CO. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline base using MeSH-controlled and visual search methods. This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-Controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent sample T Test, and three precisions obtained with the three precision formulas were studied for the two search methods. The mean precision of the visual method was greater than that of the MeSH-Controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P searches. Fifty-three percent of the participants in the research also mentioned that the use of the combination of the two methods produced better results. For users, it is more appropriate to use a natural, language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. The potential reason for their preference was that the visual method allowed them more freedom of action.
VerSeDa: vertebrate secretome database.

Science.gov (United States)

Cortazar, Ana R; Oguiza, José A; Aransay, Ana M; Lavín, José L

2017-01-01

Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php. © The Author(s) 2017. Published by Oxford University Press.

PhosphoBase: a database of phosphorylation sites

DEFF Research Database (Denmark)

Blom, Nikolaj; Kreegipuu, Andres; Brunak, Søren

1998-01-01

PhosphoBase is a database of experimentally verified phosphorylation sites. Version 1.0 contains 156 entries and 398 experimentally determined phosphorylation sites. Entries are compiled and revised from the literature and from major protein sequence databases such as SwissProt and PIR. The entries...... provide information about the phosphoprotein and the exact position of its phosphorylation sites. Furthermore, part of the entries contain information about kinetic data obtained from enzyme assays on specific peptides. To illustrate the use of data extracted from PhosphoBase we present a sequence logo...... displaying the overall conservation of positions around serines phosphorylated by protein kinase A (PKA). PhosphoBase is available on the WWW at http://www.cbs.dtu.dk/databases/PhosphoBase/....
Doors for memory: A searchable database.

Science.gov (United States)

Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T; Bowes, Lindsey; Stone, Rob

2016-11-01

The study of human long-term memory has for over 50 years been dominated by research on words. This is partly due to lack of suitable nonverbal materials. Experience in developing a clinical test suggested that door scenes can provide an ecologically relevant and sensitive alternative to the faces and geometrical figures traditionally used to study visual memory. In pursuing this line of research, we have accumulated over 2000 door scenes providing a database that is categorized on a range of variables including building type, colour, age, condition, glazing, and a range of other physical characteristics. We describe an illustrative study of recognition memory for 100 doors tested by yes/no, two-alternative, or four-alternative forced-choice paradigms. These stimuli, together with the full categorized database, are available through a dedicated website. We suggest that door scenes provide an ecologically relevant and participant-friendly source of material for studying the comparatively neglected field of visual long-term memory.
BioWarehouse: a bioinformatics database warehouse toolkit.

Science.gov (United States)

Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

2006-03-23

This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
Database Constraints Applied to Metabolic Pathway Reconstruction Tools

Directory of Open Access Journals (Sweden)

Jordi Vilaplana

2014-01-01

Full Text Available Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (reannotation of proteomes, to properly identify both the individual proteins involved in the process(es of interest and their function. It also enables the sets of proteins involved in the process(es in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.
Database constraints applied to metabolic pathway reconstruction tools.

Science.gov (United States)

Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi

2014-01-01

Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.
Enhanced DIII-D Data Management Through a Relational Database

Science.gov (United States)

Burruss, J. R.; Peng, Q.; Schachter, J.; Schissel, D. P.; Terpstra, T. B.

2000-10-01

A relational database is being used to serve data about DIII-D experiments. The database is optimized for queries across multiple shots, allowing for rapid data mining by SQL-literate researchers. The relational database relates different experiments and datasets, thus providing a big picture of DIII-D operations. Users are encouraged to add their own tables to the database. Summary physics quantities about DIII-D discharges are collected and stored in the database automatically. Meta-data about code runs, MDSplus usage, and visualization tool usage are collected, stored in the database, and later analyzed to improve computing. Documentation on the database may be accessed through programming languages such as C, Java, and IDL, or through ODBC compliant applications such as Excel and Access. A database-driven web page also provides a convenient means for viewing database quantities through the World Wide Web. Demonstrations will be given at the poster.
Access to DNA and protein databases on the Internet.

Science.gov (United States)

Harper, R

1994-02-01

During the past year, the number of biological databases that can be queried via Internet has dramatically increased. This increase has resulted from the introduction of networking tools, such as Gopher and WAIS, that make it easy for research workers to index databases and make them available for on-line browsing. Biocomputing in the nineties will see the advent of more client/server options for the solution of problems in bioinformatics.
Quality Assurance Procedures for ModCat Database Code Files

Energy Technology Data Exchange (ETDEWEB)

Siciliano, Edward R.; Devanathan, Ram; Guillen, Zoe C.; Kouzes, Richard T.; Schweppe, John E.

2014-04-01

The Quality Assurance procedures used for the initial phase of the Model Catalog Project were developed to attain two objectives, referred to as “basic functionality” and “visualization.” To ensure the Monte Carlo N-Particle model input files posted into the ModCat database meet those goals, all models considered as candidates for the database are tested, revised, and re-tested.
Identifying opportune landing sites in degraded visual environments with terrain and cultural databases

Science.gov (United States)

Moody, Marc; Fisher, Robert; Little, J. Kristin

2014-06-01

Boeing has developed a degraded visual environment navigational aid that is flying on the Boeing AH-6 light attack helicopter. The navigational aid is a two dimensional software digital map underlay generated by the Boeing™ Geospatial Embedded Mapping Software (GEMS) and fully integrated with the operational flight program. The page format on the aircraft's multi function displays (MFD) is termed the Approach page. The existing work utilizes Digital Terrain Elevation Data (DTED) and OpenGL ES 2.0 graphics capabilities to compute the pertinent graphics underlay entirely on the graphics processor unit (GPU) within the AH-6 mission computer. The next release will incorporate cultural databases containing Digital Vertical Obstructions (DVO) to warn the crew of towers, buildings, and power lines when choosing an opportune landing site. Future IRAD will include Light Detection and Ranging (LIDAR) point cloud generating sensors to provide 2D and 3D synthetic vision on the final approach to the landing zone. Collision detection with respect to terrain, cultural, and point cloud datasets may be used to further augment the crew warning system. The techniques for creating the digital map underlay leverage the GPU almost entirely, making this solution viable on most embedded mission computing systems with an OpenGL ES 2.0 capable GPU. This paper focuses on the AH-6 crew interface process for determining a landing zone and flying the aircraft to it.
Yeast Interacting Proteins Database: YDR176W, YDL239C [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available a Don1p-containing structure at the leading edge of the prospore membrane via interaction with spindle pole...ining structure at the leading edge of the prospore membrane via interaction with spindle pole body componen...DY3 Prey description Protein required for spore wall formation, thought to mediate assembly of a Don1p-conta
The new protein topology graph library web server.

Science.gov (United States)

Schäfer, Tim; Scheck, Andreas; Bruneß, Daniel; May, Patrick; Koch, Ina

2016-02-01

We present a new, extended version of the Protein Topology Graph Library web server. The Protein Topology Graph Library describes the protein topology on the super-secondary structure level. It allows to compute and visualize protein ligand graphs and search for protein structural motifs. The new server features additional information on ligand binding to secondary structure elements, increased usability and an application programming interface (API) to retrieve data, allowing for an automated analysis of protein topology. The Protein Topology Graph Library server is freely available on the web at http://ptgl.uni-frankfurt.de. The website is implemented in PHP, JavaScript, PostgreSQL and Apache. It is supported by all major browsers. The VPLG software that was used to compute the protein ligand graphs and all other data in the database is available under the GNU public license 2.0 from http://vplg.sourceforge.net. tim.schaefer@bioinformatik.uni-frankfurt.de; ina.koch@bioinformatik.uni-frankfurt.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Teach yourself visually complete Excel

CERN Document Server

McFedries, Paul

2013-01-01

Get the basics of Excel and then go beyond with this new instructional visual guide While many users need Excel just to create simple worksheets, many businesses and professionals rely on the advanced features of Excel to handle things like database creation and data analysis. Whatever project you have in mind, this visual guide takes you step by step through what each step should look like. Veteran author Paul McFedries first presents the basics and then gradually takes it further with his coverage of designing worksheets, collaborating between worksheets, working with visual data
Analysis of Patent Databases Using VxInsight

Energy Technology Data Exchange (ETDEWEB)

BOYACK,KEVIN W.; WYLIE,BRIAN N.; DAVIDSON,GEORGE S.; JOHNSON,DAVID K.

2000-12-12

We present the application of a new knowledge visualization tool, VxInsight, to the mapping and analysis of patent databases. Patent data are mined and placed in a database, relationships between the patents are identified, primarily using the citation and classification structures, then the patents are clustered using a proprietary force-directed placement algorithm. Related patents cluster together to produce a 3-D landscape view of the tens of thousands of patents. The user can navigate the landscape by zooming into or out of regions of interest. Querying the underlying database places a colored marker on each patent matching the query. Automatically generated labels, showing landscape content, update continually upon zooming. Optionally, citation links between patents may be shown on the landscape. The combination of these features enables powerful analyses of patent databases.
Usability Testing of a Large, Multidisciplinary Library Database: Basic Search and Visual Search

Directory of Open Access Journals (Sweden)

Jody Condit Fagan

2006-09-01

Full Text Available Visual search interfaces have been shown by researchers to assist users with information search and retrieval. Recently, several major library vendors have added visual search interfaces or functions to their products. For public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. This study presents the results of eight full-scale usability tests of both the EBSCOhost Basic Search and Visual Search in the context of a large liberal arts university.
Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization.

Science.gov (United States)

Yang, Jiaoyun; Wang, Haipeng; Ding, Huitong; An, Ning; Alterovitz, Gil

2017-01-19

Visualizing data by dimensionality reduction is an important strategy in Bioinformatics, which could help to discover hidden data properties and detect data quality issues, e.g. data noise, inappropriately labeled data, etc. As crowdsourcing-based synthetic biology databases face similar data quality issues, we propose to visualize biobricks to tackle them. However, existing dimensionality reduction methods could not be directly applied on biobricks datasets. Hereby, we use normalized edit distance to enhance dimensionality reduction methods, including Isomap and Laplacian Eigenmaps. By extracting biobricks from synthetic biology database Registry of Standard Biological Parts, six combinations of various types of biobricks are tested. The visualization graphs illustrate discriminated biobricks and inappropriately labeled biobricks. Clustering algorithm K-means is adopted to quantify the reduction results. The average clustering accuracy for Isomap and Laplacian Eigenmaps are 0.857 and 0.844, respectively. Besides, Laplacian Eigenmaps is 5 times faster than Isomap, and its visualization graph is more concentrated to discriminate biobricks. By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobircks are successfully visualized in two dimensional space. Various types of biobricks could be discriminated and inappropriately labeled biobricks could be determined, which could help to assess crowdsourcing-based synthetic biology databases' quality, and make biobricks selection.
An extensible database architecture for nationwide power quality monitoring

Energy Technology Data Exchange (ETDEWEB)

Kuecuek, Dilek; Inan, Tolga; Salor, Oezguel; Demirci, Turan; Buhan, Serkan; Boyrazoglu, Burak [TUBITAK Uzay, Power Electronics Group, TR 06531 Ankara (Turkey); Akkaya, Yener; Uensar, Oezguer; Altintas, Erinc; Haliloglu, Burhan [Turkish Electricity Transmission Co. Inc., TR 06490 Ankara (Turkey); Cadirci, Isik [TUBITAK Uzay, Power Electronics Group, TR 06531 Ankara (Turkey); Hacettepe University, Electrical and Electronics Eng. Dept., TR 06532 Ankara (Turkey); Ermis, Muammer [METU, Electrical and Electronics Eng. Dept., TR 06531 Ankara (Turkey)

2010-07-15

Electrical power quality (PQ) data is one of the prevalent types of engineering data. Its measurement at relevant sampling rates leads to large volumes of PQ data to be managed and analyzed. In this paper, an extensible database architecture is presented based on a novel generic data model for PQ data. The proposed architecture is operated on the nationwide PQ data of the Turkish Electricity Transmission System measured in the field by mobile PQ monitoring systems. The architecture is extensible in the sense that it can be used to store and manage PQ data collected by any means with little or no customization. The architecture has three modules: a PQ database corresponding to the implementation of the generic data model, a visual user query interface to enable its users to specify queries to the PQ database and a query processor acting as a bridge between the query interface and the database. The operation of the architecture is illustrated on the field PQ data with several query examples through the visual query interface. The execution of the architecture on this data of considerable volume supports its applicability and convenience for PQ data. (author)
HCVpro: Hepatitis C virus protein interaction database

KAUST Repository

Kwofie, Samuel K.; Schaefer, Ulf; Sundararajan, Vijayaraghava Seshadri; Bajic, Vladimir B.; Christoffels, Alan G.

2011-01-01

It is essential to catalog characterized hepatitis C virus (HCV) protein-protein interaction (PPI) data and the associated plethora of vital functional information to augment the search for therapies, vaccines and diagnostic biomarkers
Reduced myelin basic protein and actin-related gene expression in visual cortex in schizophrenia.

Science.gov (United States)

Matthews, Paul R; Eastwood, Sharon L; Harrison, Paul J

2012-01-01

Most brain gene expression studies of schizophrenia have been conducted in the frontal cortex or hippocampus. The extent to which alterations occur in other cortical regions is not well established. We investigated primary visual cortex (Brodmann area 17) from the Stanley Neuropathology Consortium collection of tissue from 60 subjects with schizophrenia, bipolar disorder, major depression, or controls. We first carried out a preliminary array screen of pooled RNA, and then used RT-PCR to quantify five mRNAs which the array identified as differentially expressed in schizophrenia (myelin basic protein [MBP], myelin-oligodendrocyte glycoprotein [MOG], β-actin [ACTB], thymosin β-10 [TB10], and superior cervical ganglion-10 [SCG10]). Reduced mRNA levels were confirmed by RT-PCR for MBP, ACTB and TB10. The MBP reduction was limited to transcripts containing exon 2. ACTB and TB10 mRNAs were also decreased in bipolar disorder. None of the transcripts were altered in subjects with major depression. Reduced MBP mRNA in schizophrenia replicates findings in other brain regions and is consistent with oligodendrocyte involvement in the disorder. The decreases in expression of ACTB, and the actin-binding protein gene TB10, suggest changes in cytoskeletal organisation. The findings confirm that the primary visual cortex shows molecular alterations in schizophrenia and extend the evidence for a widespread, rather than focal, cortical pathophysiology.
Discrete Frenet frame, inflection point solitons, and curve visualization with applications to folded proteins

Science.gov (United States)

Hu, Shuangwei; Lundgren, Martin; Niemi, Antti J.

2011-06-01

We develop a transfer matrix formalism to visualize the framing of discrete piecewise linear curves in three-dimensional space. Our approach is based on the concept of an intrinsically discrete curve. This enables us to more effectively describe curves that in the limit where the length of line segments vanishes approach fractal structures in lieu of continuous curves. We verify that in the case of differentiable curves the continuum limit of our discrete equation reproduces the generalized Frenet equation. In particular, we draw attention to the conceptual similarity between inflection points where the curvature vanishes and topologically stable solitons. As an application we consider folded proteins, their Hausdorff dimension is known to be fractal. We explain how to employ the orientation of Cβ carbons of amino acids along a protein backbone to introduce a preferred framing along the backbone. By analyzing the experimentally resolved fold geometries in the Protein Data Bank we observe that this Cβ framing relates intimately to the discrete Frenet framing. We also explain how inflection points (a.k.a. soliton centers) can be located in the loops and clarify their distinctive rôle in determining the loop structure of folded proteins.
CyanoBase: the cyanobacteria genome database update 2010

OpenAIRE

Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

2009-01-01

CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in var...

Customizable Visualizations with Formula-Linked Building Blocks

DEFF Research Database (Denmark)

Kuhail, Mohammad Amin; Lauesen, Søren

different appearance or behavior than what the widgets support. Another approach is to combine primitive graphical elements using traditional programming or visualization toolkits. Traditional programming allows high customizability, but it is time consuming and hard to develop advanced visualizations......One approach to visualization construction is to use complex blocks (widgets) that are tailored for specific visualizations, and customize the visualizations by setting the properties of the widgets. This approach allows fast and easy visualization construction but falls short if the user wants....... Visualization toolkits allow easier visualization creation in some cases, but customization and interaction are tedious. As an alternative, we developed uVis visualization tool that uses spreadsheet-like formulas to connect building blocks. uVis formulas can refer to building blocks and database tables. We...
BioWarehouse: a bioinformatics database warehouse toolkit

Directory of Open Access Journals (Sweden)

Stringer-Calvert David WJ

2006-03-01

Full Text Available Abstract Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the
An ontology-based search engine for protein-protein interactions.

Science.gov (United States)

Park, Byungkyu; Han, Kyungsook

2010-01-18

Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
Follicle Online: an integrated database of follicle assembly, development and ovulation.

Science.gov (United States)

Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

2015-01-01

Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.
Database Search Engines: Paradigms, Challenges and Solutions.

Science.gov (United States)

Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

2016-01-01

The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

Science.gov (United States)

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-03-01

Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
Protein 3D Structure and Electron Microscopy Map Retrieval Using 3D-SURFER2.0 and EM-SURFER.

Science.gov (United States)

Han, Xusi; Wei, Qing; Kihara, Daisuke

2017-12-08

With the rapid growth in the number of solved protein structures stored in the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB), it is essential to develop tools to perform real-time structure similarity searches against the entire structure database. Since conventional structure alignment methods need to sample different orientations of proteins in the three-dimensional space, they are time consuming and unsuitable for rapid, real-time database searches. To this end, we have developed 3D-SURFER and EM-SURFER, which utilize 3D Zernike descriptors (3DZD) to conduct high-throughput protein structure comparison, visualization, and analysis. Taking an atomic structure or an electron microscopy map of a protein or a protein complex as input, the 3DZD of a query protein is computed and compared with the 3DZD of all other proteins in PDB or EMDB. In addition, local geometrical characteristics of a query protein can be analyzed using VisGrid and LIGSITE CSC in 3D-SURFER. This article describes how to use 3D-SURFER and EM-SURFER to carry out protein surface shape similarity searches, local geometric feature analysis, and interpretation of the search results. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
The Protein Data Bank and Its Uses in Structural Biology Education

Directory of Open Access Journals (Sweden)

Judith Voet

2004-05-01

Full Text Available The Protein Data Bank (PDB is a repository for the structures of proteins and nucleic acids. Itcontains les of their 3-dimensional coordinates, information on how these structures were determinedand references to the journal articles describing them. The PDB was established in 1971 by HelenBerman (it s present director and has grown exponentially so that it now contains 25,000 data lesrepresenting X-ray crystallographic, NMR and other structure determinations. Database queryingand data miningtools and resources at the PDB make it possible to search, compare and infer orpredict the function of newly identied proteins. Computer graphics capabilities make it possible foranyone to easily visualize and study the structural data. The capability to present beautiful graphicrepresentations of the 3-dimesnional structures of proteins and nucleic acids has been a boon to theeducation community. Communicating an understanding of these structures and the chemical forcesdetermining them and their interactions is one of the major aims of biochemistry and molecular biologyeducation. The ability to teach these principles visually has made a great dierence in our abilityto excite our students and provide them with physical interpretations for some abstract concepts inbiochemistry and molecular biology. In this talk we will explore some of the ways that the education community uses the PDB.
Visualization of membrane protein crystals in lipid cubic phase using X-ray imaging

International Nuclear Information System (INIS)

Warren, Anna J.; Armour, Wes; Axford, Danny; Basham, Mark; Connolley, Thomas; Hall, David R.; Horrell, Sam; McAuley, Katherine E.; Mykhaylyk, Vitaliy; Wagner, Armin; Evans, Gwyndaf

2013-01-01

A comparison of X-ray diffraction and radiographic techniques for the location and characterization of protein crystals is demonstrated on membrane protein crystals mounted within lipid cubic phase material. The focus in macromolecular crystallography is moving towards even more challenging target proteins that often crystallize on much smaller scales and are frequently mounted in opaque or highly refractive materials. It is therefore essential that X-ray beamline technology develops in parallel to accommodate such difficult samples. In this paper, the use of X-ray microradiography and microtomography is reported as a tool for crystal visualization, location and characterization on the macromolecular crystallography beamlines at the Diamond Light Source. The technique is particularly useful for microcrystals and for crystals mounted in opaque materials such as lipid cubic phase. X-ray diffraction raster scanning can be used in combination with radiography to allow informed decision-making at the beamline prior to diffraction data collection. It is demonstrated that the X-ray dose required for a full tomography measurement is similar to that for a diffraction grid-scan, but for sample location and shape estimation alone just a few radiographic projections may be required
Visualization of membrane protein crystals in lipid cubic phase using X-ray imaging

Energy Technology Data Exchange (ETDEWEB)

Warren, Anna J. [Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE (United Kingdom); Armour, Wes [Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE (United Kingdom); Oxford e-Research Centre, 7 Keble Road, Oxford OX1 3QG (United Kingdom); Axford, Danny; Basham, Mark; Connolley, Thomas; Hall, David R. [Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE (United Kingdom); Horrell, Sam [Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE (United Kingdom); University of Liverpool, Liverpool L69 3BX (United Kingdom); McAuley, Katherine E.; Mykhaylyk, Vitaliy; Wagner, Armin; Evans, Gwyndaf, E-mail: gwyndaf.evans@diamond.ac.uk [Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE (United Kingdom)

2013-07-01

A comparison of X-ray diffraction and radiographic techniques for the location and characterization of protein crystals is demonstrated on membrane protein crystals mounted within lipid cubic phase material. The focus in macromolecular crystallography is moving towards even more challenging target proteins that often crystallize on much smaller scales and are frequently mounted in opaque or highly refractive materials. It is therefore essential that X-ray beamline technology develops in parallel to accommodate such difficult samples. In this paper, the use of X-ray microradiography and microtomography is reported as a tool for crystal visualization, location and characterization on the macromolecular crystallography beamlines at the Diamond Light Source. The technique is particularly useful for microcrystals and for crystals mounted in opaque materials such as lipid cubic phase. X-ray diffraction raster scanning can be used in combination with radiography to allow informed decision-making at the beamline prior to diffraction data collection. It is demonstrated that the X-ray dose required for a full tomography measurement is similar to that for a diffraction grid-scan, but for sample location and shape estimation alone just a few radiographic projections may be required.
Visual Search Performance in Patients with Vision Impairment: A Systematic Review.

Science.gov (United States)

Senger, Cassia; Margarido, Maria Rita Rodrigues Alves; De Moraes, Carlos Gustavo; De Fendi, Ligia Issa; Messias, André; Paula, Jayter Silva

2017-11-01

Patients with visual impairment are constantly facing challenges to achieve an independent and productive life, which depends upon both a good visual discrimination and search capacities. Given that visual search is a critical skill for several daily tasks and could be used as an index of the overall visual function, we investigated the relationship between vision impairment and visual search performance. A comprehensive search was undertaken using electronic PubMed, EMBASE, LILACS, and Cochrane databases from January 1980 to December 2016, applying the following terms: "visual search", "visual search performance", "visual impairment", "visual exploration", "visual field", "hemianopia", "search time", "vision lost", "visual loss", and "low vision". Two hundred seventy six studies from 12,059 electronic database files were selected, and 40 of them were included in this review. Studies included participants of all ages, both sexes, and the sample sizes ranged from 5 to 199 participants. Visual impairment was associated with worse visual search performance in several ophthalmologic conditions, which were either artificially induced, or related to specific eye and neurological diseases. This systematic review details all the described circumstances interfering with visual search tasks, highlights the need for developing technical standards, and outlines patterns for diagnosis and therapy using visual search capabilities.
DistiLD Database

DEFF Research Database (Denmark)

Palleja, Albert; Horn, Heiko; Eliasson, Sabrina

2012-01-01

Genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with the risk of hundreds of diseases. However, there is currently no database that enables non-specialists to answer the following simple questions: which SNPs associated...... with diseases are in linkage disequilibrium (LD) with a gene of interest? Which chromosomal regions have been associated with a given disease, and which are the potentially causal genes in each region? To answer these questions, we use data from the HapMap Project to partition each chromosome into so-called LD...... blocks, so that SNPs in LD with each other are preferentially in the same block, whereas SNPs not in LD are in different blocks. By projecting SNPs and genes onto LD blocks, the DistiLD database aims to increase usage of existing GWAS results by making it easy to query and visualize disease...
Lrit1, a Retinal Transmembrane Protein, Regulates Selective Synapse Formation in Cone Photoreceptor Cells and Visual Acuity.

Science.gov (United States)

Ueno, Akiko; Omori, Yoshihiro; Sugita, Yuko; Watanabe, Satoshi; Chaya, Taro; Kozuka, Takashi; Kon, Tetsuo; Yoshida, Satoyo; Matsushita, Kenji; Kuwahara, Ryusuke; Kajimura, Naoko; Okada, Yasushi; Furukawa, Takahisa

2018-03-27

In the vertebrate retina, cone photoreceptors play crucial roles in photopic vision by transmitting light-evoked signals to ON- and/or OFF-bipolar cells. However, the mechanisms underlying selective synapse formation in the cone photoreceptor pathway remain poorly understood. Here, we found that Lrit1, a leucine-rich transmembrane protein, localizes to the photoreceptor synaptic terminal and regulates the synaptic connection between cone photoreceptors and cone ON-bipolar cells. Lrit1-deficient retinas exhibit an aberrant morphology of cone photoreceptor pedicles, as well as an impairment of signal transmission from cone photoreceptors to cone ON-bipolar cells. Furthermore, we demonstrated that Lrit1 interacts with Frmpd2, a photoreceptor scaffold protein, and with mGluR6, an ON-bipolar cell-specific glutamate receptor. Additionally, Lrit1-null mice showed visual acuity impairments in their optokinetic responses. These results suggest that the Frmpd2-Lrit1-mGluR6 axis regulates selective synapse formation in cone photoreceptors and is essential for normal visual function. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
CyanoBase: the cyanobacteria genome database update 2010.

Science.gov (United States)

Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

2010-01-01

CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.
Human Variome Project Quality Assessment Criteria for Variation Databases.

Science.gov (United States)

Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter

2016-06-01

Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.
Arabidopsis Regenerating Protoplast: A Powerful Model System for Combining the Proteomics of Cell Wall Proteins and the Visualization of Cell Wall Dynamics

OpenAIRE

Yokoyama, Ryusuke; Kuki, Hiroaki; Kuroha, Takeshi; Nishitani, Kazuhiko

2016-01-01

The development of a range of sub-proteomic approaches to the plant cell wall has identified many of the cell wall proteins. However, it remains difficult to elucidate the precise biological role of each protein and the cell wall dynamics driven by their actions. The plant protoplast provides an excellent means not only for characterizing cell wall proteins, but also for visualizing the dynamics of cell wall regeneration, during which cell wall proteins are secreted. It therefore offers a uni...
Interactive protein manipulation

International Nuclear Information System (INIS)

2003-01-01

We describe an interactive visualization and modeling program for the creation of protein structures ''from scratch''. The input to our program is an amino acid sequence -decoded from a gene- and a sequence of predicted secondary structure types for each amino acid-provided by external structure prediction programs. Our program can be used in the set-up phase of a protein structure prediction process; the structures created with it serve as input for a subsequent global internal energy minimization, or another method of protein structure prediction. Our program supports basic visualization methods for protein structures, interactive manipulation based on inverse kinematics, and visualization guides to aid a user in creating ''good'' initial structures
Visual and Verbal Learning in a Genetic Metabolic Disorder

Science.gov (United States)

Spilkin, Amy M.; Ballantyne, Angela O.; Trauner, Doris A.

2009-01-01

Visual and verbal learning in a genetic metabolic disorder (cystinosis) were examined in the following three studies. The goal of Study I was to provide a normative database and establish the reliability and validity of a new test of visual learning and memory (Visual Learning and Memory Test; VLMT) that was modeled after a widely used test of…
CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

Directory of Open Access Journals (Sweden)

Schmidt Bertil

2010-04-01

Full Text Available Abstract Background Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA. A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72 times using the optimized SIMT algorithm and up to 1.77 (1.66 times using the partitioned vectorized algorithm, with a performance of up to 17 (30 billion cells update per second (GCUPS on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295 graphics card. Conclusions CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
Analysis of functionality free CASE-tools databases design

Directory of Open Access Journals (Sweden)

A. V. Gavrilov

2016-01-01

Full Text Available The introduction in the educational process of database design CASEtechnologies requires the institution of significant costs for the purchase of software. A possible solution could be the use of free software peers. At the same time this kind of substitution should be based on even-com representation of the functional characteristics and features of operation of these programs. The purpose of the article – a review of the free and non-profi t CASE-tools database design, as well as their classifi cation on the basis of the analysis functionality. When writing this article were used materials from the offi cial websites of the tool developers. Evaluation of the functional characteristics of CASEtools for database design made exclusively empirically with the direct work with software products. Analysis functionality of tools allow you to distinguish the two categories CASE-tools database design. The first category includes systems with a basic set of features and tools. The most important basic functions of these systems are: management connections to database servers, visual tools to create and modify database objects (tables, views, triggers, procedures, the ability to enter and edit data in table mode, user and privilege management tools, editor SQL-code, means export/import data. CASE-system related to the first category can be used to design and develop simple databases, data management, as well as a means of administration server database. A distinctive feature of the second category of CASE-tools for database design (full-featured systems is the presence of visual designer, allowing to carry out the construction of the database model and automatic creation of the database on the server based on this model. CASE-system related to this categories can be used for the design and development of databases of any structural complexity, as well as a database server administration tool. The article concluded that the

Use of fluorescent proteins and color-coded imaging to visualize cancer cells with different genetic properties.

Science.gov (United States)

Hoffman, Robert M

2016-03-01

Fluorescent proteins are very bright and available in spectrally-distinct colors, enable the imaging of color-coded cancer cells growing in vivo and therefore the distinction of cancer cells with different genetic properties. Non-invasive and intravital imaging of cancer cells with fluorescent proteins allows the visualization of distinct genetic variants of cancer cells down to the cellular level in vivo. Cancer cells with increased or decreased ability to metastasize can be distinguished in vivo. Gene exchange in vivo which enables low metastatic cancer cells to convert to high metastatic can be color-coded imaged in vivo. Cancer stem-like and non-stem cells can be distinguished in vivo by color-coded imaging. These properties also demonstrate the vast superiority of imaging cancer cells in vivo with fluorescent proteins over photon counting of luciferase-labeled cancer cells.
PrionScan: an online database of predicted prion domains in complete proteomes.

Science.gov (United States)

Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier

2014-02-05

Prions are a particular type of amyloids related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information of proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries allow to retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific searching patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database would provide relevant insights on prion functions and regulation from a genome-wide perspective, allowing researches performing cross-species prion biology studies. Our database might also be useful for guiding experimentalists
A comprehensive two-dimensional gel protein database of noncultured unfractionated normal human epidermal keratinocytes: towards an integrated approach to the study of cell proliferation, differentiation and skin diseases

DEFF Research Database (Denmark)

Celis, J E; Madsen, Peder; Rasmussen, H H

1991-01-01

A two-dimensional (2-D) gel database of cellular proteins from noncultured, unfractionated normal human epidermal keratinocytes has been established. A total of 2651 [35S]methionine-labeled cellular proteins (1868 isoelectric focusing, 783 nonequilibrium pH gradient electrophoresis) were resolved...
Constructing an XML database of linguistics data

Directory of Open Access Journals (Sweden)

J H Kroeze

2010-04-01

Full Text Available A language-oriented, multi-dimensional database of the linguistic characteristics of the Hebrew text of the Old Testament can enable researchers to do ad hoc queries. XML is a suitable technology to transform free text into a database. A clause’s word order can be kept intact while other features such as syntactic and semantic functions can be marked as elements or attributes. The elements or attributes from the XML “database” can be accessed and proces sed by a 4th generation programming language, such as Visual Basic. XML is explored as an option to build an exploitable database of linguistic data by representing inherently multi-dimensional data, including syntactic and semantic analyses of free text.
Yeast Interacting Proteins Database: YOL069W, YIL144W [Yeast Interacting Proteins Database

Lifescience Database Archive (English)

Full Text Available complex (Ndc80p-Nuf2p-Spc24p-Spc25p); involved in chromosome segregation, spindle checkpoint activity and kinetochore clustering...vity, kinetochore assembly and clustering Rows with this prey as prey (2) Rows with this prey as bait (0) 12...-Nuf2p-Spc24p-Spc25p); involved in chromosome segregation, spindle checkpoint activity and kinetochore clustering...d coiled-coil protein involved in chromosome segregation, spindle checkpoint activity, kinetochore assembly and clustering
Design of database management system for 60Co container inspection system

International Nuclear Information System (INIS)

Liu Jinhui; Wu Zhifang

2007-01-01

The function of the database management system has been designed according to the features of cobalt-60 container inspection system. And the software related to the function has been constructed. The database querying and searching are included in the software. The database operation program is constructed based on Microsoft SQL server and Visual C ++ under Windows 2000. The software realizes database querying, image and graph displaying, statistic, report form and its printing, interface designing, etc. The software is powerful and flexible for operation and information querying. And it has been successfully used in the real database management system of cobalt-60 container inspection system. (authors)
BIOPEP database and other programs for processing bioactive peptide sequences.

Science.gov (United States)

Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata

2008-01-01

This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.
Interactive protein manipulation

Energy Technology Data Exchange (ETDEWEB)

SNCrivelli@lbl.gov

2003-07-01

We describe an interactive visualization and modeling program for the creation of protein structures ''from scratch''. The input to our program is an amino acid sequence -decoded from a gene- and a sequence of predicted secondary structure types for each amino acid-provided by external structure prediction programs. Our program can be used in the set-up phase of a protein structure prediction process; the structures created with it serve as input for a subsequent global internal energy minimization, or another method of protein structure prediction. Our program supports basic visualization methods for protein structures, interactive manipulation based on inverse kinematics, and visualization guides to aid a user in creating ''good'' initial structures.
Cα and Cβ Carbon-13 Chemical Shifts in Proteins From an Empirical Database

International Nuclear Information System (INIS)

Iwadate, Mitsuo; Asakura, Tetsuo; Williamson, Michael P.

1999-01-01

We have constructed an extensive database of 13C Cα and Cβ chemical shifts in proteins of solution, for proteins of which a high-resolution crystal structure exists, and for which the crystal structure has been shown to be essentially identical to the solution structure. There is no systematic effect of temperature, reference compound, or pH on reported shifts, but there appear to be differences in reported shifts arising from referencing differences of up to 4.2 ppm. The major factor affecting chemical shifts is the backbone geometry, which causes differences of ca. 4 ppm between typical α- helix and β-sheet geometries for Cα, and of ca. 2 ppm for Cβ. The side-chain dihedral angle χ1 has an effect of up to 0.5 ppm on the Cα shift, particularly for amino acids with branched side-chains at Cβ. Hydrogen bonding to main-chain atoms has an effect of up to 0.9 ppm, which depends on the main- chain conformation. The sequence of the protein and ring-current shifts from aromatic rings have an insignificant effect (except for residues following proline). There are significant differences between different amino acid types in the backbone geometry dependence; the amino acids can be grouped together into five different groups with different φ,ψ shielding surfaces. The overall fit of individual residues to a single non-residue-specific surface, incorporating the effects of hydrogen bonding and χ1 angle, is 0.96 ppm for both Cα and Cβ. The results from this study are broadly similar to those from ab initio studies, but there are some differences which could merit further attention
ChemProt: a disease chemical biology database

DEFF Research Database (Denmark)

Taboureau, Olivier; Nielsen, Sonny Kim; Audouze, Karine Marie Laure

2011-01-01

Systems pharmacology is an emergent area that studies drug action across multiple scales of complexity, from molecular and cellular to tissue and organism levels. There is a critical need to develop network-based approaches to integrate the growing body of chemical biology knowledge with network...... biology. Here, we report ChemProt, a disease chemical biology database, which is based on a compilation of multiple chemical-protein annotation resources, as well as disease-associated protein-protein interactions (PPIs). We assembled more than 700 000 unique chemicals with biological annotation for 30...... evaluation of environmental chemicals, natural products and approved drugs, as well as the selection of new compounds based on their activity profile against most known biological targets, including those related to adverse drug events. Results from the disease chemical biology database associate citalopram...
Expanded microbial genome coverage and improved protein family annotation in the COG database.

Science.gov (United States)

Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

2015-01-01

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the
Overview of EVE - the event visualization environment of ROOT

International Nuclear Information System (INIS)

Tadel, Matevz

2010-01-01

EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management offering hierarchical data organization, object interaction and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. On the other hand, it can serve as an event visualization toolkit satisfying most HEP requirements: visualization of geometry, simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw-data. Object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views. The visual-configuration layer of EVE is built around a data-base of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The data-base can be retrieved from a file, edited during the framework operation and stored to file. EVE prototype was developed within the ALICE collaboration and has been included into ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of AliEve visualization framework in ALICE, Firework physics-oriented event-display in CMS, and as the visualization engine of FairRoot in FAIR.
Overview of EVE - the event visualization environment of ROOT

Energy Technology Data Exchange (ETDEWEB)

Tadel, Matevz, E-mail: matevz.tadel@cern.c [CERN, CH-1211 Geneve 23 (Switzerland)

2010-04-01

EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management offering hierarchical data organization, object interaction and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. On the other hand, it can serve as an event visualization toolkit satisfying most HEP requirements: visualization of geometry, simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw-data. Object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views. The visual-configuration layer of EVE is built around a data-base of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The data-base can be retrieved from a file, edited during the framework operation and stored to file. EVE prototype was developed within the ALICE collaboration and has been included into ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of AliEve visualization framework in ALICE, Firework physics-oriented event-display in CMS, and as the visualization engine of FairRoot in FAIR.
Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model.

Science.gov (United States)

Fang, Yuming; Zhang, Chi; Li, Jing; Lei, Jianjun; Perreira Da Silva, Matthieu; Le Callet, Patrick

2017-10-01

In this paper, we investigate the visual attention modeling for stereoscopic video from the following two aspects. First, we build one large-scale eye tracking database as the benchmark of visual attention modeling for stereoscopic video. The database includes 47 video sequences and their corresponding eye fixation data. Second, we propose a novel computational model of visual attention for stereoscopic video based on Gestalt theory. In the proposed model, we extract the low-level features, including luminance, color, texture, and depth, from discrete cosine transform coefficients, which are used to calculate feature contrast for the spatial saliency computation. The temporal saliency is calculated by the motion contrast from the planar and depth motion features in the stereoscopic video sequences. The final saliency is estimated by fusing the spatial and temporal saliency with uncertainty weighting, which is estimated by the laws of proximity, continuity, and common fate in Gestalt theory. Experimental results show that the proposed method outperforms the state-of-the-art stereoscopic video saliency detection models on our built large-scale eye tracking database and one other database (DML-ITRACK-3D).
UbSRD: The Ubiquitin Structural Relational Database.

Science.gov (United States)

Harrison, Joseph S; Jacobs, Tim M; Houlihan, Kevin; Van Doorslaer, Koenraad; Kuhlman, Brian

2016-02-22

The structurally defined ubiquitin-like homology fold (UBL) can engage in several unique protein-protein interactions and many of these complexes have been characterized with high-resolution techniques. Using Rosetta's structural classification tools, we have created the Ubiquitin Structural Relational Database (UbSRD), an SQL database of features for all 509 UBL-containing structures in the PDB, allowing users to browse these structures by protein-protein interaction and providing a platform for quantitative analysis of structural features. We used UbSRD to define the recognition features of ubiquitin (UBQ) and SUMO observed in the PDB and the orientation of the UBQ tail while interacting with certain types of proteins. While some of the interaction surfaces on UBQ and SUMO overlap, each molecule has distinct features that aid in molecular discrimination. Additionally, we find that the UBQ tail is malleable and can adopt a variety of conformations upon binding. UbSRD is accessible as an online resource at rosettadesign.med.unc.edu/ubsrd. Copyright © 2015 Elsevier Ltd. All rights reserved.
Protein-Protein Interaction Network and Gene Ontology

Science.gov (United States)

Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

Evolution of computer technologies makes it possible to access a large amount and various kinds of biological data via internet such as DNA sequences, proteomics data and information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. Roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations can be combination of protein-protein interaction (PPI) data and Gene Ontology (GO) which could help enhance the analysis of PPI network. We introduce a simple but comprehensive visualization system that integrates GO and PPI data where GO and PPI graphs are visualized side-by-side and supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and GO directedacyclic- graph such as context-based browsing and common ancestors finding.
MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

Science.gov (United States)

Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

2010-01-01

During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and modulates cell division process faithfully. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supportive evidence for subcellular localization under fluorescent microscope. The current version of MiCroKit 3.0 provides detailed information for 1489 microkit proteins from seven model organisms, including Saccharomyces cerevisiae, Schizasaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, the orthologous information was provided for these microkit proteins, and could be a useful resource for further experimental identification. The online service of MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).
A role for the membrane protein M6 in the Drosophila visual system.

Science.gov (United States)

Zappia, María Paula; Bernabo, Guillermo; Billi, Silvia C; Frasch, Alberto C; Ceriani, María Fernanda; Brocco, Marcela Adriana

2012-07-04

Members of the proteolipid protein family, including the four-transmembrane glycoprotein M6a, are involved in neuronal plasticity in mammals. Results from our group previously demonstrated that M6, the only proteolipid protein expressed in Drosophila, localizes to the cell membrane in follicle cells. M6 loss triggers female sterility, which suggests a role for M6 in follicular cell remodeling. These results were the basis of the present study, which focused on the function and requirements of M6 in the fly nervous system. The present study identified two novel, tissue-regulated M6 isoforms with variable N- and C- termini, and showed that M6 is the functional fly ortholog of Gpm6a. In the adult brain, the protein was localized to several neuropils, such as the optic lobe, the central complex, and the mushroom bodies. Interestingly, although reduced M6 levels triggered a mild rough-eye phenotype, hypomorphic M6 mutants exhibited a defective response to light. Based on its ability to induce filopodium formation we propose that M6 is key in cell remodeling processes underlying visual system function. These results bring further insight into the role of M6/M6a in biological processes involving neuronal plasticity and behavior in flies and mammals.
CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks

Directory of Open Access Journals (Sweden)

Baumbach Jan

2007-11-01

Full Text Available Abstract Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As the previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparisons of reconstructed gene regulatory networks and the projection of gene expression levels onto that networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP based Web Service server, which can easily be consumed by other bioinformatics software systems. Stimulons (imported from the database, or uploaded by the user can be analyzed in the context of known
HippDB: a database of readily targeted helical protein-protein interactions.

Science.gov (United States)

Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S

2013-11-01

HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. arora@nyu.edu.

MIPS: analysis and annotation of proteins from whole genomes.

Science.gov (United States)

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Density-based retrieval from high-similarity image databases

DEFF Research Database (Denmark)

Hansen, Michael Edberg; Carstensen, Jens Michael

2004-01-01

Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce a me...
Nuclear integrated database and design advancement system

International Nuclear Information System (INIS)

Ha, Jae Joo; Jeong, Kwang Sub; Kim, Seung Hwan; Choi, Sun Young.

1997-01-01

The objective of NuIDEAS is to computerize design processes through an integrated database by eliminating the current work style of delivering hardcopy documents and drawings. The major research contents of NuIDEAS are the advancement of design processes by computerization, the establishment of design database and 3 dimensional visualization of design data. KSNP (Korea Standard Nuclear Power Plant) is the target of legacy database and 3 dimensional model, so that can be utilized in the next plant design. In the first year, the blueprint of NuIDEAS is proposed, and its prototype is developed by applying the rapidly revolutionizing computer technology. The major results of the first year research were to establish the architecture of the integrated database ensuring data consistency, and to build design database of reactor coolant system and heavy components. Also various softwares were developed to search, share and utilize the data through networks, and the detailed 3 dimensional CAD models of nuclear fuel and heavy components were constructed, and walk-through simulation using the models are developed. This report contains the major additions and modifications to the object oriented database and associated program, using methods and Javascript.. (author). 36 refs., 1 tab., 32 figs
GMDD: a database of GMO detection methods.

Science.gov (United States)

Dong, Wei; Yang, Litao; Shen, Kailin; Kim, Banghyun; Kleter, Gijs A; Marvin, Hans J P; Guo, Rong; Liang, Wanqi; Zhang, Dabing

2008-06-04

Since more than one hundred events of genetically modified organisms (GMOs) have been developed and approved for commercialization in global area, the GMO analysis methods are essential for the enforcement of GMO labelling regulations. Protein and nucleic acid-based detection techniques have been developed and utilized for GMOs identification and quantification. However, the information for harmonization and standardization of GMO analysis methods at global level is needed. GMO Detection method Database (GMDD) has collected almost all the previous developed and reported GMOs detection methods, which have been grouped by different strategies (screen-, gene-, construct-, and event-specific), and also provide a user-friendly search service of the detection methods by GMO event name, exogenous gene, or protein information, etc. In this database, users can obtain the sequences of exogenous integration, which will facilitate PCR primers and probes design. Also the information on endogenous genes, certified reference materials, reference molecules, and the validation status of developed methods is included in this database. Furthermore, registered users can also submit new detection methods and sequences to this database, and the newly submitted information will be released soon after being checked. GMDD contains comprehensive information of GMO detection methods. The database will make the GMOs analysis much easier.
Annotation error in public databases: misannotation of molecular function in enzyme superfamilies.

Directory of Open Access Journals (Sweden)

Alexandra M Schnoes

2009-12-01

Full Text Available Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families; the two other protein sequence databases (GenBank NR and TrEMBL and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Annotation error in public databases: misannotation of molecular function in enzyme superfamilies.

Science.gov (United States)

Schnoes, Alexandra M; Brown, Shoshana D; Dodevski, Igor; Babbitt, Patricia C

2009-12-01

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
mIMT-visHTS: A novel method for multiplexing isobaric mass tagged datasets with an accompanying visualization high throughput screening tool for protein profiling.

Science.gov (United States)

Ricchiuto, Piero; Iwata, Hiroshi; Yabusaki, Katsumi; Yamada, Iwao; Pieper, Brett; Sharma, Amitabh; Aikawa, Masanori; Singh, Sasha A

2015-10-14

Isobaric mass tagging (IMT) methods enable the analysis of thousands of proteins simultaneously. We used tandem mass tagging reagents (TMT™) to monitor the relative changes in the proteome of the mouse macrophage cell line RAW264.7 at the same six time points after no stimulation (baseline phenotype), stimulation with interferon gamma (pro-inflammatory phenotype) or stimulation with interleukin-4 (anti-inflammatory phenotype). The combined TMT datasets yielded nearly 12,000 protein profiles for comparison. To facilitate this large analysis, we developed a novel method that combines or multiplexes the separate IMT (mIMT) datasets into a single super dataset for subsequent model-based clustering and co-regulation analysis. Specially designed visual High Throughput Screening (visHTS) software screened co-regulated proteins. visHTS generates an interactive and visually intuitive color-coded bullseye plot that enables users to browse the cluster outputs and identify co-regulated proteins. Copyright © 2015 Elsevier B.V. All rights reserved.
Chemogenetic Modulation of G Protein-Coupled Receptor Signalling in Visual Attention Research

DEFF Research Database (Denmark)

Jørgensen, Søren H; Fitzpatrick, Ciarán Martin; Gether, Ulrik

2017-01-01

Exclusively Activated by Designer Drugs (DREADDs). The DREADD technology is an emerging and transformative method that allows selective manipulation of G protein-coupled receptor (GPCR) signalling, and its broad-ranging usefulness in attention research is now beginning to emerge. We first describe......Attention is a fundamental cognitive process involved in nearly all aspects of life. Abnormal attentional control is a symptom of many neurological disorders, most notably recognized in ADHD (attention deficit hyperactivity disorder). Although attentional performance and its malfunction has been...... the different DREADDs available and explain how unprecedented specificity of neuronal signalling can be achieved using DREADDs. We next discuss various studies performed in animal models of visual attention, where different brain regions and neuronal populations have been probed by DREADDs. We highlight...
ProteinShop: A tool for interactive protein manipulation and steering

Energy Technology Data Exchange (ETDEWEB)

Crivelli, Silvia; Kreylos, Oliver; Max, Nelson; Hamann, Bernd; Bethel, Wes

2004-05-25

We describe ProteinShop, a new visualization tool that streamlines and simplifies the process of determining optimal protein folds. ProteinShop may be used at different stages of a protein structure prediction process. First, it can create protein configurations containing secondary structures specified by the user. Second, it can interactively manipulate protein fragments to achieve desired folds by adjusting the dihedral angles of selected coil regions using an Inverse Kinematics method. Last, it serves as a visual framework to monitor and steer a protein structure prediction process that may be running on a remote machine. ProteinShop was used to create initial configurations for a protein structure prediction method developed by a team that competed in CASP5. ProteinShop's use accelerated the process of generating initial configurations, reducing the time required from days to hours. This paper describes the structure of ProteinShop and discusses its main features.
Design and realization of reports of database in VC++. net

International Nuclear Information System (INIS)

Zhu Haijun; Shen Liren; Liu Dekang

2006-01-01

The design and realization of reports of database on the basis of VC ++ . net is presented. In the first, report template using word format files is introduced, and the method of filling the data table up according to the database is expatiated. The function of preview and printing calling word software is analyzed. The key code of how to generate reports automatically with Visual C ++ . net is given. (authors)
PathwayAccess: CellDesigner plugins for pathway databases.

Science.gov (United States)

Van Hemert, John L; Dickerson, Julie A

2010-09-15

CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation. Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/~jlv.
Efficient Disk-Based Techniques for Manipulating Very Large String Databases

KAUST Repository

Allam, Amin

2017-01-01

Indexing and processing strings are very important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required for several application categories
Medicago truncatula transporter database: a comprehensive database resource for M. truncatula transporters

Directory of Open Access Journals (Sweden)

Miao Zhenyan

2012-02-01

Full Text Available Abstract Background Medicago truncatula has been chosen as a model species for genomic studies. It is closely related to an important legume, alfalfa. Transporters are a large group of membrane-spanning proteins. They deliver essential nutrients, eject waste products, and assist the cell in sensing environmental conditions by forming a complex system of pumps and channels. Although studies have effectively characterized individual M. truncatula transporters in several databases, until now there has been no available systematic database that includes all transporters in M. truncatula. Description The M. truncatula transporter database (MTDB contains comprehensive information on the transporters in M. truncatula. Based on the TransportTP method, we have presented a novel prediction pipeline. A total of 3,665 putative transporters have been annotated based on International Medicago Genome Annotated Group (IMGAG V3.5 V3 and the M. truncatula Gene Index (MTGI V10.0 releases and assigned to 162 families according to the transporter classification system. These families were further classified into seven types according to their transport mode and energy coupling mechanism. Extensive annotations referring to each protein were generated, including basic protein function, expressed sequence tag (EST mapping, genome locus, three-dimensional template prediction, transmembrane segment, and domain annotation. A chromosome distribution map and text-based Basic Local Alignment Search Tools were also created. In addition, we have provided a way to explore the expression of putative M. truncatula transporter genes under stress treatments. Conclusions In summary, the MTDB enables the exploration and comparative analysis of putative transporters in M. truncatula. A user-friendly web interface and regular updates make MTDB valuable to researchers in related fields. The MTDB is freely available now to all users at http://bioinformatics.cau.edu.cn/MtTransporter/.
Integrated application of the database for airborne geophysical survey achievement information

International Nuclear Information System (INIS)

Ji Zengxian; Zhang Junwei

2006-01-01

The paper briefly introduces the database of information for airborne geophysical survey achievements. This database was developed on the platform of Microsoft Windows System with the technical methods of Visual C++ 6.0 and MapGIS. It is an information management system concerning airborne geophysical surveying achievements with perfect functions in graphic display, graphic cutting and output, query of data, printing of documents and reports, maintenance of database, etc. All information of airborne geophysical survey achievements in nuclear industry from 1972 to 2003 was embedded in. Based on regional geological map and Meso-Cenozoic basin map, the detailed statistical information of each airborne survey area, each airborne radioactive anomalous point and high field point can be presented visually by combining geological or basin research result. The successful development of this system will provide a fairly good base and platform for management of archives and data of airborne geophysical survey achievements in nuclear industry. (authors)
Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences.

Science.gov (United States)

Jaschob, Daniel; Davis, Trisha N; Riffle, Michael

2015-03-07

Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequence. Annotated features may be of virtually any type, ranging from annotating transcription binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no 3(rd) party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.
PATRIC, the bacterial bioinformatics database and analysis resource

Science.gov (United States)

Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

2014-01-01

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323
Visualization of Safety Assessment Result Using GIS in SITES

International Nuclear Information System (INIS)

Yun, Bong-Yo; Park, Joo Wan; Park, Se-Moon; Kim, Chang-Lak

2006-01-01

Site Information and Total Environmental database management System (SITES) is an integrated program for overall data analysis, environmental monitoring, and safety analysis that are produced from the site investigation and environmental assessment of the relevant nuclear facility. SITES is composed of three main modules such as Site Environment Characterization database for Unified and Reliable Evaluation system (SECURE), Safety Assessment INTegration system (SAINT) and Site Useful Data Analysis and ALarm system (SUDAL). The visualization function of safety assessment and environmental monitoring results is designed. This paper is to introduce the visualization design method using Geographic Information System (GIS) for SITES
CardioTF, a database of deconstructing transcriptional circuits in the heart system.

Science.gov (United States)

Zhen, Yisong

2016-01-01

Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
MIPS: curated databases and comprehensive secondary data resources in 2010.

Science.gov (United States)

Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

2011-01-01

The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
Expediting topology data gathering for the TOPDB database.

Science.gov (United States)

Dobson, László; Langó, Tamás; Reményi, István; Tusnády, Gábor E

2015-01-01

The Topology Data Bank of Transmembrane Proteins (TOPDB, http://topdb.enzim.ttk.mta.hu) contains experimentally determined topology data of transmembrane proteins. Recently, we have updated TOPDB from several sources and utilized a newly developed topology prediction algorithm to determine the most reliable topology using the results of experiments as constraints. In addition to collecting the experimentally determined topology data published in the last couple of years, we gathered topographies defined by the TMDET algorithm using 3D structures from the PDBTM. Results of global topology analysis of various organisms as well as topology data generated by high throughput techniques, like the sequential positions of N- or O-glycosylations were incorporated into the TOPDB database. Moreover, a new algorithm was developed to integrate scattered topology data from various publicly available databases and a new method was introduced to measure the reliability of predicted topologies. We show that reliability values highly correlate with the per protein topology accuracy of the utilized prediction method. Altogether, more than 52,000 new topology data and more than 2600 new transmembrane proteins have been collected since the last public release of the TOPDB database. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Generating descriptive visual words and visual phrases for large-scale image applications.

Science.gov (United States)

Zhang, Shiliang; Tian, Qi; Hua, Gang; Huang, Qingming; Gao, Wen

2011-09-01

Bag-of-visual Words (BoWs) representation has been applied for various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to the text words. Notwithstanding its great success and wide adoption, visual vocabulary created from single-image local descriptors is often shown to be not as effective as desired. In this paper, descriptive visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the visual correspondences to text words and phrases, where visual phrases refer to the frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a descriptive visual element set can be composed by the visual words and their combinations which are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs for image applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive to certain objects or scenes are identified and collected as the DVWs and DVPs. Experiments show that the DVWs and DVPs are informative and descriptive and, thus, are more comparable with the text words than the classic visual words. We apply the identified DVWs and DVPs in several applications including large-scale near-duplicated image retrieval, image search re-ranking, and object recognition. The combination of DVW and DVP performs better than the state of the art in large-scale near-duplicated image retrieval in terms of accuracy, efficiency and memory consumption. The proposed image search re-ranking algorithm: DWPRank outperforms the state-of-the-art algorithm by 12.4% in mean average precision and about 11 times faster in efficiency.
ECOTOX Knowledgebase: New tools for data visualization and database interoperability (poster)

Science.gov (United States)

The ECOTOXicology knowledgebase (ECOTOX) is a comprehensive, curated database that summarizes toxicology data from single chemical exposure studies to terrestrial and aquatic organisms. The ECOTOX Knowledgebase provides risk assessors and researchers consistent information on tox...
VKCDB: Voltage-gated potassium channel database

Directory of Open Access Journals (Sweden)

Gallin Warren J

2004-01-01

Full Text Available Abstract Background The family of voltage-gated potassium channels comprises a functionally diverse group of membrane proteins. They help maintain and regulate the potassium ion-based component of the membrane potential and are thus central to many critical physiological processes. VKCDB (Voltage-gated potassium [K] Channel DataBase is a database of structural and functional data on these channels. It is designed as a resource for research on the molecular basis of voltage-gated potassium channel function. Description Voltage-gated potassium channel sequences were identified by using BLASTP to search GENBANK and SWISSPROT. Annotations for all voltage-gated potassium channels were selectively parsed and integrated into VKCDB. Electrophysiological and pharmacological data for the channels were collected from published journal articles. Transmembrane domain predictions by TMHMM and PHD are included for each VKCDB entry. Multiple sequence alignments of conserved domains of channels of the four Kv families and the KCNQ family are also included. Currently VKCDB contains 346 channel entries. It can be browsed and searched using a set of functionally relevant categories. Protein sequences can also be searched using a local BLAST engine. Conclusions VKCDB is a resource for comparative studies of voltage-gated potassium channels. The methods used to construct VKCDB are general; they can be used to create specialized databases for other protein families. VKCDB is accessible at http://vkcdb.biology.ualberta.ca.
Database of ligand-induced domain movements in enzymes

Directory of Open Access Journals (Sweden)

Hayward Steven

2009-03-01

Full Text Available Abstract Background Conformational change induced by the binding of a substrate or coenzyme is a poorly understood stage in the process of enzyme catalysed reactions. For enzymes that exhibit a domain movement, the conformational change can be clearly characterized and therefore the opportunity exists to gain an understanding of the mechanisms involved. The development of the non-redundant database of protein domain movements contains examples of ligand-induced domain movements in enzymes, but this valuable data has remained unexploited. Description The domain movements in the non-redundant database of protein domain movements are those found by applying the DynDom program to pairs of crystallographic structures contained in Protein Data Bank files. For each pair of structures cross-checking ligands in their Protein Data Bank files with the KEGG-LIGAND database and using methods that search for ligands that contact the enzyme in one conformation but not the other, the non-redundant database of protein domain movements was refined down to a set of 203 enzymes where a domain movement is apparently triggered by the binding of a functional ligand. For these cases, ligand binding information, including hydrogen bonds and salt-bridges between the ligand and specific residues on the enzyme is presented in the context of dynamical information such as the regions that form the dynamic domains, the hinge bending residues, and the hinge axes. Conclusion The presentation at a single website of data on interactions between a ligand and specific residues on the enzyme alongside data on the movement that these interactions induce, should lead to new insights into the mechanisms of these enzymes in particular, and help in trying to understand the general process of ligand-induced domain closure in enzymes. The website can be found at: http://www.cmp.uea.ac.uk/dyndom/enzymeList.do
Database for the OECD-IAEA Paks Fuel Project

International Nuclear Information System (INIS)

Szabo, Emese; Hozer, Zoltan; Gyori, Csaba; Hegyi, Gyoergy

2010-01-01

On 10 April 2003 severe damage of fuel assemblies took place during an incident at Unit 2 of Paks Nuclear Power Plant in Hungary. The assemblies were being cleaned in a special tank below the water level of the spent fuel storage pool in order to remove crud buildup. That afternoon, the chemical cleaning of assemblies was completed and the fuel rods were being cooled by circulation of storage pool water. The first sign of fuel failure was the detection of some fission gases released from the cleaning tank during that evening. The cleaning tank cover locks were released after midnight and this operation was followed by a sudden increase in activity concentrations. The visual inspection revealed that all 30 fuel assemblies were severely damaged. The first evaluation of the event showed that the severe fuel damage happened due to inadequate coolant circulation within the cleaning tank. The damaged fuel assemblies will be removed from the cleaning tank in 2005 and will be stored in special canisters in the spent fuel storage pool of the Paks NPP. Following several discussions between expert from different countries and international organisations the OECD-IAEA Paks Fuel Project was proposed. The project is envisaged in two phases. - Phase 1 is to cover organization of visual inspection of material, preparation of database, performance of analyses and preparatory work for fuel examination. - Phase 2 is to cover the fuel transport and the hot cell examination. The first meeting of the project was held in Budapest on 30-31 January 2006. Phase 1 of the Paks Fuel Project will focus on the numerical simulation of the most important aspects of the incident. This activity will help in the reconstruction of the accidental scenario. The first step of Phase 1 was the collection of a database necessary for the code calculations. The main objective of database collection was to provide input data for calculations. For this reason the collection was focused on such data that are
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

Science.gov (United States)

Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem

2016-01-01

Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have been focusing on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for using batch query searching approach, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
O-GLYCBASE version 2.0: a revised database of O-glycosylated proteins

DEFF Research Database (Denmark)

Hansen, Jan; Lund, Ole; Rapacki, Kristoffer

1997-01-01

O-GLYCBASE is an updated database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the literature, and from the SWISS-PROT database. Entries include information about species, sequence, glycosylation sites and glycan type. O-GLYCBASE is...... patterns for the GalNAc, mannose and GlcNAc transferases are shown. The O-GLYCBASE database is available through WWW or by anonymous FTP....
Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence

DEFF Research Database (Denmark)

Blom, Nikolaj; Sicheritz-Pontén, Thomas; Gupta, Ramneek

2004-01-01

Post-translational modifications (PTMs) occur on almost all proteins analyzed to date. The function of a modified protein is often strongly affected by these modifications and therefore increased knowledge about the potential PTMs of a target protein may increase our understanding of the molecular...... steps by integrating computational approaches into the validation procedures. Many advanced methods for the prediction of PTMs exist and many are made publicly available. We describe our experiences with the development of prediction methods for phosphorylation and glycosylation sites...... and the development of PTM-specific databases. In addition, we discuss novel ideas for PTM visualization (exemplified by kinase landscapes) and improvements for prediction specificity (by using ESS-evolutionary stable sites). As an example, we present a new method for kinase-specific prediction of phosphorylation...
A VBA Desktop Database for Proposal Processing at National Optical Astronomy Observatories

Science.gov (United States)

Brown, Christa L.

National Optical Astronomy Observatories (NOAO) has developed a relational Microsoft Windows desktop database using Microsoft Access and the Microsoft Office programming language, Visual Basic for Applications (VBA). The database is used to track data relating to observing proposals from original receipt through the review process, scheduling, observing, and final statistical reporting. The database has automated proposal processing and distribution of information. It allows NOAO to collect and archive data so as to query and analyze information about our science programs in new ways.
Visualization and Data Analysis for High-Performance Computing

Energy Technology Data Exchange (ETDEWEB)

Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2016-09-27

This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.
FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events.

Science.gov (United States)

Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J P; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok

2015-01-01

Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain-domain interactions, protein-protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist's mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop 'novel' therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE. © The Author(s) 2015. Published by Oxford University Press.
Development of the updated system of city underground pipelines based on Visual Studio

Science.gov (United States)

Zhang, Jianxiong; Zhu, Yun; Li, Xiangdong

2009-10-01

Our city has owned the integrated pipeline network management system with ArcGIS Engine 9.1 as the bottom development platform and with Oracle9i as basic database for storaging data. In this system, ArcGIS SDE9.1 is applied as the spatial data engine, and the system was a synthetic management software developed with Visual Studio visualization procedures development tools. As the pipeline update function of the system has the phenomenon of slower update and even sometimes the data lost, to ensure the underground pipeline data can real-time be updated conveniently and frequently, and the actuality and integrity of the underground pipeline data, we have increased a new update module in the system developed and researched by ourselves. The module has the powerful data update function, and can realize the function of inputting and outputting and rapid update volume of data. The new developed module adopts Visual Studio visualization procedures development tools, and uses access as the basic database to storage data. We can edit the graphics in AutoCAD software, and realize the database update using link between the graphics and the system. Practice shows that the update module has good compatibility with the original system, reliable and high update efficient of the database.
A multidisciplinary database for geophysical time series management

Science.gov (United States)

Montalto, P.; Aliotta, M.; Cassisi, C.; Prestifilippo, M.; Cannata, A.

2013-12-01

The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.
LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs).

Science.gov (United States)

Wei, Tiandi; Gong, Jing; Jamitzky, Ferdinand; Heckl, Wolfgang M; Stark, Robert W; Rössle, Shaila C

2008-11-05

Leucine-rich repeats (LRRs) are present in more than 6000 proteins. They are found in organisms ranging from viruses to eukaryotes and play an important role in protein-ligand interactions. To date, more than one hundred crystal structures of LRR containing proteins have been determined. This knowledge has increased our ability to use the crystal structures as templates to model LRR proteins with unknown structures. Since the individual three-dimensional LRR structures are not directly available from the established databases and since there are only a few detailed annotations for them, a conformational LRR database useful for homology modeling of LRR proteins is desirable. We developed LRRML, a conformational database and an extensible markup language (XML) description of LRRs. The release 0.2 contains 1261 individual LRR structures, which were identified from 112 PDB structures and annotated manually. An XML structure was defined to exchange and store the LRRs. LRRML provides a source for homology modeling and structural analysis of LRR proteins. In order to demonstrate the capabilities of the database we modeled the mouse Toll-like receptor 3 (TLR3) by multiple templates homology modeling and compared the result with the crystal structure. LRRML is an information source for investigators involved in both theoretical and applied research on LRR proteins. It is available at http://zeus.krist.geo.uni-muenchen.de/~lrrml.
LRRML: a conformational database and an XML description of leucine-rich repeats (LRRs

Directory of Open Access Journals (Sweden)

Stark Robert W

2008-11-01

Full Text Available Abstract Background Leucine-rich repeats (LRRs are present in more than 6000 proteins. They are found in organisms ranging from viruses to eukaryotes and play an important role in protein-ligand interactions. To date, more than one hundred crystal structures of LRR containing proteins have been determined. This knowledge has increased our ability to use the crystal structures as templates to model LRR proteins with unknown structures. Since the individual three-dimensional LRR structures are not directly available from the established databases and since there are only a few detailed annotations for them, a conformational LRR database useful for homology modeling of LRR proteins is desirable. Description We developed LRRML, a conformational database and an extensible markup language (XML description of LRRs. The release 0.2 contains 1261 individual LRR structures, which were identified from 112 PDB structures and annotated manually. An XML structure was defined to exchange and store the LRRs. LRRML provides a source for homology modeling and structural analysis of LRR proteins. In order to demonstrate the capabilities of the database we modeled the mouse Toll-like receptor 3 (TLR3 by multiple templates homology modeling and compared the result with the crystal structure. Conclusion LRRML is an information source for investigators involved in both theoretical and applied research on LRR proteins. It is available at http://zeus.krist.geo.uni-muenchen.de/~lrrml.
Visual Saliency Models for Text Detection in Real World.

Directory of Open Access Journals (Sweden)

Renwu Gao

Full Text Available This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. A large scale scene image database with pixel level ground truth is created for this purpose. Using this scene image database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic curve is employed in order to evaluate the saliency of scene texts, which is calculated by visual saliency models. A visualization of the distribution of scene texts and non-texts in the space constructed by three kinds of saliency maps, which are calculated using Itti's visual saliency model with intensity, color and orientation features, is given. This visualization of distribution indicates that text characters are more salient than their non-text neighbors, and can be captured from the background. Therefore, scene texts can be extracted from the scene images. With this in mind, a new visual saliency architecture, named hierarchical visual saliency model, is proposed. Hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region that we are interested in. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
Database activities at Brookhaven National Laboratory

International Nuclear Information System (INIS)

Trahern, C.G.

1995-01-01

Brookhaven National Laboratory is a multi-disciplinary lab in the DOE system of research laboratories. Database activities are correspondingly diverse within the restrictions imposed by the dominant relational database paradigm. The authors discuss related activities and tools used in RHIC and in the other major projects at BNL. The others are the Protein Data Bank being maintained by the Chemistry department, and a Geographical Information System (GIS)--a Superfund sponsored environmental monitoring project under development in the Office of Environmental Restoration
Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria

International Nuclear Information System (INIS)

Fritzsching, Keith J.; Hong, Mei; Schmidt-Rohr, Klaus

2016-01-01

We have determined refined multidimensional chemical shift ranges for intra-residue correlations ( 13 C– 13 C, 15 N– 13 C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 13 C chemical shifts and >3 million chemical shifts in total); these data were originally derived from the Biological Magnetic Resonance Data Bank. Using relatively simple non-parametric statistics to find peak maxima in the distributions of helix, sheet, coil and turn chemical shifts, and without the use of limited “hand-picked” data sets, we show that ∼94 % of the 13 C NMR data and almost all 15 N data are quite accurately referenced and assigned, with smaller standard deviations (0.2 and 0.8 ppm, respectively) than recognized previously. On the other hand, approximately 6 % of the 13 C chemical shift data in the PACSY database are shown to be clearly misreferenced, mostly by ca. −2.4 ppm. The removal of the misreferenced data and other outliers by this purging by intrinsic quality criteria (PIQC) allows for reliable identification of secondary maxima in the two-dimensional chemical-shift distributions already pre-separated by secondary structure. We demonstrate that some of these correspond to specific regions in the Ramachandran plot, including left-handed helix dihedral angles, reflect unusual hydrogen bonding, or are due to the influence of a following proline residue. With appropriate smoothing, significantly more tightly defined chemical shift ranges are obtained for each amino acid type in the different secondary structures. These chemical shift ranges, which may be defined at any statistical threshold, can be used for amino-acid type assignment and secondary-structure analysis of chemical shifts from intra-residue cross peaks by inspection or by using a
Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria

Energy Technology Data Exchange (ETDEWEB)

Fritzsching, Keith J., E-mail: kfritzsc@brandeis.edu [Brandeis University, Department of Chemistry (United States); Hong, Mei [Massachusetts Institute of Technology, Department of Chemistry (United States); Schmidt-Rohr, Klaus, E-mail: srohr@brandeis.edu [Brandeis University, Department of Chemistry (United States)

2016-02-15

We have determined refined multidimensional chemical shift ranges for intra-residue correlations ({sup 13}C–{sup 13}C, {sup 15}N–{sup 13}C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 {sup 13}C chemical shifts and >3 million chemical shifts in total); these data were originally derived from the Biological Magnetic Resonance Data Bank. Using relatively simple non-parametric statistics to find peak maxima in the distributions of helix, sheet, coil and turn chemical shifts, and without the use of limited “hand-picked” data sets, we show that ∼94 % of the {sup 13}C NMR data and almost all {sup 15}N data are quite accurately referenced and assigned, with smaller standard deviations (0.2 and 0.8 ppm, respectively) than recognized previously. On the other hand, approximately 6 % of the {sup 13}C chemical shift data in the PACSY database are shown to be clearly misreferenced, mostly by ca. −2.4 ppm. The removal of the misreferenced data and other outliers by this purging by intrinsic quality criteria (PIQC) allows for reliable identification of secondary maxima in the two-dimensional chemical-shift distributions already pre-separated by secondary structure. We demonstrate that some of these correspond to specific regions in the Ramachandran plot, including left-handed helix dihedral angles, reflect unusual hydrogen bonding, or are due to the influence of a following proline residue. With appropriate smoothing, significantly more tightly defined chemical shift ranges are obtained for each amino acid type in the different secondary structures. These chemical shift ranges, which may be defined at any statistical threshold, can be used for amino-acid type assignment and secondary-structure analysis of chemical shifts from intra
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

Directory of Open Access Journals (Sweden)

Ahmad Tamimi

Full Text Available Profile Hidden Markov Model (Profile-HMM is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have been focusing on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for using batch query searching approach, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.

Development of radionuclide parameter database on internal contamination in nuclear emergencies

International Nuclear Information System (INIS)

Zhao Li; Xu Cuihua; Li Wenhong; Su Xu

2010-01-01

Objective: To develop a radionuclide parameter database on internal contamination in nuclear emergencies. Methods: By researching the radionuclides composition discharged from different nuclear emergencies, the radionuclide parameters were achieved on physical decay, absorption and metabolism in the body from ICRP publications and some other publications. The database on internal contamination for nuclear incidents was developed by using MS Visual Studio 2005 C and MS Access programming language. Results: The radionuclide parameter database on internal contamination in nuclear emergency was established. Conclusions: The database may be very convenient for searching radionuclides and radionuclide parameter data discharged from different nuclear emergencies, which would be helpful to the monitoring and assessment and assessment of internal contamination in nuclear emergencies. (authors)
Protein - Trypanosomes Database | LSDB Archive [Life Science Database Archive metadata

Lifescience Database Archive (English)

Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us Trypanoso...nhibitor of the protein. Data file File name: trypanosome.zip File URL: ftp://ftp....biosciencedbc.jp/archive/trypanosome/LATEST/trypanosome.zip File size: 1.4 KB Simple search URL http://togo...db.biosciencedbc.jp/togodb/view/trypanosome#en Data acquisition method - Data analysis method - Number of da...ndelian inheritance in Man ) map Location of the gene on a chromosome or its chromosome number pdb PDB ID (P
Image-based query-by-example for big databases of galaxy images

Science.gov (United States)

Shamir, Lior; Kuminski, Evan

2017-01-01

Very large astronomical databases containing millions or even billions of galaxy images have been becoming increasingly important tools in astronomy research. However, in many cases the very large size makes it more difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to manually search these databases to identify such galaxies. Here we describe computer vision and pattern recognition methodology that receives a galaxy image as an input, and searches automatically a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset, in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.
COGcollator: a web server for analysis of distant relationships between homologous protein families.

Science.gov (United States)

Dibrova, Daria V; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Mulkidjanian, Armen Y

2017-11-29

The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionary related proteins into specific groups with similar functions. However, the available databases do not provide means to assess the extent of similarity between the COGs. We intended to provide a method for identification and visualization of evolutionary relationships between the COGs, as well as a respective web server. Here we introduce the COGcollator, a web tool for identification of evolutionarily related COGs and their further analysis. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of the superfamily 1. This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.
RAIN: RNA-protein Association and Interaction Networks

DEFF Research Database (Denmark)

Junge, Alexander; Refsgaard, Jan Christian; Garde, Christian

2017-01-01

is challenging due to data heterogeneity. Here, we present a database of ncRNA-RNA and ncRNA-protein interactions and its integration with the STRING database of protein-protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data...
Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

Science.gov (United States)

Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

2014-01-01

With the advent of age of big data and advances in high throughput technology accessing data has become one of the most important step in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for windows operating system. It is written in Java and utilizes the web service description language (wsdl) files and Jar files of E-utilities of various databases such as National Centre for Biotechnology Information (NCBI) and Protein Data Bank (PDB). Apart from that IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.
Identification of proteins similar to AvrE type III effector proteins from ...

African Journals Online (AJOL)

Stephen Opiyo

GSE22274), and AraCyc databases, we highlighted 16 protein candidates from Arabidopsidis genome .... projection method similar to principal component analysis (PCA) .... RIN4 RIN4 (RPM1 INTERACTING PROTEIN 4); protein binding.
Occupational Health and the Visual Arts: An Introduction.

Science.gov (United States)

Hinkamp, David; McCann, Michael; Babin, Angela R

2017-09-01

Occupational hazards in the visual arts often involve hazardous materials, though hazardous equipment and hazardous work conditions can also be found. Occupational health professionals are familiar with most of these hazards and are particularly qualified to contribute clinical and preventive expertise to these issues. Articles illustrating visual arts health issues were sought and reviewed. Literature sources included medical databases, unindexed art-health publications, and popular press articles. Few medical articles examine health issues in the visuals arts directly, but exposures to pigments, solvents, and other hazards found in the visual arts are well described. The hierarchy of controls is an appropriate model for controlling hazards and promoting safer visual art workplaces. The health and safety of those working in the visual arts can benefit from the occupational health approach. Sources of further information are available.
Manchester visual query language

Science.gov (United States)

Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

1993-04-01

We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.
Investigational study on construction of the physical function database; Shintai kino data base no kochiku ni kansuru chosa kenkyu

Energy Technology Data Exchange (ETDEWEB)

NONE

1996-03-01

An investigational study was carried out on construction of the physical function database which can supply data useful for planning, design and production when companies provide products and barrier-free environment for the aged society. Up to now, the final image of database was studied as to the visual function. In addition, this study is aimed at constructing the physical function database. In the literature survey, basic data on physical characteristics of the aged which have lain scattered were collected and systematically sorted in relation to the exercise function in order to make an analysis of the data and measuring technology in terms of reliability, importance, and applied values. In the survey of corporate needs, an examination of concrete needs for the exercise function and auditory function was made for general companies and companies related to the medical and welfare apparatus field. As to the visual function, a study was conducted on selection of new items for visual measurement and measuring methods. In the study of the database structure, a pilot database was constructed and subjects were extracted. 529 refs., 57 figs., 15 tabs.
Color impact in visual attention deployment considering emotional images

Science.gov (United States)

Chamaret, C.

2012-03-01

Color is a predominant factor in the human visual attention system. Even if it cannot be sufficient to the global or complete understanding of a scene, it may impact the visual attention deployment. We propose to study the color impact as well as the emotion aspect of pictures regarding the visual attention deployment. An eye-tracking campaign has been conducted involving twenty people watching half pictures of database in full color and the other half of database in grey color. The eye fixations of color and black and white images were highly correlated leading to the question of the integration of such cues in the design of visual attention model. Indeed, the prediction of two state-of-the-art computational models shows similar results for the two color categories. Similarly, the study of saccade amplitude and fixation duration versus time viewing did not bring any significant differences between the two mentioned categories. In addition, spatial coordinates of eye fixations reveal an interesting indicator for investigating the differences of visual attention deployment over time and fixation number. The second factor related to emotion categories shows evidences of emotional inter-categories differences between color and grey eye fixations for passive and positive emotion. The particular aspect associated to this category induces a specific behavior, rather based on high frequencies, where the color components influence the visual attention deployment.
Causal biological network database: a comprehensive platform of causal biological network models focused on the pulmonary and vascular systems.

Science.gov (United States)

Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C; Hoeng, Julia

2015-01-01

With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com © The Author(s) 2015. Published by Oxford University Press.
The MPI facial expression database--a validated database of emotional and conversational facial expressions.

Directory of Open Access Journals (Sweden)

Kathrin Kaulard

Full Text Available The ability to communicate is one of the core aspects of human life. For this, we use not only verbal but also nonverbal signals of remarkable complexity. Among the latter, facial expressions belong to the most important information channels. Despite the large variety of facial expressions we use in daily life, research on facial expressions has so far mostly focused on the emotional aspect. Consequently, most databases of facial expressions available to the research community also include only emotional expressions, neglecting the largely unexplored aspect of conversational expressions. To fill this gap, we present the MPI facial expression database, which contains a large variety of natural emotional and conversational expressions. The database contains 55 different facial expressions performed by 19 German participants. Expressions were elicited with the help of a method-acting protocol, which guarantees both well-defined and natural facial expressions. The method-acting protocol was based on every-day scenarios, which are used to define the necessary context information for each expression. All facial expressions are available in three repetitions, in two intensities, as well as from three different camera angles. A detailed frame annotation is provided, from which a dynamic and a static version of the database have been created. In addition to describing the database in detail, we also present the results of an experiment with two conditions that serve to validate the context scenarios as well as the naturalness and recognizability of the video sequences. Our results provide clear evidence that conversational expressions can be recognized surprisingly well from visual information alone. The MPI facial expression database will enable researchers from different research fields (including the perceptual and cognitive sciences, but also affective computing, as well as computer vision to investigate the processing of a wider range of natural
Fluorescence microscopy visualization of halomucin, a secreted 927 kDa protein surrounding Haloquadratum walsbyi cells

Directory of Open Access Journals (Sweden)

Ralf eZenke

2015-03-01

Full Text Available At the time of its first publication, halomucin from Haloquadratum walsbyi strain HBSQ001 was the largest archaeal protein known (9159 aa. It has a predicted signal sequence, making it likely to be an extracellular or secreted protein. Best BLAST matches were found to be mammalian mucins that protect tissues to dehydration and chemical stress. It was hypothesized that halomucin participates in protection against desiccation by retaining water in a hull around the halophilic organisms that live at the limits of water activity. We visualized Haloquadratum cells by staining their intracellular polyhydroxybutyrate granules using Nile Blue. Halomucin was stained by immunofluorescence with antibodies generated against synthetic peptides derived from the halomucin amino acid sequence. Polyhydroxybutyrate stained cells were reconstructed in 3D which highlights not only the highly regular square shape but also the extreme flatness of Haloquadratum. Double-staining proves halomucin to be extracellular but to be only loosely associated to cells in agreement with its hypothesized function.
O-GLYCBASE version 3.0: a revised database of O-glycosylated proteins

DEFF Research Database (Denmark)

Hansen, Jan; Lund, Ole; Nilsson, Jette

1998-01-01

O-GLYCBASE is a revised database of information on glycoproteins and their O-linked glycosylation sites. Entries are compiled and revised from the literature, and from the sequence databases. Entries include informations about species, sequence, glycosylation sites and glycan type and is fully cr...
dbEM: A database of epigenetic modifiers curated from cancerous and normal genomes

Science.gov (United States)

Singh Nanda, Jagpreet; Kumar, Rahul; Raghava, Gajendra P. S.

2016-01-01

We have developed a database called dbEM (database of Epigenetic Modifiers) to maintain the genomic information of about 167 epigenetic modifiers/proteins, which are considered as potential cancer targets. In dbEM, modifiers are classified on functional basis and comprise of 48 histone methyl transferases, 33 chromatin remodelers and 31 histone demethylases. dbEM maintains the genomic information like mutations, copy number variation and gene expression in thousands of tumor samples, cancer cell lines and healthy samples. This information is obtained from public resources viz. COSMIC, CCLE and 1000-genome project. Gene essentiality data retrieved from COLT database further highlights the importance of various epigenetic proteins for cancer survival. We have also reported the sequence profiles, tertiary structures and post-translational modifications of these epigenetic proteins in cancer. It also contains information of 54 drug molecules against different epigenetic proteins. A wide range of tools have been integrated in dbEM e.g. Search, BLAST, Alignment and Profile based prediction. In our analysis, we found that epigenetic proteins DNMT3A, HDAC2, KDM6A, and TET2 are highly mutated in variety of cancers. We are confident that dbEM will be very useful in cancer research particularly in the field of epigenetic proteins based cancer therapeutics. This database is available for public at URL: http://crdd.osdd.net/raghava/dbem.
Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Directory of Open Access Journals (Sweden)

Samer Haidar

2017-01-01

Full Text Available Protein kinase CK2, initially designated as casein kinase 2, is an ubiquitously expressed serine/threonine kinase. This enzyme, implicated in many cellular processes, is highly expressed and active in many tumor cells. A large number of compounds has been developed as inhibitors comprising different backbones. Beside others, structures with an indeno[1,2-b]indole scaffold turned out to be potent new leads. With the aim of developing new inhibitors of human protein kinase CK2, we report here on the generation of common feature pharmacophore model to further explain the binding requirements for human CK2 inhibitors. Nine common chemical features of indeno[1,2-b]indole-type CK2 inhibitors were determined using MOE software (Chemical Computing Group, Montreal, Canada. This pharmacophore model was used for database mining with the aim to identify novel scaffolds for developing new potent and selective CK2 inhibitors. Using this strategy several structures were selected by searching inside the ZINC compound database. One of the selected compounds was bikaverin (6,11-dihydroxy-3,8-dimethoxy-1-methylbenzo[b]xanthene-7,10,12-trione, a natural compound which is produced by several kinds of fungi. This compound was tested on human recombinant CK2 and turned out to be an active inhibitor with an IC50 value of 1.24 µM.
1.15 - Structural Chemogenomics Databases to Navigate Protein–Ligand Interaction Space

NARCIS (Netherlands)

Kanev, G.K.; Kooistra, A.J.; de Esch, I.J.P.; de Graaf, C.

2017-01-01

Structural chemogenomics databases allow the integration and exploration of heterogeneous genomic, structural, chemical, and pharmacological data in order to extract useful information that is applicable for the discovery of new protein targets and biologically active molecules. Integrated databases
Neural decoding of visual imagery during sleep.

Science.gov (United States)

Horikawa, T; Tamaki, M; Miyawaki, Y; Kamitani, Y

2013-05-03

Visual imagery during sleep has long been a topic of persistent speculation, but its private nature has hampered objective analysis. Here we present a neural decoding approach in which machine-learning models predict the contents of visual imagery during the sleep-onset period, given measured brain activity, by discovering links between human functional magnetic resonance imaging patterns and verbal reports with the assistance of lexical and image databases. Decoding models trained on stimulus-induced brain activity in visual cortical areas showed accurate classification, detection, and identification of contents. Our findings demonstrate that specific visual experience during sleep is represented by brain activity patterns shared by stimulus perception, providing a means to uncover subjective contents of dreaming using objective neural measurement.
Database system of geological information for geological evaluation base of NPP sites(I)

International Nuclear Information System (INIS)

Lim, C. B.; Choi, K. R.; Sim, T. M.; No, M. H.; Lee, H. W.; Kim, T. K.; Lim, Y. S.; Hwang, S. K.

2002-01-01

This study aims to provide database system for site suitability analyses of geological information and a processing program for domestic NPP site evaluation. This database system program includes MapObject provided by ESRI and Spread 3.5 OCX, and is coded with Visual Basic language. Major functions of the systematic database program includes vector and raster farmat topographic maps, database design and application, geological symbol plot, the database search for the plotted geological symbol, and so on. The program can also be applied in analyzing not only for lineament trends but also for statistic treatment from geologically site and laboratory information and sources in digital form and algorithm, which is usually used internationally

Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

Science.gov (United States)

Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.
Databases in the Area of Pharmacogenetics

Science.gov (United States)

Sim, Sarah C.; Altman, Russ B.; Ingelman-Sundberg, Magnus

2012-01-01

In the area of pharmacogenetics and personalized health care it is obvious that databases, providing important information of the occurrence and consequences of variant genes encoding drug metabolizing enzymes, drug transporters, drug targets, and other proteins of importance for drug response or toxicity, are of critical value for scientists, physicians, and industry. The primary outcome of the pharmacogenomic field is the identification of biomarkers that can predict drug toxicity and drug response, thereby individualizing and improving drug treatment of patients. The drug in question and the polymorphic gene exerting the impact are the main issues to be searched for in the databases. Here, we review the databases that provide useful information in this respect, of benefit for the development of the pharmacogenomic field. PMID:21309040
Bioinformatics and moonlighting proteins

Directory of Open Access Journals (Sweden)

Sergio eHernández

2015-06-01

Full Text Available Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful that Bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyse and describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are: a remote homology searches using Psi-Blast, b detection of functional motifs and domains, c analysis of data from protein-protein interaction databases (PPIs, d match the query protein sequence to 3D databases (i.e., algorithms as PISITE, e mutation correlation analysis between amino acids by algorithms as MISTIC. Programs designed to identify functional motif/domains detect mainly the canonical function but usually fail in the detection of the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs have the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations –it requires the existence of multialigned family protein sequences - but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/, previously published by our group, has been used as a benchmark for the all of the analyses.
Arabidopsis Regenerating Protoplast: A Powerful Model System for Combining the Proteomics of Cell Wall Proteins and the Visualization of Cell Wall Dynamics

Science.gov (United States)

Yokoyama, Ryusuke; Kuki, Hiroaki; Kuroha, Takeshi; Nishitani, Kazuhiko

2016-01-01

The development of a range of sub-proteomic approaches to the plant cell wall has identified many of the cell wall proteins. However, it remains difficult to elucidate the precise biological role of each protein and the cell wall dynamics driven by their actions. The plant protoplast provides an excellent means not only for characterizing cell wall proteins, but also for visualizing the dynamics of cell wall regeneration, during which cell wall proteins are secreted. It therefore offers a unique opportunity to investigate the de novo construction process of the cell wall. This review deals with sub-proteomic approaches to the plant cell wall through the use of protoplasts, a methodology that will provide the basis for further exploration of cell wall proteins and cell wall dynamics. PMID:28248244
The Importance of Biological Databases in Biological Discovery.

Science.gov (United States)

Baxevanis, Andreas D; Bateman, Alex

2015-06-19

Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.
An Examination of Job Skills Posted on Internet Databases: Implications for Information Systems Degree Programs.

Science.gov (United States)

Liu, Xia; Liu, Lai C.; Koong, Kai S.; Lu, June

2003-01-01

Analysis of 300 information technology job postings in two Internet databases identified the following skill categories: programming languages (Java, C/C++, and Visual Basic were most frequent); website development (57% sought SQL and HTML skills); databases (nearly 50% required Oracle); networks (only Windows NT or wide-area/local-area networks);…
PRIDE and "Database on Demand" as valuable tools for computational proteomics.

Science.gov (United States)

Vizcaíno, Juan Antonio; Reisinger, Florian; Côté, Richard; Martens, Lennart

2011-01-01

The Proteomics Identifications Database (PRIDE, http://www.ebi.ac.uk/pride ) provides users with the ability to explore and compare mass spectrometry-based proteomics experiments that reveal details of the protein expression found in a broad range of taxonomic groups, tissues, and disease states. A PRIDE experiment typically includes identifications of proteins, peptides, and protein modifications. Additionally, many of the submitted experiments also include the mass spectra that provide the evidence for these identifications. Finally, one of the strongest advantages of PRIDE in comparison with other proteomics repositories is the amount of metadata it contains, a key point to put the above-mentioned data in biological and/or technical context. Several informatics tools have been developed in support of the PRIDE database. The most recent one is called "Database on Demand" (DoD), which allows custom sequence databases to be built in order to optimize the results from search engines. We describe the use of DoD in this chapter. Additionally, in order to show the potential of PRIDE as a source for data mining, we also explore complex queries using federated BioMart queries to integrate PRIDE data with other resources, such as Ensembl, Reactome, or UniProt.
Large Scale Topographic Maps Generalisation and Visualization Based on New Methodology

OpenAIRE

Dinar, Ilma; Ključanin, Slobodanka; Poslončec-Petrić, Vesna

2015-01-01

Integrating spatial data from different sources results in visualization which is the last step in the process of digital basic topographic maps creation. Sources used for visualization are existing real estate cadastre database orthophoto plans and digital terrain models. Analogue cadastre plans were scanned and georeferenced according to existing regulations and used for toponyms. Visualization of topologically inspected geometric primitives was performed based on the ''Collection of cartog...
A database for human performance under simulated emergencies of nuclear power plants

International Nuclear Information System (INIS)

Park, Jin Kyun; Jung, Won Dea

2005-01-01

Reliable human performance is a prerequisite in securing the safety of complicated process systems such as nuclear power plants. However, the amount of available knowledge that can explain why operators deviate from an expected performance level is so small because of the infrequency of real accidents. Therefore, in this study, a database that contains a set of useful information extracted from simulated emergencies was developed in order to provide important clues for understanding the change of operators' performance under stressful conditions (i.e., real accidents). The database was developed under Microsoft Windows TM environment using Microsoft Access 97 TM and Microsoft Visual Basic 6.0 TM . In the database, operators' performance data obtained from the analysis of over 100 audio-visual records for simulated emergencies were stored using twenty kinds of distinctive data fields. A total of ten kinds of operators' performance data are available from the developed database. Although it is still difficult to predict operators' performance under stressful conditions based on the results of simulated emergencies, simulation studies remain the most feasible way to scrutinize performance. Accordingly, it is expected that the performance data of this study will provide a concrete foundation for understanding the change of operators' performance in emergency situations
The MPI Facial Expression Database — A Validated Database of Emotional and Conversational Facial Expressions

Science.gov (United States)

Kaulard, Kathrin; Cunningham, Douglas W.; Bülthoff, Heinrich H.; Wallraven, Christian

2012-01-01

The ability to communicate is one of the core aspects of human life. For this, we use not only verbal but also nonverbal signals of remarkable complexity. Among the latter, facial expressions belong to the most important information channels. Despite the large variety of facial expressions we use in daily life, research on facial expressions has so far mostly focused on the emotional aspect. Consequently, most databases of facial expressions available to the research community also include only emotional expressions, neglecting the largely unexplored aspect of conversational expressions. To fill this gap, we present the MPI facial expression database, which contains a large variety of natural emotional and conversational expressions. The database contains 55 different facial expressions performed by 19 German participants. Expressions were elicited with the help of a method-acting protocol, which guarantees both well-defined and natural facial expressions. The method-acting protocol was based on every-day scenarios, which are used to define the necessary context information for each expression. All facial expressions are available in three repetitions, in two intensities, as well as from three different camera angles. A detailed frame annotation is provided, from which a dynamic and a static version of the database have been created. In addition to describing the database in detail, we also present the results of an experiment with two conditions that serve to validate the context scenarios as well as the naturalness and recognizability of the video sequences. Our results provide clear evidence that conversational expressions can be recognized surprisingly well from visual information alone. The MPI facial expression database will enable researchers from different research fields (including the perceptual and cognitive sciences, but also affective computing, as well as computer vision) to investigate the processing of a wider range of natural facial expressions
Database citation in full text biomedical articles.

Science.gov (United States)

Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

2013-01-01

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
Deep Time Data Infrastructure: Integrating Our Current Geologic and Biologic Databases

Science.gov (United States)

Kolankowski, S. M.; Fox, P. A.; Ma, X.; Prabhu, A.

2016-12-01

As our knowledge of Earth's geologic and mineralogical history grows, we require more efficient methods of sharing immense amounts of data. Databases across numerous disciplines have been utilized to offer extensive information on very specific Epochs of Earth's history up to its current state, i.e. Fossil record, rock composition, proteins, etc. These databases could be a powerful force in identifying previously unseen correlations such as relationships between minerals and proteins. Creating a unifying site that provides a portal to these databases will aid in our ability as a collaborative scientific community to utilize our findings more effectively. The Deep-Time Data Infrastructure (DTDI) is currently being defined as part of a larger effort to accomplish this goal. DTDI will not be a new database, but an integration of existing resources. Current geologic and related databases were identified, documentation of their schema was established and will be presented as a stage by stage progression. Through conceptual modeling focused around variables from their combined records, we will determine the best way to integrate these databases using common factors. The Deep-Time Data Infrastructure will allow geoscientists to bridge gaps in data and further our understanding of our Earth's history.
GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

Directory of Open Access Journals (Sweden)

Promponas Vasilis J

2003-10-01

Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating
Cyclone: java-based querying and computing with Pathway/Genome databases.

Science.gov (United States)

Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

2007-05-15

Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.
On the Perceptual Organization of Image Databases Using Cognitive Discriminative Biplots

Directory of Open Access Journals (Sweden)

Spiros Fotopoulos

2007-01-01

Full Text Available A human-centered approach to image database organization is presented in this study. The management of a generic image database is pursued using a standard psychophysical experimental procedure followed by a well-suited data analysis methodology that is based on simple geometrical concepts. The end result is a cognitive discriminative biplot, which is a visualization of the intrinsic organization of the image database best reflecting the user's perception. The discriminating power of the introduced cognitive biplot constitutes an appealing tool for image retrieval and a flexible interface for visual data mining tasks. These ideas were evaluated in two ways. First, the separability of semantically distinct image classes was measured according to their reduced representations on the biplot. Then, a nearest-neighbor retrieval scheme was run on the emerged low-dimensional terrain to measure the suitability of the biplot for performing content-based image retrieval (CBIR. The achieved organization performance when compared with the performance of a contemporary system was found superior. This promoted the further discussion of packing these ideas into a realizable algorithmic procedure for an efficient and effective personalized CBIR system.
Integration of TGS and CTEN assays using the CTENFIT analysis and databasing program

International Nuclear Information System (INIS)

Estep, R.

2000-01-01

The CTEN F IT program, written for Windows 9x/NT in C++, performs databasing and analysis of combined thermal/epithermal neutron (CTEN) passive and active neutron assay data and integrates that with isotopics results and gamma-ray data from methods such as tomographic gamma scanning (TGS). The binary database is reflected in a companion Excel database that allows extensive customization via Visual Basic for Applications macros. Automated analysis options make the analysis of the data transparent to the assay system operator. Various record browsers and information displays simplified record keeping tasks
Creating a specialist protein resource network: a meeting report for the protein bioinformatics and community resources retreat.

Science.gov (United States)

Babbitt, Patricia C; Bagos, Pantelis G; Bairoch, Amos; Bateman, Alex; Chatonnet, Arnaud; Chen, Mark Jinan; Craik, David J; Finn, Robert D; Gloriam, David; Haft, Daniel H; Henrissat, Bernard; Holliday, Gemma L; Isberg, Vignir; Kaas, Quentin; Landsman, David; Lenfant, Nicolas; Manning, Gerard; Nagano, Nozomi; Srinivasan, Narayanaswamy; O'Donovan, Claire; Pruitt, Kim D; Sowdhamini, Ramanathan; Rawlings, Neil D; Saier, Milton H; Sharman, Joanna L; Spedding, Michael; Tsirigos, Konstantinos D; Vastermark, Ake; Vriend, Gerrit

2015-01-01

During 11-12 August 2014, a Protein Bioinformatics and Community Resources Retreat was held at the Wellcome Trust Genome Campus in Hinxton, UK. This meeting brought together the principal investigators of several specialized protein resources (such as CAZy, TCDB and MEROPS) as well as those from protein databases from the large Bioinformatics centres (including UniProt and RefSeq). The retreat was divided into five sessions: (1) key challenges, (2) the databases represented, (3) best practices for maintenance and curation, (4) information flow to and from large data centers and (5) communication and funding. An important outcome of this meeting was the creation of a Specialist Protein Resource Network that we believe will improve coordination of the activities of its member resources. We invite further protein database resources to join the network and continue the dialogue.
Membrane's Eleven: heavy-atom derivatives of membrane-protein crystals

DEFF Research Database (Denmark)

Morth, Jens Preben; Sørensen, Thomas Lykke-Møller; Nissen, Poul

2006-01-01

A database has been assembled of heavy-atom derivatives used in the structure determination of membrane proteins. The database can serve as a guide to the design of experiments in the search for heavy-atom derivatives of new membrane-protein crystals. The database pinpoints organomercurials...
Protein Annotation from Protein Interaction Networks and Gene Ontology

OpenAIRE

Nguyen, Cao D.; Gardiner, Katheleen J.; Cios, Krzysztof J.

2011-01-01

We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precis...
Multivariate spatiotemporal visualizations for mobile devices in Flyover Country

Science.gov (United States)

Loeffler, S.; Thorn, R.; Myrbo, A.; Roth, R.; Goring, S. J.; Williams, J.

2017-12-01

Visualizing and interacting with complex multivariate and spatiotemporal datasets on mobile devices is challenging due to their smaller screens, reduced processing power, and limited data connectivity. Pollen data require visualizing pollen assemblages spatially, temporally, and across multiple taxa to understand plant community dynamics through time. Drawing from cartography, information visualization, and paleoecology, we have created new mobile-first visualization techniques that represent multiple taxa across many sites and enable user interaction. Using pollen datasets from the Neotoma Paleoecology Database as a case study, the visualization techniques allow ecological patterns and trends to be quickly understood on a mobile device compared to traditional pollen diagrams and maps. This flexible visualization system can be used for datasets beyond pollen, with the only requirements being point-based localities and multiple variables changing through time or depth.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.