WorldWideScience

Sample records for gene ontology functional

  1. Defining functional distances over Gene Ontology

    Directory of Open Access Journals (Sweden)

    del Pozo Angela

    2008-01-01

    Abstract. Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. 'gene ontology', GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of InterPro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model, Df, which fulfils the properties of a metric space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The proposed method provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.
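    To illustrate the idea of a co-occurrence distance, the sketch below compares two GO terms by the sets of InterPro entries they are attached to. It uses the Jaccard distance, which is a swapped-in stand-in and not the paper's Df; it is chosen only because it is also a true metric. The term names, entry names, and mapping are hypothetical placeholders.

```python
def jaccard_distance(entries_a: set, entries_b: set) -> float:
    """Jaccard distance 1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint sets."""
    union = entries_a | entries_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(entries_a & entries_b) / len(union)


# Hypothetical mapping: GO term -> set of InterPro entries annotated with it
term_to_entries = {
    "term_a": {"entry1", "entry2", "entry3"},
    "term_b": {"entry2", "entry3"},
    "term_c": {"entry9"},
}

d_ab = jaccard_distance(term_to_entries["term_a"], term_to_entries["term_b"])  # shares 2 of 3 entries
d_ac = jaccard_distance(term_to_entries["term_a"], term_to_entries["term_c"])  # no shared entries
```

    Terms that co-occur in many of the same entries end up close together; a pairwise distance matrix of this kind could then be fed to hierarchical clustering to obtain a tree, though the paper's own construction may differ.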

  2. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    We have developed a method for predicting protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories: transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  3. Exploiting ontology graph for predicting sparsely annotated gene function.

    Science.gov (United States)

    Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian

    2015-06-15

    Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few annotated genes pose a particular challenge: any prediction algorithm that independently considers each label faces a paucity of information and is thus prone to capturing non-generalizable patterns in the data, resulting in poor predictive performance. A variety of algorithms exist for function prediction, but none properly addresses this 'overfitting' issue for sparsely annotated functions, or does so in a manner scalable to the tens of thousands of functions in the human catalog. We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. Availability: https://github.com/wangshenguiuc/clusDCA. © The Author 2015. Published by Oxford University Press.
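    The general idea of transferring information between similar labels can be shown with a toy smoothing step: a sparsely annotated label borrows part of its score for a gene from labels it is similar to. This is a hypothetical illustration only. clusDCA's actual procedure is not reproduced here, and the label names, similarity weights, and mixing weight `alpha` are all assumptions.

```python
def smooth_label_scores(scores, label_sim, alpha=0.5):
    """Mix each label's own score with a similarity-weighted average of related labels' scores.

    scores:    {label: prediction score for one gene}
    label_sim: {label: {other_label: non-negative similarity weight}} (hypothetical input)
    alpha:     fraction of the final score borrowed from similar labels (hypothetical knob)
    """
    smoothed = {}
    for label, own in scores.items():
        # Only neighbours that have a score for this gene can lend evidence
        neighbours = {other: w for other, w in label_sim.get(label, {}).items()
                      if other != label and other in scores}
        total = sum(neighbours.values())
        if total == 0:
            smoothed[label] = own  # nothing to borrow from
            continue
        borrowed = sum(w * scores[other] for other, w in neighbours.items()) / total
        smoothed[label] = (1 - alpha) * own + alpha * borrowed
    return smoothed


# A sparse label with no direct evidence is related to a well-supported label
scores = {"label_sparse": 0.0, "label_rich": 0.8, "label_other": 0.1}
label_sim = {"label_sparse": {"label_rich": 1.0}, "label_rich": {"label_sparse": 1.0}}
smoothed = smooth_label_scores(scores, label_sim)
```

    Here `label_sparse` rises from 0.0 to 0.4 because it borrows from `label_rich`, while `label_other`, which has no similar labels, is unchanged.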

  4. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Abstract. Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much-needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
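    One way to compute a mutual-information feature is to treat "sequence matches the motif" and "sequence is annotated with the GO term" as two binary variables and count sequences in the four cells of a 2x2 table. The sketch below assumes that formulation; the paper's exact definition of the feature may differ. A score like this would be one input to the logistic regression classifier, which is not shown.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between binary motif indicator M and GO-term indicator T.

    n11: sequences with motif and term   n10: motif only
    n01: term only                       n00: neither
    """
    n = n11 + n10 + n01 + n00
    # (joint count, count of that M value, count of that T value) for each cell
    cells = [
        (n11, n11 + n10, n11 + n01),  # M=1, T=1
        (n10, n11 + n10, n10 + n00),  # M=1, T=0
        (n01, n01 + n00, n11 + n01),  # M=0, T=1
        (n00, n01 + n00, n10 + n00),  # M=0, T=0
    ]
    mi = 0.0
    for joint, m_count, t_count in cells:
        if joint == 0:
            continue  # 0 * log(0) is taken as 0
        p_joint = joint / n
        mi += p_joint * math.log2(p_joint / ((m_count / n) * (t_count / n)))
    return mi


mi_perfect = mutual_information(50, 0, 0, 50)       # motif and term always co-occur
mi_independent = mutual_information(25, 25, 25, 25)  # no association
```

    A perfectly associated motif-term pair gives 1 bit, and an independent pair gives 0.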

  5. A new measure for functional similarity of gene products based on Gene Ontology

    Directory of Open Access Journals (Sweden)

    Lengauer Thomas

    2006-06-01

    Abstract. Background: Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role. Results: We present a new method for comparing sets of GO terms and for assessing the functional similarity of gene products. The method relies on two semantic similarity measures: simRel and funSim. One measure (simRel) is applied in the comparison of the biological processes found in different groups of organisms. The other measure (funSim) is used to find functionally related gene products within the same or between different genomes. Results indicate that the method, in addition to being in good agreement with established sequence similarity approaches, also provides a means for the identification of functionally related proteins independent of evolutionary relationships. The method is also applied to estimating functional similarity between all proteins in Saccharomyces cerevisiae and to visualizing the molecular function space of yeast in a map of the functional space. A similar approach is used to visualize the functional relationships between protein families. Conclusion: The approach enables the comparison of the underlying molecular biology of different taxonomic groups and provides a new comparative genomics tool identifying functionally related gene products independent of homology. The proposed map of the functional space provides a new global view on the functional relationships between gene products or protein families.
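    The simRel measure is commonly stated as simRel(c1, c2) = max over common ancestors c of [2 log p(c) / (log p(c1) + log p(c2))] * (1 - p(c)), where p is the probability of a term being annotated, so Lin's similarity is down-weighted for very general ancestors. The sketch below follows that formula as we understand it; the toy DAG and probabilities are hypothetical, and funSim (which aggregates term similarities between gene products) is not shown.

```python
import math


def sim_rel(term1, term2, ancestors, prob):
    """simRel: max over common ancestors c of Lin's similarity times (1 - p(c)).

    ancestors[t]: set of t's ancestors, including t itself
    prob[t]:      annotation probability of t (root has p = 1)
    """
    denom = math.log(prob[term1]) + math.log(prob[term2])
    if denom == 0.0:
        return 0.0  # both terms have p = 1 (the root)
    best = 0.0
    for c in ancestors[term1] & ancestors[term2]:
        lin = 2.0 * math.log(prob[c]) / denom
        best = max(best, lin * (1.0 - prob[c]))  # root contributes 0 since 1 - p = 0
    return best


# Hypothetical DAG: root -> A -> {B, C}
prob = {"root": 1.0, "A": 0.5, "B": 0.25, "C": 0.2}
ancestors = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "A", "root"},
    "C": {"C", "A", "root"},
}
score_bc = sim_rel("B", "C", ancestors, prob)  # most informative common ancestor is A
```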

  6. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for the systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description in GeneRIFs can be annotated by the text-mining tool Open Biomedical Annotator (OBA), and each Entrez gene can be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. High consistency of the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and of the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) between GeneRIFs and GOA shows that our annotation resource is very reliable.
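    The consistency check compares, gene by gene, how many terms each resource assigns and measures agreement with a Pearson correlation. A minimal sketch of that comparison follows; the gene names and counts are hypothetical, and the p-value computation is omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical number of terms per gene in two annotation resources
counts_generif = {"GENE1": 12, "GENE2": 3, "GENE3": 7, "GENE4": 1}
counts_goa = {"GENE1": 15, "GENE2": 4, "GENE3": 5, "GENE4": 2}
shared_genes = sorted(counts_generif.keys() & counts_goa.keys())  # compare only genes in both
r = pearson([counts_generif[g] for g in shared_genes], [counts_goa[g] for g in shared_genes])
```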

  7. Assessing the functional coherence of gene sets with metrics based on the Gene Ontology graph.

    Science.gov (United States)

    Richards, Adam J; Muller, Brian; Shotwell, Matthew; Cowart, L Ashley; Rohrer, Bäerbel; Lu, Xinghua

    2010-06-15

    The results of initial analyses for many high-throughput technologies commonly take the form of gene or protein sets, and one of the ensuing tasks is to evaluate the functional coherence of these sets. The study of gene set function most commonly makes use of controlled vocabulary in the form of ontology annotations. For a given gene set, the statistical significance of observing these annotations, or 'enrichment', may be tested using a number of methods. Instead of testing for significance of individual terms, this study is concerned with the task of assessing the global functional coherence of gene sets, for which novel metrics and statistical methods have been devised. The metrics of this study are based on the topological properties of graphs composed of genes and their Gene Ontology annotations. A novel aspect of these methods is that both the enrichment of annotations and the relationships among annotations are considered when determining the significance of functional coherence. We applied our methods to perform analyses on an existing database and on microarray experimental results. We demonstrate that our approach is highly discriminative in terms of differentiating coherent gene sets from random ones and that it provides biologically sensible evaluations in microarray analysis. We further used examples to show the utility of graph visualization as a tool for studying the functional coherence of gene sets. The implementation is provided as a freely accessible web application at: http://projects.dbbe.musc.edu/gosteiner. Additionally, the source code, written in the Python programming language, is available under the General Public License of the Free Software Foundation. Supplementary data are available at Bioinformatics online.
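    For contrast with the graph-based coherence metrics, a common way to test enrichment of a single term is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given N annotated genes in the background, K of which carry the term, how surprising is it to see at least k of them in a set of n genes? This is a generic sketch of that per-term test, not the authors' method.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K term-annotated genes, n genes drawn).

    k: term-annotated genes in the gene set   n: gene set size
    K: term-annotated genes in background     N: background size
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy example: all 5 genes in the set carry a term held by 5 of 10 background genes
p_value = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
```

    In practice, one such test is run per term and the p-values are corrected for multiple testing; the graph-based approach above instead scores the set as a whole.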

  8. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  9. Evaluating the significance of protein functional similarity based on gene ontology.

    Science.gov (United States)

    Konopka, Bogumil M; Golda, Tomasz; Kotulska, Malgorzata

    2014-11-01

    Gene ontology is among the most successful ontologies in the biomedical domain. It is used to describe, unambiguously, protein molecular functions, cellular localizations, and processes in which proteins participate. The hierarchical structure of gene ontology allows quantifying protein functional similarity by application of algorithms that calculate semantic similarities. The scores, however, are meaningless without a given context. Here, we propose how to evaluate the significance of protein function semantic similarity scores by comparing them to reference distributions calculated for randomly chosen proteins. In the study, thresholds for significant functional semantic similarity were estimated in four representative annotation corpora. We also show that score significance is influenced by the number and specificity of the gene ontology terms annotated to the compared proteins. While proteins with a greater number of terms tend to yield higher similarity scores, proteins with more specific terms produce lower scores. The estimated significance thresholds were validated using protein sequence-function and structure-function relationships. Taking into account term number and term specificity improves the distinction between significant and insignificant semantic similarity comparisons.
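    Comparing a score to a reference distribution from random protein pairs can be done with an empirical p-value and a quantile-based threshold. The sketch below assumes a one-sided test with the usual +1 correction and a nearest-rank quantile; the null scores are a hypothetical stand-in, and the paper's stratification by term number and specificity is not modelled.

```python
import math


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value: (1 + #{null >= observed}) / (1 + number of null scores)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def significance_threshold(null_scores, alpha=0.05):
    """Nearest-rank (1 - alpha) quantile of the null distribution of similarity scores."""
    ranked = sorted(null_scores)
    # round() guards against floating-point noise in (1 - alpha) * n
    rank = math.ceil(round((1 - alpha) * len(ranked), 9))
    return ranked[max(rank - 1, 0)]


# Stand-in for semantic similarity scores of 100 randomly chosen protein pairs
null_scores = [i / 100 for i in range(100)]
threshold = significance_threshold(null_scores, alpha=0.05)
p_mid = empirical_p_value(0.5, null_scores)
```

    A score above `threshold` would be called significant at the chosen alpha; since term number and specificity shift the null distribution, separate null distributions per annotation profile would be needed to reflect the paper's finding.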

  10. Gene Ontology Consortium: going forward.

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes.

    Science.gov (United States)

    Kourmpetis, Yiannis Ai; van Dijk, Aalt Dj; Ter Braak, Cajo Jf

    2013-03-27

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to belong to a detailed functional class, but not to a broader class that, due to the vocabulary structure, includes the predicted one. We present a novel discrete optimization algorithm called Functional Annotation with Labeling CONsistency (FALCON) that resolves such contradictions. The GO is modeled as a discrete Bayesian Network. For any given input of GO term membership probabilities, the algorithm returns the most probable GO term assignments that are in accordance with the Gene Ontology structure. The optimization is done using the Differential Evolution algorithm. Performance is evaluated on simulated and also real data from Arabidopsis thaliana, showing improvement compared to related approaches. We finally applied the FALCON algorithm to obtain genome-wide function predictions for six eukaryotic species based on data provided by the CAFA (Critical Assessment of Function Annotation) project.
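    The contradiction described above can be checked mechanically: a set of predicted terms is consistent only if every predicted term's parents are also predicted. The sketch below shows that check plus a naive repair that raises each parent's probability to at least its children's. This max-propagation heuristic is a simple stand-in, not FALCON, which searches for the most probable consistent assignment with a Bayesian network and Differential Evolution. The term names and probabilities are hypothetical.

```python
def is_consistent(assigned, parents):
    """True if every assigned term's parents are also assigned (the broader class is implied)."""
    return all(parent in assigned for term in assigned for parent in parents.get(term, ()))


def propagate_up(probs, parents):
    """Naive repair: raise each parent's probability to at least that of its children."""
    fixed = dict(probs)
    changed = True
    while changed:  # repeat until no parent needs raising (handles multi-level DAGs)
        changed = False
        for term, p in list(fixed.items()):
            for parent in parents.get(term, ()):
                if fixed.get(parent, 0.0) < p:
                    fixed[parent] = p
                    changed = True
    return fixed


parents = {"child": ["parent"], "parent": ["root"]}
probs = {"root": 0.9, "parent": 0.3, "child": 0.7}

raw_calls = {t for t, p in probs.items() if p >= 0.5}          # predicts child but not parent
repaired = propagate_up(probs, parents)
repaired_calls = {t for t, p in repaired.items() if p >= 0.5}  # now includes parent
```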

  12. The Functional Genetics of Handedness and Language Lateralization: Insights from Gene Ontology, Pathway and Disease Association Analyses

    Directory of Open Access Journals (Sweden)

    Judith Schmitz

    2017-07-01

    Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.

  13. The Functional Genetics of Handedness and Language Lateralization: Insights from Gene Ontology, Pathway and Disease Association Analyses.

    Science.gov (United States)

    Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian

    2017-01-01

    Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.

  14. Mapping genes for plant structure, development and evolution: functional mapping meets ontology.

    Science.gov (United States)

    He, Qiuling; Berg, Arthur; Li, Yao; Vallejos, C Eduardo; Wu, Rongling

    2010-01-01

    One of the fundamental tasks in biology is the identification of genes that control the structure and developmental pattern of complex traits and their responses to the environment during trait development. Functional mapping provides a statistical means for detecting quantitative trait loci (QTLs) that underlie developmental traits, such as growth trajectories, and for testing the interplay between gene action and development. Here we describe how functional mapping and studies of plant ontology can be integrated so as to elucidate the expression mechanisms of QTLs that control plant growth, morphology, development, and adaptation to changing environments. This approach can also be used to construct an evo-devo framework for inferring the evolution of developmental traits. © 2009 Elsevier Ltd. All rights reserved.
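    Functional mapping models each QTL genotype's mean trajectory with a parametric curve rather than treating every time point separately; for growth traits, the logistic curve g(t) = a / (1 + b e^(-rt)) is a common choice. The sketch below only evaluates two such genotype-specific curves and their difference over time. The genotype labels and parameter values are hypothetical, and likelihood-based QTL testing is not shown.

```python
import math


def logistic_growth(t, a, b, r):
    """Logistic growth curve g(t) = a / (1 + b * exp(-r * t)); a is the asymptotic size."""
    return a / (1 + b * math.exp(-r * t))


def trajectory_difference(params_1, params_2, times):
    """Pointwise difference between two genotype-specific mean growth curves."""
    return [logistic_growth(t, *params_1) - logistic_growth(t, *params_2) for t in times]


# Hypothetical (a, b, r) for two genotypes at a putative QTL
genotype_params = {"QQ": (100.0, 20.0, 0.6), "qq": (80.0, 20.0, 0.5)}
times = list(range(0, 13, 2))
diff = trajectory_difference(genotype_params["QQ"], genotype_params["qq"], times)
```

    A QTL whose genotypes have different curve parameters shows up as a time-varying gap between trajectories, which is the kind of gene-by-development interplay functional mapping is designed to test.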

  15. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  16. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.

    Science.gov (United States)

    Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn

    2018-03-19

    Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting significant term pairs to reduce noise. Based on Enzyme Commission (EC) number-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved by effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose GO annotations are far from complete.
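    Random walk with restart captures global network structure: a walker starting from seed genes repeatedly steps to a neighbour, but with probability `restart` jumps back to the seeds, and the stationary visiting probabilities score every gene's proximity to the seeds. The sketch below iterates p = restart * p0 + (1 - restart) * W p with a degree-normalised transition matrix W. The graph and restart value are hypothetical, and how NETSIM2 combines these scores with GO term similarity is not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary RWR probabilities on an undirected weighted graph.

    adj:   {node: {neighbour: weight}}; every node must appear as a key
    seeds: nodes the walker restarts from (uniform restart distribution)
    """
    nodes = list(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if degree[u] == 0:
                continue  # isolated nodes have nowhere to walk
            for v, w in adj[u].items():
                # move mass from u to v in proportion to the edge weight
                nxt[v] += (1 - restart) * p[u] * w / degree[u]
        if sum(abs(nxt[u] - p[u]) for u in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical co-functional network: a path A - B - C
adj = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
scores = random_walk_with_restart(adj, seeds={"A"}, restart=0.5)
```

    For this path graph seeded at A, the scores work out to 7/12, 1/3 and 1/12 for A, B and C, so proximity decays with network distance from the seed.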

  17. PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool.

    Science.gov (United States)

    Khan, Ishita K; Wei, Qing; Chitale, Meghana; Kihara, Daisuke

    2015-01-15

    Protein function prediction (PFP) is an automated function prediction method that predicts Gene Ontology (GO) annotations for a protein sequence using distantly related sequences and contextual associations of GO terms. Extended similarity group (ESG) is another GO prediction algorithm that makes predictions based on iterative sequence database searches. Here, we provide interactive web servers for the PFP and ESG algorithms that are equipped with an effective visualization of the GO predictions in a hierarchical topology. PFP/ESG servers are freely available at http://kiharalab.org/web/pfp.php and http://kiharalab.org/web/esg.php, or access both at http://kiharalab.org/pfp_esg.php. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.

  18. How to learn about gene function: text-mining or ontologies?

    Science.gov (United States)

    Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I

    2015-03-01

    As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known', formally structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic

  19. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments

    Science.gov (United States)

    2013-01-01

    Background: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community. PMID:24093723

  20. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report.

    Directory of Open Access Journals (Sweden)

    Paul D Thomas

    A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: (1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and (2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the

  1. Computational algorithms to predict Gene Ontology annotations.

    Science.gov (United States)

    Pinoli, Pietro; Chicco, Davide; Masseroli, Marco

    2015-01-01

    Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from definitive and evolves rapidly, and some erroneous annotations may be present. Because curating novel annotations is costly in terms of both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful. We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. We tested our methods and their weighted variants on the Gene Ontology annotation sets of genes from three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures yielded a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered. Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a
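
    The Latent Semantic Indexing idea can be sketched as a truncated SVD of a binary gene × term annotation matrix. Cells that are 0 but receive a high score in the rank-k reconstruction become candidate new annotations. This is a minimal, stdlib-only illustration and not the authors' implementation: it uses power iteration in place of a library SVD and omits both the annotation weighting and the SIM clustering step. The function names, matrix, and threshold are all hypothetical.

```python
import math

def _matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def top_eigvecs(m, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration plus deflation."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy
    vecs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n  # uniform start (assumed not orthogonal to the target)
        for _ in range(iters):
            w = _matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(m, v)))
        vecs.append(v)
        # Deflate: subtract the component just found so the next one can emerge.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs

def lsi_reconstruct(a, k):
    """Rank-k reconstruction A V_k V_k^T, with V_k the top right singular vectors of A."""
    n_terms = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_terms)] for i in range(n_terms)]
    vecs = top_eigvecs(ata, k)
    recon = []
    for row in a:
        coords = [sum(r * x for r, x in zip(row, v)) for v in vecs]
        recon.append([sum(c * v[t] for c, v in zip(coords, vecs)) for t in range(n_terms)])
    return recon

def predict_annotations(a, k, threshold):
    """(gene, term, score) for unannotated cells whose reconstructed score >= threshold."""
    recon = lsi_reconstruct(a, k)
    return [(g, t, recon[g][t])
            for g, row in enumerate(a)
            for t, value in enumerate(row)
            if value == 0 and recon[g][t] >= threshold]

# Hypothetical genes x terms matrix (1 = annotated).
A = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],  # gene 2 lacks term 2, unlike its look-alikes
    [0, 0, 0, 1],
]
for gene, term, score in predict_annotations(A, k=2, threshold=0.5):
    print(gene, term, round(score, 3))  # prints: 2 2 0.577
```

    Gene 2 shares terms 0 and 1 with genes 0 and 1, so the low-rank reconstruction "fills in" term 2 for it with a score of about 0.577. No other empty cell comes close to the threshold.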

  2. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-10-01

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  3. Cross-Ontology Multi-level Association Rule Mining in the Gene Ontology

    Science.gov (United States)

    Manda, Prashanti; Ozkan, Seval; Wang, Hui; McCarthy, Fiona; Bridges, Susan M.

    2012-01-01

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that, compared with previously published methods, our approach discovers more association rules from the GO, and rules of higher quality as evaluated by biologists. Biologically interesting rules discovered by our method reveal previously unknown and surprising knowledge about co-occurring GO terms. PMID:23071802
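
    The benefit of generalizing annotations before mining can be shown with a simplified sketch. Unlike COLL's level-by-level procedure, it expands every annotation with all of its ancestors at once (the classic generalized-association-rule approach) and mines only one-to-one rules whose two terms come from different sub-ontologies. The mini-DAG, the "MF:/BP:/CC:" prefixes, and the thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {term: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def generalize(transactions, parents):
    """Extend each gene's annotation set with every ancestor term."""
    return [set(tx).union(*(ancestors(t, parents) for t in tx)) for tx in transactions]

def aspect(term):
    """Sub-ontology prefix, e.g. 'MF' for 'MF:kinase'."""
    return term.split(":", 1)[0]

def cross_ontology_rules(transactions, min_support, min_confidence):
    """One-to-one rules X -> Y with X and Y from different sub-ontologies."""
    n = len(transactions)
    item_count = Counter(t for tx in transactions for t in tx)
    pair_count = Counter()
    for tx in transactions:
        for a, b in combinations(sorted(tx), 2):
            if aspect(a) != aspect(b):
                pair_count[(a, b)] += 1
    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (count / n, confidence)
    return rules

# Hypothetical mini-DAG and annotation sets (one set per gene).
parents = {
    "MF:protein_kinase": ["MF:kinase"],
    "MF:kinase": ["MF:catalytic"],
    "BP:phosphorylation": ["BP:metabolic"],
    "CC:nucleus": ["CC:organelle"],
}
genes = [
    {"MF:protein_kinase", "BP:phosphorylation"},
    {"MF:kinase", "BP:phosphorylation", "CC:nucleus"},
    {"MF:protein_kinase", "CC:nucleus"},
    {"BP:metabolic"},
]

raw = cross_ontology_rules(genes, 0.5, 0.6)
general = cross_ontology_rules(generalize(genes, parents), 0.5, 0.6)
print(("BP:phosphorylation", "MF:kinase") in raw)    # False
print(general[("BP:phosphorylation", "MF:kinase")])  # (0.5, 1.0)
```

    On the raw annotations, every cross-ontology pair occurs in only one gene. Once "protein kinase" is generalized to "kinase", the rule BP:phosphorylation → MF:kinase reaches the support threshold.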

  4. Inferring gene ontologies from pairwise similarity data

    Science.gov (United States)

    Kramer, Michael; Dutkowski, Janusz; Yu, Michael; Bafna, Vineet; Ideker, Trey

    2014-01-01

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: (i) analyze a full matrix of gene–gene pairwise similarities from -omics data; (ii) infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and (iii) respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall). Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data. Contact: tideker@ucsd.edu PMID:24932003
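
    The core CliXO step described above can be sketched as follows: threshold a similarity matrix progressively, then record each new multi-gene maximal clique as a candidate term. The sketch covers only this step and omits the rest of the published algorithm. The similarity values and thresholds are hypothetical, and the cliques are found with the standard Bron-Kerbosch algorithm with pivoting.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def clique_terms(labels, sim, thresholds):
    """Map each multi-gene maximal clique to the highest threshold at which it first appears."""
    terms = {}
    for t in sorted(thresholds, reverse=True):
        adj = {g: set() for g in labels}
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                if sim[i][j] >= t:
                    adj[labels[i]].add(labels[j])
                    adj[labels[j]].add(labels[i])
        for clique in maximal_cliques(adj):
            if len(clique) > 1 and clique not in terms:
                terms[clique] = t
    return terms

# Hypothetical symmetric gene-gene similarity matrix.
labels = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.8, 0.2],
    [0.8, 0.8, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
]
for clique, t in clique_terms(labels, sim, [0.85, 0.75, 0.45]).items():
    print(t, sorted(clique))
# 0.85 ['A', 'B']
# 0.75 ['A', 'B', 'C']
# 0.45 ['C', 'D']
```

    Here {A, B} nests inside {A, B, C}, while gene C also belongs to {C, D}. This mirrors the pleiotropy requirement: a term may relate to more than one higher-level term.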

  5. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Full Text Available Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into the gene's functionality by informing us how its gene product behaves in a cellular context, using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed whether we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using classification learning on information about its location in the genome and information from its nearest-neighbouring genes. Results We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify the Gene Ontology Slim terms of a gene, given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to produce better classification rules. Our results show that by incorporating neighbour information from just each gene's two nearest neighbours, the percentage of genes correctly classified to their Gene Ontology Slim term exceeds 80% for each ontology, with high accuracy (reflected in F-measures over 0.80) for the classification rules produced. Conclusions We confirmed that, in classifying genes to their correct Gene Ontology Slim term, including information from neighbouring genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is already seen when including information from just a gene's two nearest neighbouring genes.
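
    The neighbour-information idea can be isolated in a small sketch. The study trained MultiBoostAB with J48 on location and neighbour features. This sketch swaps in a plain majority vote over the GO Slim terms of each gene's two nearest neighbours on the same chromosome, which shows the feature but not the authors' classifier. The gene names, positions, and terms are hypothetical.

```python
from collections import Counter

def nearest_neighbours(genes, idx, k):
    """The k genes closest to genes[idx] on the same chromosome (ties broken by position)."""
    name, chrom, pos, _ = genes[idx]
    others = [g for g in genes if g[1] == chrom and g[0] != name]
    return sorted(others, key=lambda g: (abs(g[2] - pos), g[2]))[:k]

def predict_slim(genes, idx, k=2):
    """Majority vote over neighbours' GO Slim terms; ties go to the nearer neighbour's term."""
    terms = [g[3] for g in nearest_neighbours(genes, idx, k)]
    if not terms:
        return None
    counts = Counter(terms)
    best = max(counts.values())
    return next(t for t in terms if counts[t] == best)

# Hypothetical genes: (name, chromosome, start position, GO Slim term).
genes = [
    ("g1", "chrI", 100, "transport"),
    ("g2", "chrI", 200, "transport"),
    ("g3", "chrI", 300, "transport"),
    ("g4", "chrI", 1000, "metabolism"),
    ("g5", "chrI", 1100, "metabolism"),
    ("g6", "chrI", 1200, "metabolism"),
    ("g7", "chrI", 1300, "transport"),
]
# Each prediction ignores the gene's own label, so this is a leave-one-out check.
hits = sum(predict_slim(genes, i) == g[3] for i, g in enumerate(genes))
print(f"leave-one-out accuracy: {hits}/{len(genes)}")  # 6/7
```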

  6. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Full Text Available Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
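
    The proposed representation can be sketched under some stated assumptions. Here the information content of a term is IC(t) = ln(N / n_t), where n_t is the number of the N genes annotated to t (annotations assumed already propagated). Each gene becomes a vector over the whole term vocabulary, and the Jaccard index is taken in its weighted (sum of minima over sum of maxima) form. None of these specifics come from the abstract, and the data are hypothetical.

```python
import math

def information_content(annotations):
    """IC(t) = ln(N / n_t) over a {gene: set_of_terms} corpus."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for t in terms:
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in counts.items()}

def ic_vector(gene, annotations, ic, vocab):
    """IC of each vocabulary term the gene is annotated to, 0.0 elsewhere."""
    return [ic[t] if t in annotations[gene] else 0.0 for t in vocab]

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def pearson(u, v):
    """Pearson's r is the cosine of the mean-centred vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([x - mu for x in u], [x - mv for x in v])

def weighted_jaccard(u, v):
    denom = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / denom if denom else 0.0

annotations = {"A": {"t1", "t2"}, "B": {"t1", "t2"}, "C": {"t1", "t3"}, "D": {"t1"}}
ic = information_content(annotations)
vocab = sorted(ic)
vec = {g: ic_vector(g, annotations, ic, vocab) for g in annotations}
print(round(cosine(vec["A"], vec["B"]), 3), round(cosine(vec["A"], vec["C"]), 3))  # 1.0 0.0
print(round(pearson(vec["A"], vec["C"]), 3))                                      # -0.5
print(weighted_jaccard(vec["A"], vec["D"]))                                       # 0.0
```

    Term t1 is annotated to every gene, so its IC is 0 and sharing it adds nothing to any score. This is one way such vectors can avoid overestimating the similarity of genes that share only very general terms.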

  7. OAHG: an integrated resource for annotating human genes with multi-level ontologies.

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-10-05

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases have been mapped to standardized terminologies, such as GO, which facilitates further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies (GO, DO, and HPO), six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 ontology terms. In the performance evaluation, OAHG was consistent with existing gene interactions and with the structure of the ontologies. For example, terms with more similar structure tend to share more associated genes (Pearson correlation γ² = 0.2428, p < 2.2e-16).

  8. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl. PMID:23895341

  9. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

    Science.gov (United States)

    Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane

    2013-07-29

    The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

  10. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    Science.gov (United States)

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. Among the different analysis approaches, association rules (AR) can provide useful knowledge by discovering previously unknown, biologically relevant associations between GO terms. In previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. Here we adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a thorough performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing that GO-WAR outperforms current state-of-the-art approaches.
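
    The abstract does not state how GO-WAR weights its rules. The following is therefore only a generic sketch of weighted support, assuming term weights equal to information content, ln(N / n_t), and defining weighted support as the mean weight of an itemset's terms times its ordinary support. It is an illustrative definition, not necessarily GO-WAR's, and the terms are hypothetical.

```python
import math
from collections import Counter

def term_weights(transactions):
    """Illustrative weights: IC = ln(N / number of genes annotated to the term)."""
    n = len(transactions)
    counts = Counter(t for tx in transactions for t in tx)
    return {t: math.log(n / c) for t, c in counts.items()}

def weighted_support(itemset, transactions, weights):
    """Mean item weight times ordinary support (one simple weighted-support definition)."""
    support = sum(itemset <= tx for tx in transactions) / len(transactions)
    return support * sum(weights[t] for t in itemset) / len(itemset)

# Hypothetical cross-ontology annotation sets, one per gene.
genes = [
    {"BP:a", "MF:b"},
    {"BP:a", "MF:b"},
    {"BP:a", "MF:c"},
    {"BP:a"},
]
w = term_weights(genes)
ws_common = weighted_support({"BP:a", "MF:b"}, genes, w)
ws_rare = weighted_support({"BP:a", "MF:c"}, genes, w)
print(round(ws_common, 4), round(ws_rare, 4))  # 0.1733 0.1733
```

    The pair containing the rarer, more specific term MF:c has half the plain support of the pair with MF:b. Under this weighting it still scores just as high, which is the kind of effect weighting is meant to have.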

  11. Improving Interpretation of Cardiac Phenotypes and Enhancing Discovery With Expanded Knowledge in the Gene Ontology.

    Science.gov (United States)

    Lovering, Ruth C; Roncaglia, Paola; Howe, Douglas G; Laulederkind, Stanley J F; Khodiyar, Varsha K; Berardini, Tanya Z; Tweedie, Susan; Foulger, Rebecca E; Osumi-Sutherland, David; Campbell, Nancy H; Huntley, Rachael P; Talmud, Philippa J; Blake, Judith A; Breckenridge, Ross; Riley, Paul R; Lambiase, Pier D; Elliott, Perry M; Clapp, Lucie; Tinker, Andrew; Hill, David P

    2018-02-01

    A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects. © 2018 The Authors.

  12. Towards refactoring the Molecular Function Ontology with a UML profile for function modeling.

    Science.gov (United States)

    Burek, Patryk; Loebe, Frank; Herre, Heinrich

    2017-10-04

    Gene Ontology (GO) is the largest resource for cataloging gene products. This resource grows steadily and, naturally, this growth raises issues regarding the structure of the ontology. Moreover, modeling and refactoring large ontologies such as GO is generally far from being simple, as a whole as well as when focusing on certain aspects or fragments. It seems that human-friendly graphical modeling languages such as the Unified Modeling Language (UML) could be helpful in connection with these tasks. We investigate the use of UML for making the structural organization of the Molecular Function Ontology (MFO), a sub-ontology of GO, more explicit. More precisely, we present a UML dialect, called the Function Modeling Language (FueL), which is suited for capturing functions in an ontologically founded way. FueL is equipped, among other features, with language elements that arise from studying patterns of subsumption between functions. We show how to use this UML dialect for capturing the structure of molecular functions. Furthermore, we propose and discuss some refactoring options concerning fragments of MFO. FueL enables the systematic, graphical representation of functions and their interrelations, including making information explicit that is currently either implicit in MFO or is mainly captured in textual descriptions. Moreover, the considered subsumption patterns lend themselves to the methodical analysis of refactoring options with respect to MFO. On this basis we argue that the approach can increase the comprehensibility of the structure of MFO for humans and can support communication, for example, during revision and further development.

  13. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performance of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated content at every curation step. Database workflows must start explicitly recording all the data they curate and ideally also some of the data they do not curate.
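
    Precision and recall for GO descriptor assignment can be computed by comparing predicted and curated term sets per protein. The sketch below uses micro-averaging, which is an assumption since the chapter's exact protocol is not given here. It also shows what a relative gain of +225% means: the new value is 3.25 times the old one. The protein and term names and the 0.2 → 0.65 numbers are hypothetical.

```python
def micro_precision_recall(predicted, curated):
    """Micro-averaged precision and recall of predicted vs curated term sets per protein."""
    proteins = set(predicted) | set(curated)
    tp = sum(len(predicted.get(p, set()) & curated.get(p, set())) for p in proteins)
    n_pred = sum(len(s) for s in predicted.values())
    n_gold = sum(len(s) for s in curated.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall

def relative_gain(old, new):
    """Relative improvement in percent: (new - old) / old * 100."""
    return (new - old) / old * 100

predicted = {"prot1": {"term_a", "term_b"}, "prot2": {"term_c"}}
curated = {"prot1": {"term_a"}, "prot2": {"term_c", "term_d"}}
p, r = micro_precision_recall(predicted, curated)
print(round(p, 3), round(r, 3))             # 0.667 0.667
print(round(relative_gain(0.2, 0.65), 1))   # 225.0
```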

  14. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on the Model-View-Controller architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  15. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Ogunniyi, Abiodun D; Mahdi, Layla K; Paton, James C; Adelson, David L

    2013-01-01

    The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on the Model-View-Controller architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  16. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Full Text Available Abstract Background Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the whole story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources to exploit multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenate the GO terms into a flat binary vector or apply majority-vote based ensemble learning for protein subcellular localization, and neither approach can estimate the individual discriminative abilities of the three aspects of gene ontology. Results In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of potentially false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time and space computational complexities are greatly reduced compared to complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for
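
    The final merging step can be sketched as a weighted sum of five kernels. Several specifics here are assumptions, not taken from the abstract: Jaccard (Tanimoto) similarity on each GO aspect's term set, normalised k-spectrum kernels with k = 2 and k = 3, and weights proportional to hypothetical per-kernel cross-validation accuracies. The homology-based GO transfer step is omitted. The sequences and GO term names are hypothetical.

```python
import math
from collections import Counter

def spectrum_kernel(s, t, k):
    """Normalised k-spectrum kernel: shared k-mer counts, scaled into [0, 1]."""
    def kmers(x):
        return Counter(x[i:i + k] for i in range(len(x) - k + 1))
    def dot(a, b):
        return sum(a[m] * b[m] for m in a)
    cs, ct = kmers(s), kmers(t)
    denom = math.sqrt(dot(cs, cs) * dot(ct, ct))
    return dot(cs, ct) / denom if denom else 0.0

def go_kernel(a, b):
    """Jaccard (Tanimoto) similarity of two GO term sets for one aspect."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_kernel(p, q, weights):
    """Weighted average of three GO kernels and two spectrum kernels (k = 2, 3)."""
    values = [
        go_kernel(p["BP"], q["BP"]),
        go_kernel(p["MF"], q["MF"]),
        go_kernel(p["CC"], q["CC"]),
        spectrum_kernel(p["seq"], q["seq"], 2),
        spectrum_kernel(p["seq"], q["seq"], 3),
    ]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

p1 = {"seq": "MKTAYIA", "BP": {"bp1", "bp2"}, "MF": {"mf1"}, "CC": {"cc1"}}
p2 = {"seq": "MKTAYLA", "BP": {"bp1"}, "MF": {"mf1"}, "CC": {"cc2"}}
cv_accuracy = [0.7, 0.8, 0.6, 0.5, 0.4]  # hypothetical per-kernel CV accuracies
print(round(combined_kernel(p1, p2, cv_accuracy), 3))  # 0.574
```

    A non-negatively weighted sum of valid kernels is itself a valid kernel. That is why a simple linear merge like this can be passed to a kernel classifier without the costlier optimisation-based kernel learning the abstract contrasts it with.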

  17. Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Tratz, Stephen C.; Gregory, Michelle L.

    2006-06-08

    With the rising influence of the Gene Ontology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associated with them. So far, these approaches have solely relied on the knowledge encoded in the Gene Ontology and the gene annotations associated with the Gene Ontology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  18. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Abstract Background Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply only to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method was originally developed for predicting molecular function GO terms; it has now been expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation tool. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  19. Bayesian assignment of gene ontology terms to gene expression experiments.

    Science.gov (United States)

    Sykacek, P

    2012-09-15

    Gene expression assays allow for genome-scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implication of the resulting gene lists. This has led to increased interest in methods for inferring high-level biological information. A common approach for representing high-level information is to infer gene ontology (GO) terms which may be attributed to the expression data experiment. This article proposes a probabilistic model for GO term inference. Modelling assumes that gene annotations to GO terms are available and that gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches for expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that the advantages of the proposed probabilistic GO term inference over statistical test-based approaches are particularly evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high-level assignments like pathways. Source code under GPL license is available from the author. peter.sykacek@boku.ac.at.
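
    Sykacek's model is not reproduced here. The sketch below is a much simpler illustration of the same motivation: use gene-level posterior probabilities directly instead of thresholding them into a gene list. It computes an expected-count fold enrichment for a GO term, comparing the expected number of involved genes in the term with what the genome-wide average would predict. The function name, genes, and probabilities are hypothetical.

```python
def expected_enrichment(posteriors, term_genes):
    """Fold enrichment of expected involved genes in a term vs the genome-wide rate.

    posteriors: {gene: P(gene involved in the experiment)}
    term_genes: genes annotated to the GO term
    """
    members = [g for g in term_genes if g in posteriors]
    if not members:
        return 0.0
    expected_in_term = sum(posteriors[g] for g in members)
    background_rate = sum(posteriors.values()) / len(posteriors)
    if not background_rate:
        return 0.0
    return expected_in_term / (background_rate * len(members))

post = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.1, "g5": 0.1, "g6": 0.1}
print(round(expected_enrichment(post, {"g1", "g2", "g3"}), 3))  # 1.778
print(round(expected_enrichment(post, {"g4", "g5", "g6"}), 3))  # 0.222
```

    No gene is ever forced into "involved" or "not involved". A gene with posterior 0.7 still contributes 0.7 of a count, so uncertainty about gene activity carries through to the term-level score.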

  20. Bayesian assignment of gene ontology terms to gene expression experiments

    Science.gov (United States)

    Sykacek, P.

    2012-01-01

    Motivation: Gene expression assays allow for genome-scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implications of the resulting gene lists. This has led to increased interest in methods for inferring high-level biological information. A common approach for representing high-level information is to infer gene ontology (GO) terms that may be attributed to the expression data experiment. Results: This article proposes a probabilistic model for GO term inference. The model assumes that gene annotations to GO terms are available and that gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches to expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and yields a probabilistic GO term assignment. Experiments on synthetic and microarray data suggest that the advantages of the proposed probabilistic GO term inference over statistical test-based approaches are particularly evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high-level assignments, such as pathways. Availability: Source code under GPL license is available from the author. Contact: peter.sykacek@boku.ac.at PMID:22962488
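
The abstract does not give the model's equations, so the following is only a minimal sketch of one way per-gene posterior activity probabilities could be combined for a single GO term. It assumes, purely for illustration, that the gene indicators are independent; under that assumption the number of active genes in a term follows a Poisson-binomial distribution, which a short dynamic program computes exactly. The gene names, posteriors and annotation are hypothetical, and this is not Sykacek's model.

```python
from typing import Dict, List


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Exact distribution of the number of active genes, assuming the
    per-gene activity indicators are independent Bernoulli variables."""
    pmf = [1.0]  # no genes processed yet: P(count = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this gene is inactive
            nxt[k + 1] += mass * p      # this gene is active
        pmf = nxt
    return pmf


def prob_at_least(probs: List[float], k: int) -> float:
    """P(at least k of the genes are active)."""
    return sum(poisson_binomial_pmf(probs)[k:])


# Hypothetical posteriors from some Bayesian expression analysis.
posterior: Dict[str, float] = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.05}
# Hypothetical annotation of one GO term.
term_genes = ["g1", "g2", "g3"]

pmf = poisson_binomial_pmf([posterior[g] for g in term_genes])
p_majority = prob_at_least([posterior[g] for g in term_genes], 2)
print([round(x, 3) for x in pmf], round(p_majority, 3))
```

Working with the full distribution, rather than a thresholded gene list, keeps uncertain genes in play, which is the general motivation the abstract gives for a probabilistic assignment.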

  1. The Representation of Heart Development in the Gene Ontology

    Science.gov (United States)

    Khodiyar, Varsha K.; Hill, David P.; Howe, Doug; Berardini, Tanya Z.; Tweedie, Susan; Talmud, Philippa J.; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C.

    2012-01-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. PMID:21419760

  2. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources, we constructed a GO database of Aspergillus flavus. Of the 13,485 predicted A. flavus genes, 8,987 were annotated with GO terms. The mea...

  3. Improving missing value estimation in microarray data with gene ontology.

    Science.gov (United States)

    Tuikkala, Johannes; Elo, Laura; Nevalainen, Olli S; Aittokallio, Tero

    2006-03-01

    Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis, as many statistical and machine learning techniques either require a complete dataset or their results depend significantly on the quality of such estimates. A limitation of the existing estimation methods for microarray data is that they use no external information; the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities available from public databases facilitates missing value estimation. We investigated whether semantic similarity originating from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was automatically estimated from the data using an adaptive weight selection procedure. Our experimental results in yeast cDNA microarray datasets indicated that by considering GO information in the k-nearest neighbor algorithm we can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The performance gain was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality that are significant for the eventual interpretation of microarray experiments. Java and Matlab code is available on request from the authors. Available online at http://users.utu.fi/jotatu/GOImpute.html.
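
To make the idea concrete, here is a minimal sketch of k-nearest-neighbor imputation whose neighbor distance mixes expression distance with GO-based dissimilarity. It is not the authors' Java/Matlab code: the fixed weight `w` stands in for their adaptive weight selection, the `go_sim` table is a hypothetical precomputed input (pairs without an entry count as unrelated), and the inverse-distance weighting is a choice made for this sketch.

```python
import math
from typing import Dict, FrozenSet, List, Optional

Row = List[Optional[float]]  # None marks a missing expression value


def expr_distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over columns observed in both rows."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def impute_knn_go(
    data: Dict[str, Row],
    go_sim: Dict[FrozenSet[str], float],
    target: str,
    col: int,
    k: int = 2,
    w: float = 0.5,
) -> float:
    """Estimate data[target][col] from its k nearest neighbours.

    Combined distance = (1 - w) * expression distance + w * (1 - GO similarity).
    """
    candidates = []
    for gene, row in data.items():
        if gene == target or row[col] is None:
            continue  # a neighbour must have the value we want to estimate
        d_expr = expr_distance(data[target], row)
        if math.isinf(d_expr):
            continue  # no overlapping observed columns
        d_go = 1.0 - go_sim.get(frozenset((target, gene)), 0.0)
        candidates.append(((1.0 - w) * d_expr + w * d_go, row[col]))
    if not candidates:
        raise ValueError("no usable neighbours")
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    # Inverse-distance weighting; the epsilon avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(wt * v for wt, (_, v) in zip(weights, nearest)) / sum(weights)


data = {  # hypothetical expression matrix; gA is missing column 2
    "gA": [1.0, 2.0, None],
    "gB": [1.0, 2.0, 3.0],
    "gC": [5.0, 6.0, 10.0],
    "gD": [1.1, 2.1, 4.0],
}
go_sim = {frozenset(("gA", "gC")): 1.0}  # hypothetical GO similarity
print(impute_knn_go(data, go_sim, "gA", 2, k=1, w=0.0))  # expression only picks gB
print(impute_knn_go(data, go_sim, "gA", 2, k=1, w=0.9))  # GO weight picks gC
```

With `w=0` the estimate comes from the expression-nearest gene; raising `w` lets a functionally similar gene win even when its profile is farther away, which is the trade-off the adaptive weighting in the paper is meant to tune from data.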

  4. A flexible ontology for inference of emergent whole cell function from relationships between subcellular processes.

    Science.gov (United States)

    Hansen, Jens; Meretzky, David; Woldesenbet, Simeneh; Stolovitzky, Gustavo; Iyengar, Ravi

    2017-12-18

    Whole cell responses arise from coordinated interactions between diverse human gene products functioning within various pathways underlying sub-cellular processes (SCP). Lower level SCPs interact to form higher level SCPs, often in a context-specific manner, to give rise to whole cell function. We sought to determine whether capturing such relationships enables us to describe the emergence of whole cell functions from interacting SCPs. We developed the Molecular Biology of the Cell Ontology based on standard cell biology and biochemistry textbooks and review articles. Currently, our ontology contains 5,384 genes, 753 SCPs and 19,180 expertly curated gene-SCP associations. Our algorithm to populate the SCPs with genes enables extension of the ontology on demand and the adaptation of the ontology to continuously growing cell biological knowledge. Since whole cell responses most often arise from the coordinated activity of multiple SCPs, we developed a dynamic enrichment algorithm that flexibly predicts SCP-SCP relationships beyond the current taxonomy. This algorithm enables us to identify interactions between SCPs as a basis for higher order function in a context-dependent manner, allowing us to provide a detailed description of how SCPs together can give rise to whole cell functions. We conclude that this ontology can, from omics data sets, enable the development of detailed SCP networks for predictive modeling of emergent whole cell functions.
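
The abstract does not describe the dynamic enrichment algorithm itself. As a hedged illustration of the kind of building block such enrichment analyses commonly rely on, the sketch below computes a one-sided hypergeometric p-value for the overlap between two gene sets (for example, the genes of two SCPs), assuming a fixed universe of genes. The gene sets are hypothetical, and this is not the authors' method.

```python
from math import comb
from typing import Set


def hypergeom_sf(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b):
    the chance that two random gene sets of these sizes share >= overlap genes."""
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def overlap_pvalue(genes_a: Set[str], genes_b: Set[str], universe: int) -> float:
    """Enrichment p-value for the observed overlap of two gene sets."""
    return hypergeom_sf(len(genes_a & genes_b), len(genes_a), len(genes_b), universe)


# Hypothetical SCP gene sets drawn from a universe of 10 genes.
scp_a = {"g1", "g2", "g3", "g4", "g5"}
scp_b = {"g1", "g2", "g3", "g4", "g5"}
print(overlap_pvalue(scp_a, scp_b, universe=10))  # identical sets: 1 / C(10, 5)
```

`math.comb` returns 0 when asked to choose more items than exist, so impossible overlaps contribute nothing to the tail sum.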

  5. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Full Text Available Abstract Background Magnaporthe oryzae is the causal agent of blast disease of rice, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced, and an automated annotation has recently been updated to Version 6 http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using a standardized vocabulary. We report an overview of the GO annotation for Version 5 of the M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation, a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, the biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
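
The reciprocal best hits step can be sketched as follows. This is a simplified illustration, assuming similarity hits have already been parsed into (query, subject, bit score) tuples; the stringent thresholds, manual review and cross-validation described in the abstract are deliberately left out, and all protein IDs and the GO term string are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Best-scoring subject per query (the first one seen wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def transfer_go(pairs: List[Tuple[str, str]], go_by_ref: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Copy GO terms from each reference protein to its RBH partner."""
    return {query: set(go_by_ref.get(ref, set())) for query, ref in pairs}


a_vs_b = [("mo1", "ref1", 200.0), ("mo1", "ref2", 150.0),
          ("mo2", "ref2", 90.0), ("mo3", "ref3", 50.0)]
b_vs_a = [("ref1", "mo1", 198.0), ("ref2", "mo1", 160.0), ("ref2", "mo2", 85.0),
          ("ref3", "mo4", 70.0), ("ref3", "mo3", 40.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('mo1', 'ref1')]
go = transfer_go(pairs, {"ref1": {"GO:example"}})
```

Here `mo2` and `mo3` have best hits in the reference set, but those references prefer other proteins, so only `mo1`/`ref1` survives. Requiring reciprocity is what makes RBH stricter than one-directional best hits.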

  6. Interactome and Gene Ontology provide congruent yet subtly different views of a eukaryotic cell

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2009-07-01

    Full Text Available Abstract Background The characterization of the global functional structure of a cell is a major goal in bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction network offer alternative views of that structure. Results This study presents a comparison of the global structures of the Gene Ontology and the interactome of Saccharomyces cerevisiae. Sensitive, unsupervised clustering methods applied to a large fraction of the proteome established a GO-interactome correlation value of +0.47 for a general dataset that contains both high- and low-confidence interactions and +0.58 for a smaller, high-confidence dataset. Conclusion The structures of the yeast cell deduced from GO and the interactome are substantially congruent. However, some significant differences were also detected, which may contribute to a better understanding of cell function and also to a refinement of the current ontologies.

  7. The effects of shared information on semantic calculations in the gene ontology.

    Science.gov (United States)

    Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

    2017-01-01

    The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).

  8. The effects of shared information on semantic calculations in the gene ontology

    Directory of Open Access Journals (Sweden)

    Paul W. Bible

    2017-01-01

    Full Text Available The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
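
A small worked example may help make "shared information" concrete. The sketch below uses the classical choice of SI, the information content (IC) of the most informative common ancestor (MICA), and substitutes it into the Resnik, Lin and Jiang-Conrath formulas. The DAG and annotation counts are hypothetical toy data, not real GO terms, and the code is not GGTK's API; the paper compares many alternative SI definitions, of which MICA is only one.

```python
import math
from typing import Dict, Set

# Toy DAG (term -> parents); hypothetical terms, not real GO IDs.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(counts: Dict[str, int], parents, root: str = "root") -> Dict[str, float]:
    """IC(t) = -log p(t), where p(t) is the fraction of annotations at t or below."""
    totals = {t: 0 for t in parents}
    for term, n in counts.items():
        for anc in ancestors(term, parents):  # propagate counts up the DAG
            totals[anc] += n
    return {t: -math.log(c / totals[root]) for t, c in totals.items() if c > 0}


def shared_information(t1, t2, ic, parents) -> float:
    """IC of the most informative common ancestor (MICA)."""
    return max(ic[t] for t in ancestors(t1, parents) & ancestors(t2, parents))


def resnik(t1, t2, ic, parents) -> float:
    return shared_information(t1, t2, ic, parents)


def lin(t1, t2, ic, parents) -> float:
    denom = ic[t1] + ic[t2]
    # Two root-level terms (IC 0) are treated as identical by convention.
    return 2 * shared_information(t1, t2, ic, parents) / denom if denom > 0 else 1.0


def jiang_conrath_distance(t1, t2, ic, parents) -> float:
    return ic[t1] + ic[t2] - 2 * shared_information(t1, t2, ic, parents)


ic = information_content({"C": 2, "D": 1, "B": 1}, PARENTS)
print(resnik("C", "D", ic, PARENTS), lin("C", "D", ic, PARENTS))
```

Swapping `shared_information` for another SI definition changes all three measures at once, which is why the number of gene-to-gene measures multiplies quickly in the study above.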

  9. Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data.

    Science.gov (United States)

    Paul, Animesh Kumar; Shill, Pintu Chandra

    2018-01-01

    The products of gene expression work together in the cells of every living organism to carry out different biological processes. Many proteins play different roles in cell function depending on the organism's environment. In this paper, we propose a semi-supervised clustering algorithm based on gene ontology (GO) annotations, called GO fuzzy relational clustering (GO-FRC), in which a gene may be assigned to multiple clusters, reflecting the biologically realistic behavior of genes. In the clustering process, GO-FRC uses biological knowledge available in the form of a gene ontology as prior knowledge, along with the gene expression data. The prior knowledge helps to improve the coherence of the groups with respect to the knowledge field. The proposed GO-FRC has been tested on two yeast (Saccharomyces cerevisiae) expression profile datasets (the Eisen and Dream5 yeast datasets) and compared with other state-of-the-art clustering algorithms. Experimental results imply that GO-FRC is able to produce more biologically relevant clusters using a small amount of GO annotations. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants

    Directory of Open Access Journals (Sweden)

    Arthur Zwaenepoel

    2018-03-01

    Full Text Available Recent times have seen an enormous growth of “omics” data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which has been shown to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene-centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named “MORPH bulk” (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
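
The core guilt-by-association idea can be illustrated with a deliberately simple sketch: score each candidate gene by its mean Pearson correlation with known pathway ("bait") genes and rank candidates by that score. MORPH's actual scoring and dataset handling are not described in the abstract, so this plain-correlation ranking is only an assumption-laden illustration, and the expression profiles are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(
    expr: Dict[str, List[float]], baits: List[str], candidates: List[str]
) -> List[Tuple[str, float]]:
    """Score each candidate by its mean correlation with the bait genes."""
    scores = {
        c: sum(pearson(expr[c], expr[b]) for b in baits) / len(baits)
        for c in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


expr = {  # hypothetical expression profiles across four samples
    "bait1": [1.0, 2.0, 3.0, 4.0],
    "bait2": [2.0, 4.0, 6.0, 8.0],
    "cand1": [1.0, 2.0, 3.0, 5.0],
    "cand2": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_candidates(expr, ["bait1", "bait2"], ["cand1", "cand2"])
print(ranking)
```

`cand1` is co-expressed with the baits and ranks first, while the anti-correlated `cand2` scores -1. A real pipeline would typically repeat this over several expression datasets and normalize scores before comparing candidates.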

  11. GOurmet: A tool for quantitative comparison and visualization of gene expression profiles based on gene ontology (GO) distributions

    OpenAIRE

    Doherty, Jason M; Carmichael, Lynn K; Mills, Jason C

    2006-01-01

    Abstract Background The ever-expanding population of gene expression profiles (EPs) from specified cells and tissues under a variety of experimental conditions is an important but difficult resource for investigators to utilize effectively. Software tools have been recently developed to use the distribution of gene ontology (GO) terms associated with the genes in an EP to identify specific biological functions or processes that are over- or under-represented in that EP relative to other EPs. ...

  12. Gene Ontology Terms and Automated Annotation for Energy-Related Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mukhopadhyay, Biswarup [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Tyler, Brett M. [Oregon State Univ., Corvallis, OR (United States); Setubal, Joao [Univ. of Sao Paulo (Brazil); Murali, T. M. [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)

    2017-11-03

    Gene Ontology (GO) is one of the more widely used functional ontologies for describing gene functions at various levels. The project developed 660 GO terms for describing energy-related microbial processes, filled the known gaps in this area of the GO system, and then used these terms to describe the functions of 179 genes to showcase the utility of the new resources. It hosted a series of workshops and made presentations at key meetings to inform and train scientific community members on these terms and to receive their input for the GO term generation efforts. The project has developed a website for storing and displaying the resources (http://www.mengo.biochem.vt.edu/). The outcome of the project was further disseminated through peer-reviewed publications and poster and seminar presentations.

  13. Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL

    Directory of Open Access Journals (Sweden)

    Aranguren Mikel

    2007-02-01

    Full Text Available Abstract The bio-ontology community falls into two camps: first, biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, ontology specialists, who hold knowledge about techniques and best practice in ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism clashes with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally, we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to translate between the two representations means that we can capitalise on the features of both languages. Such utility can only arise from an understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.

  14. Ontology based molecular signatures for immune cell types via gene expression analysis

    Science.gov (United States)

    2013-01-01

    Background New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types by mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that, using OBAMS, candidate biomarkers can be identified at every level of cellular identity, from broad classifications to very granular ones. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells, as well as genes uniquely expressed in these B cells compared with other B cell types. Conclusions This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis, providing a new method for defining novel biomarkers and an opportunity for new biological insights. PMID:24004649

  15. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Directory of Open Access Journals (Sweden)

    Rupert W Overall

    Full Text Available BACKGROUND: Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as that provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. METHODOLOGY/PRINCIPAL FINDINGS: We have created a resource with three components: 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database, made freely available online. The manuscript presents an overview of the database, highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature, which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated, and regular updates of the database will be made publicly available. CONCLUSIONS/SIGNIFICANCE: The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already

  16. GOseek: a gene ontology search engine using enhanced keywords.

    Science.gov (United States)

    Taha, Kamal

    2013-01-01

    We propose in this paper a biological search engine called GOseek, which overcomes the limitation of current gene similarity tools. Given a set of genes, GOseek returns the most significant genes that are semantically related to the given genes. These returned genes are usually annotated to one of the Lowest Common Ancestors (LCA) of the Gene Ontology (GO) terms annotating the given genes. Most genes have several annotation GO terms. Therefore, there may be more than one LCA for the GO terms annotating the given genes. The LCA annotating the genes that are most semantically related to the given genes is the one that receives the most aggregate semantic contribution from the GO terms annotating the given genes. To identify this LCA, GOseek quantifies the contribution of the GO terms annotating the given genes to the semantics of their LCAs; that is, it encodes the semantic contribution in numeric form. GOseek uses microarray experiment data to rank result genes based on their significance. We evaluated GOseek experimentally and compared it with a comparable gene prediction tool. Results showed a marked improvement over that tool.
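
Because GO is a directed acyclic graph rather than a tree, a set of terms can have several lowest common ancestors, as the abstract notes. A minimal sketch of finding them: take the intersection of the terms' ancestor sets and keep only those common ancestors that are not themselves ancestors of another common ancestor. The toy DAG is hypothetical, and GOseek's method for choosing among multiple LCAs by aggregate semantic contribution is not reproduced here.

```python
from typing import Dict, Iterable, Set

# Toy DAG (term -> parents); hypothetical terms, not real GO IDs.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lowest_common_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Common ancestors of all `terms` (at least one term) that have no
    descendant which is also a common ancestor."""
    common = set.intersection(*(ancestors(t, parents) for t in terms))
    return {
        c for c in common
        if not any(o != c and c in ancestors(o, parents) for o in common)
    }


print(sorted(lowest_common_ancestors(["C", "D"], PARENTS)))  # ['A', 'B']
```

`C` and `D` share the ancestors `A`, `B` and `root`; `root` is dropped because it lies above `A`, leaving two LCAs, which is exactly the ambiguity a tool like GOseek then has to resolve.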

  17. GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets

    Directory of Open Access Journals (Sweden)

    Koop Ben F

    2011-07-01

    Full Text Available Abstract Background The increased accessibility of gene expression tools has enabled a wide variety of experiments utilizing transcriptomic analyses. As these tools increase in prevalence, the need for improved standardization in processing and presentation of data increases, as does the need to guard against interpretation bias. Gene Ontology (GO analysis is a powerful method of interpreting and summarizing biological functions. However, while there are many tools available to investigate GO enrichment, there remains a need for methods that directly remove redundant terms from enriched GO lists that often provide little, if any, additional information. Findings Here we present a simple yet novel method called GO Trimming that utilizes an algorithm designed to reduce redundancy in lists of enriched GO categories. Depending on the needs of the user, this method can be performed with variable stringency. In the example presented here, an initial list of 90 terms was reduced to 54, eliminating 36 largely redundant terms. We also compare this method to existing methods and find that GO Trimming, while simple, performs well to eliminate redundant terms in a large dataset throughout the depth of the GO hierarchy. Conclusions The GO Trimming method provides an alternative to other procedures, some of which involve removing large numbers of terms prior to enrichment analysis. This method should free up the researcher from analyzing overly large, redundant lists, and instead enable the concise presentation of manageable, informative GO lists. The implementation of this tool is freely available at: http://lucy.ceh.uvic.ca/go_trimming/cbr_go_trimming.py
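
The abstract does not state GO Trimming's exact rule, so the following is a hedged sketch of one plausible redundancy filter in the same spirit: an enriched term is dropped when one of its enriched descendants already covers at least a `threshold` fraction of its genes, with `threshold` playing the role of the adjustable stringency. The rule, toy DAG and gene sets are all assumptions for illustration, not the published algorithm.

```python
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def trim_redundant(
    term_genes: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    threshold: float = 0.8,
) -> Set[str]:
    """Drop an enriched term when an enriched descendant covers at least
    `threshold` of its (non-empty) gene set. The result does not depend on
    iteration order, because removal only checks descendant coverage."""
    enriched = set(term_genes)
    redundant: Set[str] = set()
    for child in enriched:
        for anc in ancestors(child, parents) - {child}:
            if anc in enriched:
                covered = len(term_genes[child] & term_genes[anc]) / len(term_genes[anc])
                if covered >= threshold:
                    redundant.add(anc)
    return enriched - redundant


PARENTS = {"root": set(), "A": {"root"}, "C": {"A"}}  # hypothetical DAG
term_genes = {  # hypothetical enriched terms and their genes
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "C": {"g1", "g2", "g3", "g4"},
}
print(trim_redundant(term_genes, PARENTS, threshold=0.8))  # {'C'}
print(sorted(trim_redundant(term_genes, PARENTS, threshold=0.9)))  # ['A', 'C']
```

Lowering the threshold trims more aggressively; at 0.9 the parent `A` survives because its child explains only 4 of its 5 genes.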

  18. An information theoretic approach to assessing Gene-Ontology-driven similarity and its application.

    Science.gov (United States)

    Wang, Haiying; Azuaje, Francisco; Zheng, Huiru

    2014-01-01

    Using information-theoretic approaches, this paper presents a cross-platform system to support the integration of Gene Ontology (GO)-driven similarity knowledge into functional genomics. Three GO-driven similarity measures (Resnik's, Lin's and Jiang's metrics) have been implemented to measure between-term similarity within each of the GO hierarchies. Two approaches based on the aggregation of between-term similarities (simple and highest average similarity) are used to estimate the similarity between gene products. The system has been successfully applied to a number of applications, including assessing gene expression correlation patterns and the relationships between GO-driven similarity and other functional properties.
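
Aggregating term-level scores into a gene-level score can be sketched as below. `simple_average` averages similarity over every pair of terms from the two genes. The paper's "highest average similarity" is not defined in the abstract, so `best_match_average` shows a closely related and widely used scheme (average of each term's best match in the other gene, in both directions) as an assumption rather than the authors' exact formula. The term similarity table is hypothetical; in practice it would come from a Resnik, Lin or Jiang measure.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List

TermSim = Callable[[str, str], float]


def simple_average(terms1: List[str], terms2: List[str], term_sim: TermSim) -> float:
    """Mean similarity over all term pairs from the two genes."""
    pairs = list(product(terms1, terms2))
    return sum(term_sim(a, b) for a, b in pairs) / len(pairs)


def best_match_average(terms1: List[str], terms2: List[str], term_sim: TermSim) -> float:
    """Average of each term's best match in the other gene, in both directions."""
    best1 = [max(term_sim(a, b) for b in terms2) for a in terms1]
    best2 = [max(term_sim(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


# Hypothetical precomputed between-term similarities.
SIM: Dict[FrozenSet[str], float] = {
    frozenset(("t1", "t3")): 0.9,
    frozenset(("t1", "t4")): 0.1,
    frozenset(("t2", "t3")): 0.2,
    frozenset(("t2", "t4")): 0.3,
}


def term_sim(a: str, b: str) -> float:
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)


gene1, gene2 = ["t1", "t2"], ["t3", "t4"]
print(simple_average(gene1, gene2, term_sim), best_match_average(gene1, gene2, term_sim))
```

The simple average (0.375 here) is pulled down by unrelated term pairs, whereas the best-match scheme (0.6) rewards each term finding at least one close counterpart, which is usually closer to the intuition of "functionally similar genes".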

  19. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

    Science.gov (United States)

    Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

    2016-03-11

    Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

  20. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Effective knowledge discovery from GO annotation data requires data mining techniques, such as association rule mining, that are tailored to mine multiple ontologies at multiple levels of abstraction. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL), that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures, Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence), customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods in two applications: (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
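    MOSupport and MOConfidence customize the standard support and confidence measures. The standard measures can be computed over gene annotation sets as shown below. This sketch uses the textbook definitions on hypothetical, simplified term labels. It does not reproduce MOAL's multi-level, multi-ontology adjustments.

```python
def support(itemset, transactions):
    """Fraction of transactions (gene annotation sets) containing every term in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from co-annotation counts."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

# Hypothetical annotations: each gene's terms from the MF, BP and CC sub-ontologies
annotations = [
    {"MF:kinase_activity", "BP:phosphorylation", "CC:cytoplasm"},
    {"MF:kinase_activity", "BP:phosphorylation"},
    {"MF:kinase_activity", "CC:nucleus"},
    {"BP:transcription", "CC:nucleus"},
]
rule = ({"MF:kinase_activity"}, {"BP:phosphorylation"})  # cross-ontology rule MF -> BP
print(support(rule[0] | rule[1], annotations))  # 0.5
print(confidence(*rule, annotations))           # about 0.667
```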

  1. Evaluation of clustering algorithms for gene expression data using gene ontology annotations.

    Science.gov (United States)

    Ma, Ning; Zhang, Zheng-Guo

    2012-09-01

    Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for evaluating the clustering of expression data. An external criterion using annotation-based similarities between genes is proposed in this work, with Gene Ontology information as the annotation source. Using this criterion, six widely used clustering algorithms were compared over various types of gene expression data sets. The ranking of these algorithms given by the criterion agrees with common knowledge. Single linkage performs significantly worse than the others, even worse than the random algorithm, while Ward's method achieves the best performance in most cases. The proposed criterion distinguishes strongly among different clustering algorithms with different distance measurements. We also demonstrate that analyzing the main contributors to the criterion may offer some guidelines for finding compact local clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.
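    One simple way to picture an annotation-based external criterion is to average GO similarity over gene pairs placed in the same cluster, then compare partitions by that average. The sketch assumes Jaccard similarity between GO term sets. The authors' actual similarity measure and normalization may differ, and the gene and term IDs are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of GO terms (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def within_cluster_similarity(clusters, go_terms):
    """Average annotation similarity over all gene pairs that share a cluster.

    clusters: list of lists of gene IDs
    go_terms: dict gene -> set of GO terms
    """
    sims = [jaccard(go_terms[g1], go_terms[g2])
            for cluster in clusters
            for g1, g2 in combinations(cluster, 2)]
    return sum(sims) / len(sims) if sims else 0.0

go_terms = {
    "g1": {"GO:a", "GO:b"}, "g2": {"GO:a", "GO:b"},
    "g3": {"GO:c"}, "g4": {"GO:c", "GO:d"},
}
coherent = [["g1", "g2"], ["g3", "g4"]]  # functionally coherent clusters
mixed = [["g1", "g3"], ["g2", "g4"]]     # functionally mixed clusters
print(within_cluster_similarity(coherent, go_terms))  # 0.75
print(within_cluster_similarity(mixed, go_terms))     # 0.0
```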

  2. Determining the semantic similarities among Gene Ontology terms.

    Science.gov (United States)

    Taha, Kamal

    2013-05-01

    We present in this paper novel techniques that determine the semantic relationships among Gene Ontology (GO) terms. We implemented these techniques in a prototype system called GoSE, which resides between user applications and the GO database. Given a set S of GO terms, GoSE returns another set S' of GO terms, where each term in S' is semantically related to each term in S. Most current research determines the semantic similarities among GO terms based solely on their IDs and their proximity to one another in the GO graph structure. This overlooks the contexts of the terms, which may lead to erroneous results. The context of a GO term T is the set of other terms whose existence in the GO graph structure is dependent on T. We propose novel techniques that determine the contexts of terms based on the concept of existence dependency. We present a stack-based sort-merge algorithm that employs these techniques to determine the semantic similarities among GO terms. We evaluated GoSE experimentally and compared it with three existing methods. We measured the semantic similarities among genes in KEGG and Pfam pathways, retrieved from the DBGET and Sanger Pfam databases respectively. The results show that our method outperforms the other three methods in recall and precision.

  3. A novel method incorporating gene ontology information for unsupervised clustering and feature selection.

    Directory of Open Access Journals (Sweden)

    Shireesh Srivastava

    Full Text Available Among the primary goals of microarray analysis is the identification of genes that could distinguish between different phenotypes (feature selection). Previous studies indicate that incorporating prior information on the genes' function could help identify physiologically relevant features. However, current methods that incorporate prior functional information do not provide a relative estimate of the effect of different genes on the biological processes of interest. Here, we present a method that integrates gene ontology (GO) information and expression data using Bayesian regression mixture models. The method performs unsupervised clustering of the samples and identifies physiologically relevant discriminating features. As a model application, we used it to identify the genes that play a role in the cytotoxic responses of the human hepatoblastoma cell line (HepG2) to saturated fatty acid (SFA) and tumor necrosis factor-alpha (TNF-alpha), as compared to the non-toxic response to unsaturated FFAs (UFA) and TNF-alpha. Incorporating prior knowledge led to better discrimination of the toxic phenotypes from the others. The model identified roles of lysosomal ATPases and adenylate cyclase (AC9) in the toxicity of palmitate. To validate the role of AC in palmitate-treated cells, we measured the intracellular levels of cyclic AMP (cAMP). Palmitate treatment significantly reduced cAMP levels, while the other FFAs did not, in accordance with the model's selection of AC9. A framework is presented that incorporates prior ontology information, which helped to (a) perform unsupervised clustering of the phenotypes and (b) identify the genes relevant to each cluster of phenotypes. We demonstrate the proposed framework by applying it to identify physiologically relevant feature genes that conferred differential toxicity to saturated vs. unsaturated FFAs. The framework can be applied to other problems to efficiently integrate ontology information and

  4. Genetic Resources for Advanced Biofuel Production Described with the Gene Ontology

    Directory of Open Access Journals (Sweden)

    Trudy eTorto-Alalibo

    2014-10-01

    Full Text Available Dramatic increases in research on microbial biofuel production, coupled with high-throughput data generation on bioenergy-related microbes, have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement: it is a well-developed structured vocabulary that describes the activities and locations of gene products consistently across all kingdoms of life. The Microbial Energy Gene Ontology (MENGO: http://www.mengo.biochem.vt.edu) project is extending the GO with new terms that describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy-related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. One area of microbial energy research that has received a lot of attention is the microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol and isobutanol, and fuels derived from fatty acids, isoprenoids and polyhydroxyalkanoates. These fuels are superior to first-generation biofuels (ethanol, and biodiesel esterified from vegetable oil or animal fat). They can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with the synthesis of advanced biofuels and, at the same time, introduce the use of the GO to describe the functions of these genes in a standardized way.

  5. Semantic interrogation of a multi knowledge domain ontological model of tendinopathy identifies four strong candidate risk genes.

    Science.gov (United States)

    Saunders, Colleen J; Jalali Sefid Dashti, Mahjoubeh; Gamieldien, Junaid

    2016-01-25

    Tendinopathy is a multifactorial syndrome characterised by tendon pain and thickening, and impaired performance during activity. Candidate gene association studies have identified genetic factors that contribute to intrinsic risk of developing tendinopathy upon exposure to extrinsic factors. Bioinformatics approaches that data-mine existing knowledge for biological relationships may assist with the identification of candidate genes. The aim of this study was to data-mine functional annotation of human genes and identify candidate genes by ontology-seeded queries capturing the features of tendinopathy. Our BioOntological Relationship Graph database (BORG) integrates multiple sources of genomic and biomedical knowledge into an on-disk semantic network where human genes and their orthologs in mouse and rat are central concepts mapped to ontology terms. The BORG was used to screen all human genes for potential links to tendinopathy. Following further prioritisation, four strong candidate genes (COL11A2, ELN, ITGB3, LOX) were identified. These genes are differentially expressed in tendinopathy, functionally linked to features of tendinopathy and previously implicated in other connective tissue diseases. In conclusion, cross-domain semantic integration of multiple sources of biomedical knowledge, and interrogation of phenotypes and gene functions associated with disease, may significantly increase the probability of identifying strong and unobvious candidate genes in genetic association studies.

  6. Networks in biological systems: An investigation of the Gene Ontology as an evolving network

    International Nuclear Information System (INIS)

    Coronnello, C; Tumminello, M; Micciche, S; Mantegna, R.N.

    2009-01-01

    Many biological systems can be described as networks in which different elements interact to perform biological processes. We introduce a network associated with the Gene Ontology. Specifically, we construct a correlation-based network in which the vertices are the terms of the Gene Ontology, and the link between each pair of terms is weighted by the number of genes they have in common. We analyze a filtered network obtained from the correlation-based network and characterize its evolution over different releases of the Gene Ontology.
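    A correlation-based GO term network of this kind can be sketched in two steps. First, represent each term as a binary vector over genes. Second, weight each pair of terms by the Pearson correlation of their vectors. Pearson correlation is an assumption about the exact weighting. The authors' subsequent filtering step is not reproduced, and the terms and genes are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def term_network(annotations, genes):
    """Weight each pair of terms by the correlation of their gene-membership vectors.

    annotations: dict term -> set of annotated genes
    genes: list of all genes (defines the vector positions)
    """
    vectors = {t: [1 if g in gs else 0 for g in genes] for t, gs in annotations.items()}
    return {(a, b): pearson(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}

genes = ["g1", "g2", "g3", "g4"]
annotations = {"T1": {"g1", "g2"}, "T2": {"g1", "g2"}, "T3": {"g3"}}
for pair, weight in term_network(annotations, genes).items():
    print(pair, round(weight, 3))
```

    Terms that share all their genes get weight 1, and terms with disjoint gene sets get a negative weight. This is why a filtering step is typically needed before the network is analyzed.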

  7. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    Science.gov (United States)

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  8. Concept mapping One-Carbon Metabolism to model future ontologies for nutrient-gene-phenotype interactions.

    Science.gov (United States)

    Joslin, A C; Green, R; German, J B; Lange, M C

    2014-09-01

    Advances in the development of bioinformatic tools continue to improve investigators' ability to interrogate, organize, and derive knowledge from large amounts of heterogeneous information. These tools often require advanced technical skills not possessed by life scientists. User-friendly, low-barrier-to-entry methods of visualizing nutrigenomics information are yet to be developed. We utilized concept mapping software from the Institute for Human and Machine Cognition to create a conceptual model of diet and health-related data that provides a foundation for future nutrigenomics ontologies describing published nutrient-gene/polymorphism-phenotype data. In this model, maps containing phenotype, nutrient, gene product, and genetic polymorphism interactions are visualized as triples of two concepts linked together by a linking phrase. These triples, or "knowledge propositions," contextualize aggregated data and information into easy-to-read knowledge maps. Maps of these triples enable visualization of genes spanning the One-Carbon Metabolism (OCM) pathway, their sequence variants, and multiple literature-mined associations including concepts relevant to nutrition, phenotypes, and health. The concept map development process documents the incongruity of information derived from pathway databases versus literature resources. This conceptual model highlights the importance of incorporating information about genes in upstream pathways that provide substrates, as well as downstream pathways that utilize products of the pathway under investigation, in this case OCM. Other genes and their polymorphisms, such as TCN2 and FUT2, although not directly involved in OCM, potentially alter OCM pathway functionality. These upstream gene products regulate substrates such as B12. Constellations of polymorphisms affecting the functionality of genes along OCM, together with substrate and cofactor availability, may impact resultant phenotypes. 
These conceptual maps provide a foundational
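    The triple structure described above, two concepts joined by a linking phrase, maps naturally onto a small record type. The sketch below is illustrative only. The class and function names are hypothetical, the example triples are not taken from the authors' maps, and this is not the file format of the IHMC concept-mapping software.

```python
from typing import List, NamedTuple

class Proposition(NamedTuple):
    """A knowledge proposition: two concepts joined by a linking phrase."""
    subject: str
    link: str
    obj: str

def about(props: List[Proposition], concept: str) -> List[Proposition]:
    """Return every proposition in which `concept` appears as subject or object."""
    return [p for p in props if concept in (p.subject, p.obj)]

def render(p: Proposition) -> str:
    """Format a proposition as a readable arrow string."""
    return f"{p.subject} --{p.link}--> {p.obj}"

# Illustrative triples only (hypothetical map content)
props = [
    Proposition("TCN2", "encodes a transporter of", "vitamin B12"),
    Proposition("vitamin B12", "is a cofactor of", "MTR"),
    Proposition("MTR", "participates in", "One-Carbon Metabolism"),
]
for p in about(props, "vitamin B12"):
    print(render(p))
```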

  9. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources. The first is in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN). The second is the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures. The third is the available connections between expression patterns and anatomical terms contained in ZFIN. Given an input gene set, ZEOGS determines which anatomical structures are overrepresented in it. For the first time, ZEOGS allows groups of genes to be described in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the affected tissues, whereas few or no enriched terms were found in the random gene sets. Next, we applied ZEOGS to microarray datasets of zebrafish embryos at 24 and 72 h postfertilization treated with beclomethasone, a potent glucocorticoid. This analysis identified several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression
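    Overrepresentation of an anatomical term in an input gene set is commonly assessed with a one-sided hypergeometric test, sketched below. The abstract does not say which test ZEOGS uses or how it corrects for multiple testing, so the choice of statistic here is an assumption. The term and gene names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn genes)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enriched_terms(gene_set, term_to_genes, universe, alpha=0.05):
    """Return (term, overlap, p-value) for terms overrepresented in gene_set.

    No multiple-testing correction is applied in this sketch.
    """
    gene_set = set(gene_set) & universe
    N, n = len(universe), len(gene_set)
    hits = []
    for term, genes in term_to_genes.items():
        genes = set(genes) & universe
        k = len(gene_set & genes)
        if k == 0:
            continue
        p = hypergeom_sf(k, N, len(genes), n)
        if p <= alpha:
            hits.append((term, k, p))
    return sorted(hits, key=lambda h: h[2])

# Hypothetical data: 100 genes, two anatomical terms with 10 genes each
universe = {f"g{i}" for i in range(100)}
term_to_genes = {
    "heart": {f"g{i}" for i in range(10)},
    "fin": {f"g{i}" for i in range(55, 65)},
}
query = {f"g{i}" for i in range(8)} | {"g60"}
for term, k, p in enriched_terms(query, term_to_genes, universe):
    print(term, k, f"{p:.2e}")
```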

  10. Zebrafish Expression Ontology of Gene Sets (ZEOGS): A Tool to Analyze Enrichment of Zebrafish Anatomical Terms in Large Gene Sets

    Science.gov (United States)

    Marsico, Annalisa

    2013-01-01

    Abstract The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources. The first is in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN). The second is the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures. The third is the available connections between expression patterns and anatomical terms contained in ZFIN. Given an input gene set, ZEOGS determines which anatomical structures are overrepresented in it. For the first time, ZEOGS allows groups of genes to be described in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the affected tissues, whereas few or no enriched terms were found in the random gene sets. Next, we applied ZEOGS to microarray datasets of zebrafish embryos at 24 and 72 h postfertilization treated with beclomethasone, a potent glucocorticoid. This analysis identified several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene

  11. The Proteasix Ontology.

    Science.gov (United States)

    Arguello Casteleiro, Mercedes; Klein, Julie; Stevens, Robert

    2016-06-04

    The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source, peptide-centric tool. Proteasix can predict automatically, in silico and on a large scale, the proteases involved in generating proteolytic cleavage fragments (peptides). The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology and the Sequence Ontology, together with bespoke extensions to the PxO. These components support a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. The PxO is designed to describe the biological underpinnings of peptide generation. The peptide-centric PxO seeks to support the Proteasix tool in two ways: by separating domain knowledge from the operational knowledge that Proteasix uses in protease prediction, and by supporting the confirmation of its analyses and results. The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.

  12. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The rate at which biomedical knowledge accumulates in the literature outpaces the current effort of manually annotating proteins with the Gene Ontology. This makes it urgent to develop text mining approaches that facilitate the process by automatically extracting Gene Ontology annotations from the literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated methods of enhancing automatic multi-label classification of biomedical literature by using the structure of the Gene Ontology graph. We studied three graph-based multi-label classification algorithms for literature classification: a novel stochastic algorithm and two top-down hierarchical classification methods. We systematically evaluated and compared these graph-based classification algorithms with a conventional flat multi-label algorithm. The results indicate that, by using information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers can suggest Gene Ontology annotations (to curators) that are closely related to the true annotations, even when they fail to predict the true ones directly. A software package implementing the studied algorithms is available to the research community. 
Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  13. Multi-label literature classification based on the Gene Ontology graph.

    Science.gov (United States)

    Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

    2008-12-08

    The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
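    The top-down strategy can be illustrated with a short traversal. A term is considered only after its parent has been accepted, so the predictions stay consistent with the GO graph. This is a generic sketch, not the authors' released software. The per-term scores stand in for trained classifiers, and the 0.5 threshold is an assumption.

```python
def top_down_predict(root, children, score, threshold=0.5):
    """Top-down hierarchical multi-label prediction over a GO-like graph.

    A term is predicted only if a parent was predicted and its own score
    reaches `threshold`; children of rejected terms are never expanded.
    children: dict term -> list of child terms
    score: dict term -> classifier probability for one document
    """
    predicted = set()
    frontier = [root]
    while frontier:
        term = frontier.pop()
        if term in predicted or score.get(term, 0.0) < threshold:
            continue
        predicted.add(term)
        frontier.extend(children.get(term, []))
    return predicted

children = {
    "biological_process": ["metabolic process", "signaling"],
    "metabolic process": ["lipid metabolic process"],
    "signaling": ["MAPK cascade"],
}
# Hypothetical per-term classifier outputs for one abstract
score = {
    "biological_process": 1.0,
    "metabolic process": 0.8,
    "lipid metabolic process": 0.7,
    "signaling": 0.3,
    "MAPK cascade": 0.9,  # high score, but its parent is rejected
}
print(sorted(top_down_predict("biological_process", children, score)))
```

    In this toy case, a flat classifier with the same threshold would also emit "MAPK cascade". The top-down pass gives up some recall in exchange for predictions that never contradict the graph.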

  14. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

    Science.gov (United States)

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant to personalized cancer therapy. We used these Ontology Fingerprints to evaluate the association between genes and biomedical literature and thereby disambiguate gene names. For gene-article association, we obtained a precision of 93.6% on the test gene set and an area under the receiver operating characteristic curve (AUC) of 80.4%. The core algorithm was implemented in a graphics processing unit-based MapReduce framework to handle big data and improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.
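    The reported area under the ROC curve summarizes how well gene-article association scores separate true pairs from false ones. A minimal way to compute it from two lists of scores is the pairwise (Mann-Whitney) formulation below. The scores are hypothetical, and the authors' Ontology Fingerprint scoring itself is not reproduced.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over all (positive, negative) pairs, how often the positive is
    scored higher; ties contribute one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical association scores for true and false gene-article pairs
true_pairs = [0.9, 0.8, 0.4]
false_pairs = [0.7, 0.3, 0.2]
print(roc_auc(true_pairs, false_pairs))  # 8 of 9 pairs ranked correctly
```

    The quadratic loop is fine for illustration. For large evaluations, a rank-based computation over the sorted scores gives the same value more efficiently.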

  15. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure with a parallel genetic algorithm. The semantic similarity measure computes the strength of similarity between Gene Ontology terms. The parallel genetic algorithm then performs batch retrieval and accelerates the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in a Gene Ontology browser named basic UTMGO. Basic UTMGO is intended to overcome the weaknesses of existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of extended UTMGO is to provide a simple and practical tool that produces better results with a reasonable running time and low computing cost, specifically for offline usage. Computational results and comparisons with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  16. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease. It typically affects older adults and results in a loss of vision due to retinal damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. We performed gene ontology and KEGG enrichment analyses of known AMD-related genes and established a classification system. Specifically, each gene was encoded as a vector of enrichment scores. These scores were computed between a gene set (the gene itself plus its direct neighbors in STRING) and gene ontology terms or KEGG pathways. Feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were then adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.
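    Incremental feature selection (IFS) is applied to a feature list already ranked by minimum redundancy maximum relevance (mRMR). It evaluates the top-1, top-2, and successively larger feature subsets and keeps the best-scoring prefix, as sketched below. The evaluator here is a hypothetical stand-in. In practice it would be a cross-validated classifier, which this sketch does not include, and the feature names are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float, List[float]]:
    """Return (best_prefix, best_score, score_curve) over prefixes of a ranked list.

    ranked: features ordered by a prior criterion (e.g. mRMR), most relevant first
    evaluate: scores a feature subset (higher is better)
    """
    best_k, best_score, history = 0, float("-inf"), []
    for k in range(1, len(ranked) + 1):
        s = evaluate(ranked[:k])
        history.append(s)
        if s > best_score:  # strict '>' keeps the shorter prefix on ties
            best_k, best_score = k, s
    return list(ranked[:best_k]), best_score, history

# Hypothetical stand-in for a cross-validated classifier: rewards two
# informative features and penalises subset size
informative = {"GO:A", "KEGG:B"}

def toy_evaluate(features):
    return len(informative & set(features)) - 0.1 * len(features)

ranked = ["GO:A", "KEGG:B", "GO:C", "GO:D"]  # assumed mRMR order
best, best_score, curve = incremental_feature_selection(ranked, toy_evaluate)
print(best, round(best_score, 2))
```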

  17. Subcellular localization prediction for human internal and organelle membrane proteins with projected gene ontology scores.

    Science.gov (United States)

    Du, Pufeng; Tian, Yang; Yan, Yan

    2012-11-21

    Membrane proteins make up more than a third of all known human proteins. Their subcellular localizations play a key role in elucidating the potential biological functions of these proteins. Experimental approaches for determining protein subcellular localization exist, but they are usually costly and time consuming. Computational prediction therefore provides an alternative approach. However, current subcellular location predictors are generally developed for globular proteins and do not perform well on membrane proteins. In this paper, we propose a novel prediction algorithm, the Projected Gene Ontology Score, which uses Gene Ontology annotation as a descriptor of the protein. This algorithm can significantly improve the prediction accuracy for the subcellular localizations of membrane proteins. It can assign each protein to one of eight different locations, whereas the existing algorithm covers only three. In effect, the biological problem considered by our algorithm goes one level deeper than that of the existing algorithms. In addition, our algorithm can provide more than one location for a test protein, which could be very useful in practical studies. Our algorithm is expected to be a good complement to the existing algorithms and has the potential to be extended to other problems. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Full Text Available Abstract Background Knowledge of the subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology, since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, prediction methods with greater robustness and higher accuracy are still needed. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used to train the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas the second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better on a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion MultiLoc2 is an extensive, high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 achieves higher accuracy than its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly, free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  19. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

    Directory of Open Access Journals (Sweden)

    Jain Shobhit

    2010-11-01

    Full Text Available Abstract Background Semantic similarity measures are useful for assessing the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems such as the Gene Ontology (GO). Proteins that interact in the cell are more likely than non-interacting proteins to be in similar locations or involved in similar biological processes. Thus, the more semantically similar the gene function annotations of the interacting proteins are, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in the different cellular component, molecular function, and biological process ontologies of GO, and thus may over- or under-estimate similarity. Results We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers the unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and to score PPIs higher if the participating proteins belong to the same sub-graph than if they belong to different sub-graphs. Conclusions The TCSS algorithm performs better than the other semantic similarity measurement techniques we evaluated, both in distinguishing true from false protein interactions and in correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.
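    TCSS itself is not reproduced here. As a point of reference, the sketch below computes Resnik similarity, the baseline named in the abstract: the information content (IC) of the most informative common ancestor of two terms, with IC(t) = -log p(t) and p(t) the fraction of annotations made to t or to any of its descendants. The toy DAG, term names and annotation counts are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy DAG: term -> list of parents (a term may have several parents).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
    "protein_kinase": ["kinase", "protein_binding"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0, "binding": 2, "catalysis": 1, "protein_binding": 3,
    "dna_binding": 4, "kinase": 1, "protein_kinase": 1,
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log p(t) = log(root_total / total(t)); totals include descendants."""
    totals = dict.fromkeys(PARENTS, 0)
    for term, n in DIRECT_COUNTS.items():
        for a in ancestors(term):
            totals[a] += n
    root_total = totals["root"]
    return {t: math.log(root_total / c) for t, c in totals.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[a] for a in common)


ic = information_content()
print(resnik("protein_binding", "dna_binding", ic))  # common ancestors: binding, root
print(resnik("kinase", "dna_binding", ic))           # common ancestor: root only
```

    Because the root is annotated (implicitly) by everything, its IC is zero, so two terms that share only the root get a Resnik similarity of zero. This depth-insensitive behaviour is one reason measures such as TCSS try to account for unequal branch depth.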

  20. University of Texas Southwestern Medical Center: Functional Signature Ontology Tool: Triplicate Measurements of Reporter Gene Expression in Response to Individual Genetic and Chemical Perturbations in HCT116 Cells | Office of Cancer Genomics

    Science.gov (United States)

    The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanism of action. Two genes in the eight-gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publicly available gene expression array datasets.

  1. Genetic resources for methane production from biomass described with gene ontology

    Directory of Open Access Journals (Sweden)

    Endang Purwantini

    2014-12-01

    Full Text Available Methane (CH4) is a valuable fuel, constituting 70-95% of natural gas, and a potent greenhouse gas. Release of CH4 into the atmosphere contributes to climate change. Biological CH4 production, or methanogenesis, is mostly performed by methanogens, a group of strictly anaerobic archaea. The direct substrates for methanogenesis are H2 plus CO2, acetate, formate, methylamines, methanol, methyl sulfides, and ethanol or a secondary alcohol plus CO2. In numerous anaerobic niches in nature, methanogenesis facilitates mineralization of complex biopolymers such as carbohydrates, lipids and proteins generated by primary producers. Thus, methanogens are critical players in the global carbon cycle. The same process is used in anaerobic treatment of municipal, industrial and agricultural wastes, reducing the biological pollutants in the wastes and generating methane. It also holds potential for commercial production of natural gas from renewable resources. This process operates in the digestive systems of many animals, including cattle and humans. In contrast, in deep-sea hydrothermal vents methanogenesis is a primary production process, allowing chemosynthesis of biomaterials from H2 plus CO2. In this report we present Gene Ontology (GO) terms that can be used to describe processes, functions and cellular components involved in methanogenic biodegradation and in the biosynthesis of the specialized coenzymes that methanogens use. Some of these GO terms were previously available and the rest were generated in our Microbial Energy Gene Ontology (MENGO) project. A recently discovered non-canonical CH4 production process is also described. We have performed manual GO annotation of selected methanogenesis genes, based on experimental evidence, providing gold standards for machine annotation and automated discovery of methanogenesis genes or systems in diverse genomes. Most of the GO-related information presented in this report is available at the MENGO website (http://www.mengo.biochem.vt.edu/).

  2. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications.

    Science.gov (United States)

    Whetzel, Patricia L; Noy, Natalya F; Shah, Nigam H; Alexander, Paul R; Nyulas, Csongor; Tudorache, Tania; Musen, Mark A

    2011-07-01

    The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

  3. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network.

    Science.gov (United States)

    Hur, Junguk; Xiang, Zuoshuang; Feldman, Eva L; He, Yongqun

    2011-08-26

    Vaccine literature indexing is poorly performed in PubMed due to the limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes was identified. 
The asserted
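    The recall and precision figures reported above are ratios of true positives (TP), false positives (FP) and false negatives (FN). A minimal sketch of these standard definitions, plus the F1 score that combines them; the counts are hypothetical, not the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw retrieval counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of retrieved items that are relevant
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of relevant items that were retrieved
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 91 relevant articles found, 9 missed, 1 irrelevant hit.
p, r, f1 = precision_recall_f1(tp=91, fp=1, fn=9)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```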

  4. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background Vaccine literature indexing is poorly performed in PubMed due to the limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were

  5. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to identify significantly over- and under-represented gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. The current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with the 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these
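    The abstract does not describe how its Fisher's exact test was modified, so the sketch below shows only the standard one-sided (hypergeometric) form of the test, for over-representation (upper tail) and under-representation (lower tail). The parameter names and the example counts are assumptions for illustration.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact test for over-representation, P(X >= k).

    X is hypergeometric: n items drawn from a population of N items,
    K of which carry the annotation (e.g. an interaction type).
    k is the observed number of annotated items among the n drawn.
    """
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def fisher_lower_tail(k, n, K, N):
    """One-sided Fisher's exact test for under-representation, P(X <= k)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, min(k, n) + 1))
    return hits / comb(N, n)


# Hypothetical example: all 3 sampled gene pairs carry a term held by 3 of 10 pairs overall.
print(fisher_upper_tail(3, 3, 3, 10))
```

    math.comb returns 0 when asked to choose more items than are available, so impossible outcomes simply contribute nothing to the sums.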

  6. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of GO classes are highly regular in lexical terms, and exact matches with labels of external ontologies affect 80% of GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of
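    The CPE metric itself is not reproduced here. As a rough illustration of the underlying idea of exact matches with external ontologies, the sketch below checks which GO-style labels contain the label of an external ontology class as a contiguous run of tokens, using simple case-insensitive matching. All labels are hypothetical examples, and the matching rule is an assumption, not the authors' lexical analysis.

```python
def tokens(label):
    """Lower-case whitespace tokenization of an ontology label."""
    return label.lower().split()


def contains_phrase(label_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside label_tokens."""
    m = len(phrase_tokens)
    return any(label_tokens[i:i + m] == phrase_tokens
               for i in range(len(label_tokens) - m + 1))


def external_matches(go_labels, external_labels):
    """Map each GO-style label to the external labels found inside it."""
    ext = [tokens(e) for e in external_labels]
    return {label: [" ".join(e) for e in ext if contains_phrase(tokens(label), e)]
            for label in go_labels}


# Hypothetical labels; not taken from actual GO, Cell Ontology or ChEBI releases.
go_labels = ["T cell proliferation", "glucose transport",
             "regulation of glucose transport", "cell cycle", "protein folding"]
external_labels = ["T cell", "glucose", "cell"]

matches = external_matches(go_labels, external_labels)
coverage = sum(1 for found in matches.values() if found) / len(matches)
print(matches)
print(f"labels with at least one external match: {coverage:.0%}")
```

    Token-level matching avoids spurious substring hits (for example "cell" inside "cellular"), at the cost of missing morphological variants.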

  7. Systems analysis of gene ontology and biological pathways involved in post-myocardial infarction responses.

    Science.gov (United States)

    Nguyen, Nguyen T; Lindsey, Merry L; Jin, Yu-Fang

    2015-01-01

    Pathway analysis has been widely used to gain insight into essential mechanisms of the response to myocardial infarction (MI). Currently, there exist multiple pathway databases that organize molecular datasets and manually curate pathway maps for biological interpretation at varying forms of organization. However, inconsistencies among different databases in pathway descriptions, frequently due to conflicting results in the literature, can generate incorrect interpretations. Furthermore, although pathway analysis software provides detailed images of interactions among molecules, it does not show how pathways interact with one another or with other biological processes under specific conditions. We propose a novel method to standardize descriptions of enriched pathways for a set of genes/proteins using Gene Ontology terms. We used this method to examine the relationships among pathways and biological processes for a set of condition-specific genes/proteins, represented as a functional biological pathway-process network. We applied this algorithm to a set of 613 MI-specific proteins we previously identified. A total of 96 pathways from Biocarta, KEGG, and Reactome, and 448 Gene Ontology Biological Processes were enriched with these 613 proteins. The pathways were represented as Boolean functions of biological processes, delivering an interactive scheme to organize enriched information with an emphasis on involvement of biological processes in pathways. We extracted a network focusing on MI to demonstrate that tyrosine phosphorylation of Signal Transducer and Activator of Transcription (STAT) protein, positive regulation of collagen metabolic process, coagulation, and positive/negative regulation of blood coagulation have immediate impacts on the MI response. Our method organized biological processes and pathways in an unbiased manner to provide an intuitive way to identify biological properties of pathways under specific conditions. Pathways from different

  8. QuickGO: a user tutorial for the web-based Gene Ontology browser.

    Science.gov (United States)

    Huntley, Rachael P; Binns, David; Dimmer, Emily; Barrell, Daniel; O'Donovan, Claire; Apweiler, Rolf

    2009-01-01

    The Gene Ontology (GO) has proven to be a valuable resource for functional annotation of gene products. At well over 27 000 terms, the descriptiveness of GO has increased rapidly in line with the biological data it represents. Therefore, it is vital to be able to easily and quickly mine the functional information that has been made available through these GO terms being associated with gene products. QuickGO is a fast, web-based tool for browsing the GO and all associated GO annotations provided by the GOA group. After undergoing a redevelopment, QuickGO is now able to offer many more features beyond simple browsing. Users have responded well to the new tool and given very positive feedback about its usefulness. This tutorial will demonstrate how some of these features could be useful to the researcher wanting to discover more about their dataset, particular areas of biology or to find new ways of directing their research. Database URL: http://www.ebi.ac.uk/QuickGO.

  9. Extending gene ontology in the context of extracellular RNA and vesicle communication

    NARCIS (Netherlands)

    Cheung, Kei-Hoi; Keerthikumar, Shivakumar; Roncaglia, Paola; Subramanian, Sai Lakshmi; Roth, Matthew E; Samuel, Monisha; Anand, Sushma; Gangoda, Lahiru; Gould, Stephen; Alexander, Roger; Galas, David; Gerstein, Mark B; Hill, Andrew F; Kitchen, Robert R; Lötvall, Jan; Patel, Tushar; Procaccini, Dena C; Quesenberry, Peter; Rozowsky, Joel; Raffai, Robert L; Shypitsyna, Aleksandra; Su, Andrew I; Théry, Clotilde; Vickers, Kasey; Wauben, Marca H M; Mathivanan, Suresh; Milosavljevic, Aleksandar; Laurent, Louise C

    2016-01-01

    BACKGROUND: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA

  10. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To develop effective and safe E. coli vaccines more rationally, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this large network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect, in that the two genes participate in the specified interaction in a required but indirect manner. Our centrality analysis of
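    The centrality metrics named above are standard graph measures. A minimal pure-Python sketch of three of them (degree, closeness, and betweenness via Brandes' algorithm) on an undirected, unweighted graph is shown below; eigenvector centrality is omitted for brevity. The graph and gene labels are hypothetical, closeness assumes a connected graph, and these are textbook definitions, not necessarily the exact normalizations used in the study.

```python
from collections import deque

# Hypothetical undirected gene interaction graph as an adjacency list.
GRAPH = {
    "g1": ["g2"],
    "g2": ["g1", "g3", "g4"],
    "g3": ["g2"],
    "g4": ["g2"],
}


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of distances to them); assumes a connected graph."""
    result = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        result[v] = (len(d) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


print(degree_centrality(GRAPH))
print(closeness_centrality(GRAPH))
print(betweenness_centrality(GRAPH))
```

    In this toy star-like graph, g2 lies on every shortest path between the other three genes, so it ranks highest on all three measures.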

  11. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    Full Text Available High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speedups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
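    One simple instance of the dynamic-programming idea is sketched below, under an assumed null model in which the gene set is a uniformly random subset of fixed size n drawn from all N genes, and the statistic S is the sum of the integer weights of the genes in the set. The table counts, for each subset size, how many subsets reach each sum, which yields the exact upper-tail p-value. This illustrates the general approach under that assumption; it is not the authors' algorithm or its optimizations, and the function name and example weights are hypothetical.

```python
from math import comb


def exact_upper_pvalue(weights, n, observed):
    """Exact P(S >= observed) for S = sum of integer weights over a uniformly
    random n-subset of the genes (assumed null model).

    ways[j] maps an achievable sum to the number of j-subsets, among the
    genes processed so far, that reach that sum.
    """
    if n > len(weights):
        raise ValueError("subset size exceeds number of genes")
    ways = [dict() for _ in range(n + 1)]
    ways[0][0] = 1
    for w in weights:
        # Go from large j to small so each gene joins a subset at most once.
        for j in range(n, 0, -1):
            for s, count in ways[j - 1].items():
                ways[j][s + w] = ways[j].get(s + w, 0) + count
    hits = sum(count for s, count in ways[n].items() if s >= observed)
    return hits / comb(len(weights), n)


# Hypothetical per-gene weights (e.g. graded evidence for membership in a category).
print(exact_upper_pvalue([3, 1, 0, 2], n=2, observed=4))
```

    With weight 1 for genes annotated to a category and 0 otherwise, S is just the annotated-gene count and the result reduces to the familiar hypergeometric p-value; richer integer weights give the more general statistics described in the abstract.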

  12. The Saccharomyces Genome Database: Gene Product Annotation of Function, Process, and Component.

    Science.gov (United States)

    Cherry, J Michael

    2015-12-02

    An ontology is a highly structured form of controlled vocabulary. Each entry in the ontology is commonly called a term. These terms are used when talking about an annotation. However, each term has a definition that, like the definition of a word found within a dictionary, provides the complete usage and detailed explanation of the term. It is critical to consult a term's definition because the distinction between terms can be subtle. The use of ontologies in biology started as a way of unifying communication between scientific communities and to provide a standard dictionary for different topics, including molecular functions, biological processes, mutant phenotypes, chemical properties and structures. The creation of ontology terms and their definitions often requires debate to reach agreement but the result has been a unified descriptive language used to communicate knowledge. In addition to terms and definitions, ontologies require a relationship used to define the type of connection between terms. In an ontology, a term can have more than one parent term, the term above it in an ontology, as well as more than one child, the term below it in the ontology. Many ontologies are used to construct annotations in the Saccharomyces Genome Database (SGD), as in all modern biological databases; however, Gene Ontology (GO), a descriptive system used to categorize gene function, is the most extensively used ontology in SGD annotations. Examples included in this protocol illustrate the structure and features of this ontology. © 2015 Cold Spring Harbor Laboratory Press.
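    Because a term can have more than one parent, its ancestors must be collected by walking a directed acyclic graph rather than a tree. The sketch below uses a small, simplified, hypothetical fragment of a cellular-component hierarchy with is_a and part_of edges, and propagates direct gene annotations to all ancestor terms, in the spirit of GO's true path rule. Term names and the gene identifier are illustrative only.

```python
# Simplified, hypothetical fragment: term -> list of (relationship, parent).
EDGES = {
    "cellular_component": [],
    "membrane": [("is_a", "cellular_component")],
    "organelle": [("is_a", "cellular_component")],
    "mitochondrion": [("is_a", "organelle")],
    # A term with two parents, reached through different relationship types.
    "mitochondrial_membrane": [("is_a", "membrane"), ("part_of", "mitochondrion")],
}


def ancestors_via(term, follow=("is_a", "part_of")):
    """All ancestors of term reachable through the given relationship types."""
    found, stack = set(), [term]
    while stack:
        for rel, parent in EDGES[stack.pop()]:
            if rel in follow and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations):
    """Expand direct gene -> terms annotations to include every ancestor term."""
    return {gene: set(terms).union(*(ancestors_via(t) for t in terms))
            for gene, terms in annotations.items()}


print(sorted(ancestors_via("mitochondrial_membrane")))
print(propagate({"geneX": ["mitochondrion"]}))
```

    Restricting follow to ("is_a",) gives only the subsumption hierarchy, which shows why the relationship type matters when interpreting an annotation.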

  13. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology

    OpenAIRE

    Caniza, Horacio; Romero, Alfonso E.; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-01-01

    Summary: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve...

  14. FUNC: a package for detecting significant associations between gene sets and ontological annotations

    Directory of Open Access Journals (Sweden)

    Rahm Erhard

    2007-02-01

    Full Text Available Abstract Background Genome-wide expression, sequence and association studies typically yield large sets of gene candidates, which must then be further analysed and interpreted. Information about these genes is increasingly being captured and organized in ontologies, such as the Gene Ontology. Relationships between the gene sets identified by experimental methods and biological knowledge can be made explicit and used in the interpretation of results. However, it is often difficult to assess the statistical significance of such analyses since many inter-dependent categories are tested simultaneously. Results We developed the program package FUNC, which includes and expands on currently available methods to identify significant associations between gene sets and ontological annotations. It implements several tests that are particularly well suited for genome-wide sequence comparisons, estimates of the family-wise error rate and the false discovery rate, a sensitive estimator of the global significance of the results, and an algorithm to reduce the complexity of the results. Conclusion FUNC is a versatile and useful tool for the analysis of genome-wide data. It is freely available under the GPL license and also accessible via a web service.
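    FUNC's own procedures are not reproduced here. For orientation, the sketch below implements the standard Benjamini-Hochberg adjustment, one common way to control the false discovery rate across many tested categories (its guarantee assumes independent or positively dependent tests). The example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four ontology categories.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

    A category would typically be reported as significant when its adjusted value falls below the chosen FDR level, for example 0.05.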

  15. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining.

    Science.gov (United States)

    Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2012-12-20

    Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since

  16. Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

    Science.gov (United States)

    Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A

    2016-04-01

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than for SNOMED CT concepts. This disparity may account for the difference in performance: fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than as an expert replacement. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. The role of ontologies in biological and biomedical research: a functional perspective

    KAUST Repository

    Hoehndorf, Robert

    2015-04-10

    Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.

  18. The meaning of the wave function in search of the ontology of quantum mechanics

    CERN Document Server

    Gao, Shan

    2017-01-01

    At the heart of quantum mechanics lies the wave function, a powerful but mysterious mathematical object which has been a hot topic of debate from its earliest stages. Covering much of the recent debate and providing a comprehensive and critical review of competing approaches, this ambitious text provides new, decisive proof of the reality of the wave function. Aiming to make sense of the wave function in quantum mechanics and to find the ontological content of the theory, this book explores new ontological interpretations of the wave function in terms of random discontinuous motion of particles. Finally, the book investigates whether the suggested quantum ontology is complete in solving the measurement problem and if it should be revised in the relativistic domain. A timely addition to the literature on the foundations of quantum mechanics, this book is of value to students and researchers with an interest in the philosophy of physics. Presents a concise introduction to quantum mechanics, including the c...

  19. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    Full Text Available BACKGROUND: Large-scale sequencing projects have now become routine lab practice, and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods, and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was shown to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  20. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Science.gov (United States)

    Fontana, Paolo; Cestaro, Alessandro; Velasco, Riccardo; Formentin, Elide; Toppo, Stefano

    2009-01-01

    Large-scale sequencing projects have now become routine lab practice, and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods, and in this work we have based ARGOT processing on BLAST results. The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was shown to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.
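    ARGOT combines GO-term clustering with a weighting of BLAST hits. The abstract does not give the weighting formula, so the sketch below shows only a simplified, hypothetical scoring step: each hit below an e-value cutoff adds -log10(e-value) to every GO term it carries, and terms are ranked by the accumulated weight. The function name, the cutoffs and the example hits are assumptions, and the semantic-similarity clustering step is omitted entirely.

```python
import math
from collections import defaultdict

def score_go_terms(hits, max_evalue=1e-3, min_evalue=1e-200):
    """Hypothetical scoring of GO terms from similarity-search hits.

    hits: list of (evalue, set_of_go_terms) tuples, e.g. parsed from BLAST output.
    Hits with evalue > max_evalue are ignored; each remaining hit adds
    -log10(evalue) to every GO term it carries. E-values are floored at
    min_evalue so that an e-value of 0.0 does not cause log10(0).
    """
    scores = defaultdict(float)
    for evalue, terms in hits:
        if evalue > max_evalue:
            continue
        weight = -math.log10(max(evalue, min_evalue))
        for term in terms:
            scores[term] += weight
    # Rank GO terms from highest to lowest accumulated weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

hits = [
    (1e-50, {"GO:0005524", "GO:0004672"}),
    (1e-10, {"GO:0005524"}),
    (1.0, {"GO:0016020"}),  # weak hit, filtered out by max_evalue
]
for term, score in score_go_terms(hits):
    print(term, round(score, 1))
```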

  1. Combining sequence and Gene Ontology for protein module detection in the Weighted Network.

    Science.gov (United States)

    Yu, Yang; Liu, Jie; Feng, Nuan; Song, Bo; Zheng, Zeyu

    2017-01-07

    Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect protein modules in weighted PPI networks. The proposed approach consists of three main parts: feature extraction, weighted graph construction and protein complex detection. First, topology-sequence information is used to represent the features of protein complexes. Second, six types of weighted graph are constructed by combining the PPI network with Gene Ontology information. Finally, a protein complex detection algorithm is applied to the weighted graph; it locates clusters based on three conditions: density, network diameter and the included angle cosine. Experiments on two yeast protein complex benchmark sets show that the approach is more effective than five typical algorithms in terms of F-measure and precision. Combining the protein interaction network with sequence and Gene Ontology data helps improve performance and provides an alternative method for protein module detection. Copyright © 2016 Elsevier Ltd. All rights reserved.
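    The abstract does not specify how its six weighted graphs combine PPI and GO information. As one simple, hypothetical option, an interaction can be weighted by the Jaccard similarity of the two proteins' GO annotation sets; the sketch below assumes annotations are plain sets of GO IDs and uses made-up protein and term names.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weight_ppi(edges, go_annotations):
    """Return {(u, v): weight}, where the weight is the Jaccard similarity of
    the GO term sets of the two interacting proteins. Proteins without
    annotations are treated as having an empty set."""
    return {
        (u, v): jaccard(go_annotations.get(u, set()), go_annotations.get(v, set()))
        for u, v in edges
    }

# Hypothetical network and annotations
edges = [("A", "B"), ("B", "C")]
go_annotations = {"A": {"GO:1", "GO:2"}, "B": {"GO:2", "GO:3"}, "C": {"GO:4"}}
print(weight_ppi(edges, go_annotations))
```

    A module-detection step can then favour densely connected, high-weight regions; the paper's own weighting schemes may differ.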

  2. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Full Text Available Cliques (maximal complete subnets) in protein-protein interaction (PPI) networks are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPIs complement the incomplete data from biological experiments. However, clique-based prediction methods depend only on the topology of the network, and the false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method that combines clique-based prediction with gene ontology (GO) annotations to overcome this shortcoming and improve prediction accuracy. According to different GO correcting rules, we generate two predicted interaction sets, which ensure the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows their biological significance.
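    The paper's GO correcting rules are not spelled out in the abstract. The sketch below illustrates the general idea under a hypothetical rule: enumerate maximal cliques with the basic Bron-Kerbosch algorithm, and when a protein outside a clique is connected to all but one clique member, predict the missing edge only if the two proteins share at least one GO term. The rule, the function names and the toy data are assumptions.

```python
def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of maximal cliques.
    adj: dict mapping each node to the set of its neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def predict_edges(adj, go, min_size=3):
    """Hypothetical rule: if a protein outside a clique is linked to every
    clique member except one, predict the missing edge, but only when the
    two proteins share at least one GO term (the GO filter)."""
    predicted = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_size:
            continue
        for v in set(adj) - clique:
            missing = clique - adj[v]
            if len(missing) == 1:
                (u,) = missing
                if go.get(u, set()) & go.get(v, set()):
                    predicted.add(tuple(sorted((u, v))))
    return predicted

# Toy network: A, B, C form a clique; D is linked to A and B only.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"A", "B"}}
go = {"C": {"GO:x"}, "D": {"GO:x"}}
print(predict_edges(adj, go))
```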

  3. Annotating activation/inhibition relationships to protein-protein interactions using gene ontology relations.

    Science.gov (United States)

    Yim, Soorin; Yu, Hasun; Jang, Dongjin; Lee, Doheon

    2018-04-11

    Signaling pathways can be reconstructed by identifying 'effect types' (i.e. activation/inhibition) of protein-protein interactions (PPIs). Effect types are composed of 'directions' (i.e. upstream/downstream) and 'signs' (i.e. positive/negative), thereby requiring directions as well as signs of PPIs to predict signaling events from PPI networks. Here, we propose a computational method for systematically annotating effect types to PPIs using relations between functional information of proteins. We used regulates, positively regulates, and negatively regulates relations in Gene Ontology (GO) to predict directions and signs of PPIs. These relations indicate both directions and signs between GO terms so that we can project directions and signs between relevant GO terms to PPIs. Independent test results showed that our method is effective for predicting both directions and signs of PPIs. Moreover, our method outperformed a previous GO-based method that did not consider the relations between GO terms. We annotated effect types to human PPIs and validated several highly confident effect types against literature. The annotated human PPIs are available in Additional file 2 to aid signaling pathway reconstruction and network biology research. We annotated effect types to PPIs by using regulates, positively regulates, and negatively regulates relations in GO. We demonstrated that those relations are effective for predicting not only signs, but also directions of PPIs. The usefulness of those relations suggests their potential applications to other types of interactions such as protein-DNA interactions.
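    The projection idea can be sketched as follows: if protein A carries GO term s, protein B carries term t, and GO states that s positively or negatively regulates t, then the interaction A-B is annotated as A upstream of B with that sign. This is a hypothetical illustration only; the paper's scoring and confidence estimation are not described in the abstract, and the term IDs below are placeholders.

```python
def annotate_effects(ppis, go_annotations, go_relations):
    """Project GO regulation relations onto PPIs.

    go_relations: {(source_term, target_term): sign}, where sign is e.g.
        "positive" (positively regulates), "negative" (negatively regulates)
        or "unknown" (plain regulates).
    Returns {(upstream, downstream): sign} for every direction supported by
    at least one term pair; disagreeing signs are reported as "conflict".
    """
    effects = {}
    for a, b in ppis:
        # Try both orientations of the undirected interaction
        for up, down in ((a, b), (b, a)):
            for s in go_annotations.get(up, set()):
                for t in go_annotations.get(down, set()):
                    sign = go_relations.get((s, t))
                    if sign is None:
                        continue
                    prev = effects.get((up, down))
                    effects[(up, down)] = sign if prev in (None, sign) else "conflict"
    return effects

# Placeholder terms T1..T3: T1 positively regulates T2.
ppis = [("P1", "P2")]
go_annotations = {"P1": {"T1"}, "P2": {"T2"}}
go_relations = {("T1", "T2"): "positive"}
print(annotate_effects(ppis, go_annotations, go_relations))
```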

  4. Semantic similarity measurement between gene ontology terms based on exclusively inherited shared information.

    Science.gov (United States)

    Zhang, Shu-Bo; Lai, Jian-Huang

    2015-03-01

    Quantifying the semantic similarities between pairs of terms in the Gene Ontology (GO) structure can help to explore the functional relationships between biological entities. A common approach to this problem is to measure the information two terms have in common based on the information content of their common ancestors. However, many existing measures are limited in how they capture the information two GO terms share. This study presents a new measurement, exclusively inherited shared information (EISI), that captures the information shared by two terms based on an intuitive observation about the multiple inheritance relationships among the terms in the GO graph. EISI is derived from the information content of the exclusively inherited common ancestors (EICAs), which are screened from the common ancestors according to the attributes of their direct children. The effectiveness of EISI was evaluated against some state-of-the-art measurements on both artificial and real datasets: it produced results more consistent with experts' scores on the artificial dataset, and supported the prior knowledge of gene function in pathways in the Saccharomyces Genome Database (SGD). The promising features of EISI are the following: (1) it provides a more effective way to characterize the semantic relationship between two GO terms by taking into account multiple related common ancestors, and (2) it can quickly detect all EICAs with time complexity of O(n), which is much more efficient than other methods based on disjunctive common ancestors. It is a promising alternative to multiple inheritance based methods for practical applications on large-scale datasets. The algorithm EISI was implemented in Matlab and is freely available from http://treaton.evai.pl/EISI/. Copyright © 2014 Elsevier B.V. All rights reserved.
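    For context, the "common approach" that EISI is compared against can be sketched as a Resnik-style measure: the information content IC(t) = -ln p(t), where p(t) is the fraction of annotations made to t or any of its descendants, and the similarity of two terms is the IC of their most informative common ancestor (MICA). This sketch is NOT EISI itself; the toy DAG and annotation counts are hypothetical.

```python
import math

def ancestors(term, parents):
    """Return term plus all of its ancestors in a DAG given as {term: parents}."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def information_content(term, parents, counts):
    """IC(t) = -ln p(t); p(t) is the fraction of all annotations made to t or
    to any of its descendants. counts holds direct annotation counts."""
    total = sum(counts.values())
    covered = sum(n for t, n in counts.items() if term in ancestors(t, parents))
    if covered == 0:
        raise ValueError(f"no annotations under {term}")
    return -math.log(covered / total)

def resnik(t1, t2, parents, counts):
    """Similarity = IC of the most informative common ancestor (MICA)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(information_content(c, parents, counts) for c in common)

# Hypothetical DAG: root -> A, B; A -> C, D
parents = {"A": {"root"}, "B": {"root"}, "C": {"A"}, "D": {"A"}}
counts = {"C": 2, "D": 2, "B": 4}
print(resnik("C", "D", parents, counts))  # MICA is A
print(resnik("C", "B", parents, counts))  # MICA is root
```

    Because only the single most informative ancestor counts, terms with several shared ancestors through multiple inheritance are summarised by one value; that limitation is what the abstract says EISI addresses.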

  5. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Full Text Available Abstract Background It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions Pathway
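    The SNP-to-gene mapping step described in the abstract can be sketched as an interval test. The sketch assumes "within 500 kb of the start or end" means the SNP lies in [start - 500 kb, end + 500 kb] on the same chromosome (so SNPs inside the gene body also map), and that a SNP may map to several genes. Gene and SNP identifiers and coordinates are hypothetical.

```python
WINDOW = 500_000  # 500 kb, as stated in the abstract

def map_snps_to_genes(snps, genes, window=WINDOW):
    """snps: {snp_id: (chromosome, position)}
    genes: {gene: (chromosome, start, end)}
    A SNP is mapped to every gene on the same chromosome for which
    start - window <= position <= end + window (assumed interpretation)."""
    mapping = {}
    for snp, (chrom, pos) in snps.items():
        mapping[snp] = sorted(
            gene
            for gene, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start - window <= pos <= end + window
        )
    return mapping

# Hypothetical coordinates
genes = {"G1": ("1", 1_000_000, 1_010_000), "G2": ("2", 5_000_000, 5_100_000)}
snps = {"rs1": ("1", 1_400_000), "rs2": ("1", 2_000_000), "rs3": ("2", 4_600_000)}
print(map_snps_to_genes(snps, genes))
```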

  6. Integration of text- and data-mining using ontologies successfully selects disease gene candidates.

    Science.gov (United States)

    Tiffin, Nicki; Kelso, Janet F; Powell, Alan R; Pan, Hong; Bajic, Vladimir B; Hide, Winston A

    2005-01-01

    Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (+/-18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
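    The final filtering step, keeping candidates whose expression matches disease-affected tissues, can be sketched as a set intersection. This is only a hypothetical illustration: it assumes the text-mining and data-mining outputs have already been reduced to sets of anatomical terms per gene and per disease, uses plain tissue names as placeholders rather than real eVOC identifiers, and also reports the remaining fraction of the candidate list.

```python
def prioritize(candidates, expression, disease_tissues):
    """Keep candidate genes expressed in at least one tissue associated with
    the disease.

    candidates: list of gene identifiers
    expression: {gene: set of anatomical terms where the gene is expressed}
    disease_tissues: set of anatomical terms linked to the disease
    Returns (kept_genes, fraction_of_candidates_kept)."""
    kept = [g for g in candidates if expression.get(g, set()) & disease_tissues]
    fraction = len(kept) / len(candidates) if candidates else 0.0
    return kept, fraction

# Placeholder genes and tissue terms
candidates = ["G1", "G2", "G3", "G4"]
expression = {"G1": {"heart"}, "G2": {"liver"}, "G3": {"heart", "lung"}}
print(prioritize(candidates, expression, {"heart"}))
```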

  7. The Gene Ontology of eukaryotic cilia and flagella

    NARCIS (Netherlands)

    Roncaglia, P.; Dam, T.J.P. van; Christie, K.R.; Nacheva, L.; Toedt, G.; Huynen, M.A.; Huntley, R.P.; Gibson, T.J.; Lomax, J.

    2017-01-01

    Background: Recent research into ciliary structure and function provides important insights into inherited diseases termed ciliopathies and other cilia-related disorders. This wealth of knowledge needs to be translated into a computational representation to be fully exploitable by the research

  8. Gene Ontology based housekeeping gene selection for RNA-seq normalization.

    Science.gov (United States)

    Chen, Chien-Ming; Lu, Yu-Lun; Sio, Chi-Pong; Wu, Guan-Chung; Tzou, Wen-Shyong; Pai, Tun-Wen

    2014-06-01

    RNA-seq analysis provides a powerful tool for revealing relationships between gene expression level and the biological function of proteins. In order to identify differentially expressed genes among various RNA-seq datasets obtained from different experimental designs, an appropriate normalization method for calibrating multiple experimental datasets is the first challenging problem. We propose a novel method to help biologists select a set of suitable housekeeping genes for inter-sample normalization. The approach uses user-defined experimentally related keywords, GO annotations, GO term distance matrices, orthologous housekeeping gene candidates, and stability ranking of housekeeping genes. By identifying the GO terms most distant from the query keywords and selecting housekeeping gene candidates with low coefficients of variation among different spatio-temporal datasets, the proposed method can automatically enumerate a set of functionally irrelevant housekeeping genes for practical normalization. Novel and benchmark RNA-seq test datasets were used to demonstrate that the choice of housekeeping genes strongly affects differential gene expression analysis, and comparative results showed that the proposed method outperformed other traditional approaches in terms of both sensitivity and specificity. The proposed mechanism of selecting appropriate housekeeping genes for inter-dataset normalization is robust and accurate for differential expression analyses. Copyright © 2014 Elsevier Inc. All rights reserved.
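    The stability criterion mentioned in the abstract (low coefficient of variation across datasets) can be sketched as follows. The sketch assumes the GO-distance step has already produced a set of functionally relevant genes to exclude; the function names, the value of k and the expression values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean if mean else float("inf")

def select_housekeeping(expression, excluded, k=3):
    """expression: {gene: [expression level in each dataset]}
    excluded: genes whose GO terms are close to the experiment's keywords
    (functionally relevant, hence unsuitable for normalization).
    Returns the k remaining genes with the lowest coefficient of variation."""
    candidates = [g for g in expression if g not in excluded]
    return sorted(candidates, key=lambda g: coefficient_of_variation(expression[g]))[:k]

# Hypothetical expression values across three datasets
expression = {
    "g1": [100, 102, 98],
    "g2": [50, 60, 40],
    "g3": [200, 201, 199],
    "g4": [10, 50, 90],
}
print(select_housekeeping(expression, excluded={"g1"}, k=2))
```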

  9. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of protein-protein interaction networks. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology
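    The claim that proteins in a GO annotation group are "more densely linked than expected by chance" can be tested with a permutation test. The abstract does not describe the authors' statistics, so the following is a generic sketch under that assumption: count edges inside the group and compare against random node sets of the same size. The network below is a toy example.

```python
import random

def internal_edges(group, edges):
    """Number of edges with both endpoints in the group."""
    return sum(1 for u, v in edges if u in group and v in group)

def density_enrichment(group, nodes, edges, n_perm=1000, seed=0):
    """Empirical p-value that a GO annotation group has at least as many
    internal network edges as random node sets of the same size.
    nodes must be a list (random.sample requires a sequence)."""
    rng = random.Random(seed)
    observed = internal_edges(group, edges)
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(nodes, len(group)))
        if internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Toy network: n0..n3 form a fully connected "GO group"
nodes = [f"n{i}" for i in range(20)]
group = {"n0", "n1", "n2", "n3"}
edges = [("n0", "n1"), ("n0", "n2"), ("n0", "n3"), ("n1", "n2"),
         ("n1", "n3"), ("n2", "n3"), ("n4", "n5"), ("n6", "n7")]
print(density_enrichment(group, nodes, edges))
```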

  10. Overview of the gene ontology task at BioCreative IV.

    Science.gov (United States)

    Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong

    2014-01-01

    Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and of opportunities for interaction between text-mining developers and GO curators has limited advances in algorithm development and their use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, although much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO
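    To make the concept-recognition subtask concrete, here is a naive dictionary-lookup baseline: report every GO term whose name or synonym occurs in a passage as a whole-word, case-insensitive match. This is an illustration only, not any participating team's system; the tiny dictionary below is hand-made for the example.

```python
import re

def recognize_go_terms(passage, go_names):
    """Naive dictionary lookup for GO concept recognition.
    go_names: {go_id: [name, synonym, ...]}
    Returns the set of GO IDs with at least one whole-word match."""
    text = passage.lower()
    found = set()
    for go_id, names in go_names.items():
        for name in names:
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
                found.add(go_id)
                break
    return found

# Small hand-made dictionary for illustration
go_names = {
    "GO:0006915": ["apoptotic process", "apoptosis"],
    "GO:0005634": ["nucleus"],
    "GO:0016301": ["kinase activity"],
}
passage = "The protein localizes to the nucleus and promotes apoptosis."
print(sorted(recognize_go_terms(passage, go_names)))
```

    Exact string matching of this kind misses paraphrases and inflected forms, which is one reason dictionary approaches struggle on free text.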

  11. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    Science.gov (United States)

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
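    The abstract says DeepGO uses dependencies between GO classes to shape its outputs. A related, generic way to make any predictor's scores consistent with the ontology is the "true path rule": a term's score should be at least as high as that of any of its descendants. The following post-processing sketch shows that rule; it is not DeepGO's architecture, and the toy hierarchy and scores are hypothetical.

```python
def enforce_true_path(scores, parents):
    """Raise each term's score to at least the maximum score of its
    descendants (true path rule).
    scores: {term: predicted probability}; parents: {term: set of parents}."""
    consistent = dict(scores)
    changed = True
    while changed:  # repeat until stable; terminates because GO is acyclic
        changed = False
        for term, score in list(consistent.items()):
            for parent in parents.get(term, ()):
                if consistent.get(parent, 0.0) < score:
                    consistent[parent] = score
                    changed = True
    return consistent

# Toy hierarchy: C is_a B is_a A
parents = {"C": {"B"}, "B": {"A"}}
scores = {"A": 0.2, "B": 0.5, "C": 0.9}
print(enforce_true_path(scores, parents))
```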

  12. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development.

    Science.gov (United States)

    Khodiyar, Varsha K; Howe, Doug; Talmud, Philippa J; Breckenridge, Ross; Lovering, Ruth C

    2013-01-01

    For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region: the left-right organizer node in the mouse and human, and Kupffer's vesicle in the zebrafish. In the zebrafish, laterality cues from Kupffer's vesicle determine asymmetry in the developing heart, the direction of 'heart jogging' and the direction of 'heart looping'. 'Heart jogging' is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward 'jog'. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well-defined orthologs in mouse and human, and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development, and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish 'heart jogging orthologs' are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.
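    GO term enrichment analyses, as used in this study, commonly rest on a one-sided hypergeometric test. The specific tool and test used by the authors are not stated in the abstract, so this is an assumed, generic sketch: given a background of N annotated genes, K of which carry a GO term, and a study set of n genes, k of which carry it, the p-value is P(X >= k). The numbers below are toy values.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n):
    N annotated genes in the background, K of them carry the GO term,
    n genes in the study set, k of which carry the term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

# Toy example: all 3 study genes carry a term held by 3 of 10 background genes
print(hypergeom_enrichment_p(k=3, n=3, K=3, N=10))
```

    In practice the p-values for many GO terms are then corrected for multiple testing.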

  13. Closing the loop: from paper to protein annotation using supervised Gene Ontology classification.

    Science.gov (United States)

    Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

    2014-01-01

    Gene function curation of the literature with Gene Ontology (GO) concepts is a particularly time-consuming task in genomics, and bioinformatics support is much needed to keep up with the flow of publications. In 2004, the first BioCreative challenge already included a task of automatic GO concept assignment from full text. At that time, results were judged far from reaching the performance required by real curation workflows. In particular, supervised approaches produced the most disappointing results because of a lack of training data. Ten years later, the available curation data have grown massively. In 2013, the BioCreative IV GO task revisited the automatic GO assignment task. For this challenge, we investigated the power of our supervised classifier, GOCat. GOCat computes similarities between an input text and already curated instances contained in a knowledge base to infer GO concepts. Subtask A consisted of selecting GO evidence sentences for a relevant gene in a full text. For this, we designed a state-of-the-art supervised statistical approach, using a naïve Bayes classifier and the official training set, and obtained fair results. Subtask B consisted of predicting GO concepts from the previous output. For this, we applied GOCat and reached leading results, up to 65% hierarchical recall in the top 20 output concepts. Contrary to previous competitions, machine learning this time outperformed standard dictionary-based approaches. Thanks to BioCreative IV, we were able to design a complete workflow for curation: given a gene name and a full text, this system is able to select evidence sentences for curation and to deliver highly relevant GO concepts. Observed performances are sufficient for use in a real semiautomatic curation workflow. GOCat is available at http://eagl.unige.ch/GOCat/ and http://eagl.unige.ch/GOCat4FT/.
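    The abstract describes GOCat as inferring GO concepts from the similarity between an input text and already curated instances. A minimal k-nearest-neighbour sketch of that idea follows, assuming bag-of-words cosine similarity and similarity-weighted voting; GOCat's actual representation and weighting are not given here, and the small knowledge base is hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_go(text, knowledge_base, k=2):
    """knowledge_base: list of (curated_text, set_of_go_terms).
    Each of the k most similar curated texts votes for its GO terms with a
    weight equal to its similarity; terms are returned by descending score."""
    query = Counter(text.lower().split())
    neighbours = sorted(
        ((cosine(query, Counter(doc.lower().split())), terms) for doc, terms in knowledge_base),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, terms in neighbours:
        for term in terms:
            scores[term] += sim
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical curated instances
knowledge_base = [
    ("kinase phosphorylates substrate", {"GO:0016301"}),
    ("protein localizes to nucleus", {"GO:0005634"}),
    ("kinase activity in nucleus", {"GO:0016301", "GO:0005634"}),
]
print(knn_go("The kinase phosphorylates its substrate", knowledge_base))
```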

  14. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    International Nuclear Information System (INIS)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G.; Law, R. David

    2012-01-01

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  15. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    Energy Technology Data Exchange (ETDEWEB)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G. [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]; Law, R. David, E-mail: dlaw@lakeheadu.ca [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]

    2012-10-15

  16. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data.

    Science.gov (United States)

    Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan

    2016-01-01

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are: (1) a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding of all 1884 human miRNAs (vs. 300 in previous versions); and (3) a GitHub project site with an issue tracker for more effective community collaboration on ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

  17. Ontology or formal ontology

    Science.gov (United States)

    Žáček, Martin

    2017-07-01

    Ontology or formal ontology? Which term is correct? The aim of this article is to introduce the correct terms and explain their basis. An ontology describes a particular area of interest (a domain) in a formal way: it defines the classes of objects in that area and the relationships that may exist between them. The value of an ontology lies mainly in facilitating communication between people, improving collaboration between software systems, and improving systems engineering. In all these areas, an ontology offers a unified view while maintaining consistency and unambiguity.
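
    To make the definition of an ontology as classes plus relationships concrete, here is a minimal Python sketch. The class names and the `is_a`/`part_of` relation labels are illustrative assumptions, not taken from the article; rejecting relationships that mention undeclared classes is one simple way to keep such a model consistent.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: declared classes plus typed relationships between them."""

    def __init__(self):
        self.classes = set()
        # (subject, relation) -> set of related classes
        self.relations = defaultdict(set)

    def add_class(self, name):
        self.classes.add(name)

    def relate(self, subject, relation, obj):
        # Reject relationships that mention undeclared classes so the
        # model stays consistent.
        if subject not in self.classes or obj not in self.classes:
            raise ValueError(f"undeclared class in ({subject}, {relation}, {obj})")
        self.relations[(subject, relation)].add(obj)

    def ancestors(self, name, relation="is_a"):
        """All classes reachable from `name` by following `relation` edges."""
        seen, stack = set(), [name]
        while stack:
            for parent in self.relations.get((stack.pop(), relation), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


onto = Ontology()
for cls in ["Vehicle", "Car", "SportsCar", "Wheel"]:
    onto.add_class(cls)
onto.relate("Car", "is_a", "Vehicle")
onto.relate("SportsCar", "is_a", "Car")
onto.relate("Wheel", "part_of", "Car")
print(sorted(onto.ancestors("SportsCar")))  # ['Car', 'Vehicle']
```

    Keeping each relation type separate lets the same traversal answer different questions, for example `onto.ancestors("Wheel", "part_of")` returns `{"Car"}`.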

  18. FFPred 2.0: improved homology-independent prediction of gene ontology terms for eukaryotic protein sequences.

    Directory of Open Access Journals (Sweden)

    Federico Minneci

    Full Text Available To fully understand cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information, either experimentally validated or computationally inferred by similarity, remains completely missing for approximately 30% of human proteins. FFPred was initially developed to bridge this gap by targeting sequences with distant or no homologues of known function and by exploiting clear patterns of intrinsic disorder associated with particular molecular activities and biological processes. Here, we present an updated and improved version, which builds on larger datasets of protein sequences and annotations, and uses updated component feature predictors as well as revised training procedures. FFPred 2.0 includes support vector regression models for the prediction of 442 Gene Ontology (GO) terms, which considerably expand the coverage of the ontology, particularly of the biological process category. The GO term list mainly revolves around macromolecular interactions and their role in regulatory, signalling, developmental and metabolic processes. Benchmarking experiments on newly annotated proteins show that FFPred 2.0 provides more accurate functional assignments than both its predecessor and the ProtFun server; its assignments can also complement information obtained using BLAST-based transfer of annotations, especially improving prediction in the biological process category. Furthermore, FFPred 2.0 can be used to annotate proteins belonging to several eukaryotic organisms with a limited decrease in prediction quality. We illustrate all these points through the use of both precision-recall plots and the COGIC scores, which we recently proposed as an alternative numerical evaluation measure of function prediction accuracy.
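
    As an illustration of how precision-recall points for GO term predictions can be computed, the sketch below thresholds per-(protein, term) scores, such as those a regression model might output, and compares them with known annotations. The scores, protein IDs and the convention that an empty prediction set has precision 1.0 are assumptions for illustration; the COGIC score is not implemented here.

```python
def precision_recall(scores, truth, threshold):
    """scores: {(protein, go_term): predicted score}; truth: set of correct pairs."""
    predicted = {pair for pair, s in scores.items() if s >= threshold}
    tp = len(predicted & truth)
    # Convention (assumption): precision is 1.0 when nothing is predicted.
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def pr_curve(scores, truth):
    """One (threshold, precision, recall) point per distinct score, high to low."""
    return [(t, *precision_recall(scores, truth, t))
            for t in sorted(set(scores.values()), reverse=True)]


# Hypothetical predictions for two proteins.
scores = {
    ("P1", "GO:0005515"): 0.9,
    ("P1", "GO:0006355"): 0.4,
    ("P2", "GO:0005515"): 0.7,
    ("P2", "GO:0007165"): 0.2,
}
truth = {("P1", "GO:0005515"), ("P2", "GO:0007165")}
for t, p, r in pr_curve(scores, truth):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

    Sweeping the threshold over the distinct predicted scores yields every point at which the prediction set actually changes, which is all a precision-recall plot needs.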

  19. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Full Text Available Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and
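
    The core integration step, joining GO-gene annotations with gene-disease interactions on the shared official gene symbol, can be sketched as a simple join. The gene, GO term and disease pairs below are hypothetical examples rather than CTD records, and any filtering or statistical scoring CTD may apply to its inferences is not reproduced.

```python
from collections import defaultdict


def infer_go_disease(go_annotations, gene_disease):
    """Infer (GO term, disease) pairs from shared genes.

    go_annotations: iterable of (gene_symbol, go_term)
    gene_disease:   iterable of (gene_symbol, disease)
    Returns {(go_term, disease): set of genes supporting the inference}.
    """
    terms_by_gene = defaultdict(set)
    for gene, term in go_annotations:
        terms_by_gene[gene].add(term)

    inferences = defaultdict(set)
    for gene, disease in gene_disease:
        # Official gene symbols act as the join key between the two datasets.
        for term in terms_by_gene.get(gene, ()):
            inferences[(term, disease)].add(gene)
    return dict(inferences)


# Hypothetical example records, not taken from CTD.
go = [("TP53", "GO:0006915"), ("BAX", "GO:0006915"), ("BRCA1", "GO:0006281")]
gd = [("TP53", "Neoplasms"), ("BAX", "Neoplasms"), ("BRCA1", "Breast Neoplasms")]
for (term, disease), genes in sorted(infer_go_disease(go, gd).items()):
    print(term, disease, sorted(genes))
```

    Keeping the supporting genes for each inference makes it possible to rank or filter GO-disease pairs later, for example by how many genes back them.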

  1. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

    Directory of Open Access Journals (Sweden)

    Xiaomei Wu

    Full Text Available BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply treat terms at the same level in the ontology as equally specific nodes, a weakness that information content (IC) can complement. RESULTS AND CONCLUSIONS: Here, we used IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones. HRSS values were divided into four levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure simGIC are superior in correlation with sequence and Pfam similarities. Because different measures are best suited to different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. HRSS was implemented in C and is freely available from http://cmb.bnu.edu.cn/hrss.
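
    The two pairwise strategies compared in the evaluation, maximum and best-match average (BMA), can be sketched as follows for any term-level similarity function. The toy similarity table stands in for HRSS or another term measure and is purely hypothetical; this BMA variant averages all row and column best matches together, whereas some tools instead average the two directional means.

```python
def max_similarity(terms_a, terms_b, sim):
    """Highest term-term similarity between the two annotation sets."""
    return max(sim(a, b) for a in terms_a for b in terms_b)


def best_match_average(terms_a, terms_b, sim):
    """Average of each term's best match in the other set (both directions)."""
    best_a = [max(sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical symmetric term similarities; identical terms score 1.0.
TABLE = {frozenset({"t1", "t2"}): 0.2, frozenset({"t1", "t3"}): 0.4,
         frozenset({"t2", "t3"}): 0.6}


def toy_sim(a, b):
    return 1.0 if a == b else TABLE.get(frozenset({a, b}), 0.0)


gene_a, gene_b = ["t1", "t2"], ["t1", "t3"]
print(max_similarity(gene_a, gene_b, toy_sim))      # 1.0
print(best_match_average(gene_a, gene_b, toy_sim))  # 0.8
```

    The maximum reacts to a single shared function, which suits detecting physical interactions, while BMA rewards overall agreement of the two annotation sets.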

  2. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in mRNA expression level in the L-lysine-producing strain C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport, NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dithiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885, were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, as determined by both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results should help improve our understanding of C. glutamicum for industrial amino acid production.

  3. (Euphausia superba) transcriptome to identify function genes and ...

    Indian Academy of Sciences (India)

    MA

    Further analysis produced 106,250 unigenes, of which 31,683 were annotated based on protein homology searches against protein databases. Gene Ontology (GO) analysis showed that ion binding, organic substance metabolic process, and cell part were the most abundant terms in molecular function, biological process ...

  4. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Directory of Open Access Journals (Sweden)

    Shibiao Wan

    Full Text Available Protein subcellular localization prediction, an essential step in elucidating the in vivo functions of proteins and identifying drug targets, has been extensively studied over the past decades. Instead of only determining the subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, HybridGO-Loc, that leverages not only GO term occurrences but also inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrence and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the Gene Ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and the semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision-based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which predicts the localization of virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
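
    One plausible way to build the frequency, semantic-similarity and fusion vectors described for HybridGO-Loc is sketched below. The GO vocabulary, the toy similarity table, and the choice to score each vocabulary term by its best similarity to any retrieved term are assumptions; the paper's exact construction and its SVM stage are not reproduced.

```python
from collections import Counter


def frequency_vector(retrieved, vocabulary):
    """How often each vocabulary GO term occurs among the retrieved terms."""
    counts = Counter(retrieved)
    return [counts[term] for term in vocabulary]


def similarity_vector(retrieved, vocabulary, sim):
    """Assumption: score each vocabulary term by its best similarity to any retrieved term."""
    return [max((sim(v, t) for t in retrieved), default=0.0) for v in vocabulary]


def fusion_vector(retrieved, vocabulary, sim):
    """Hybridize the two feature types by concatenation (an assumed fusion rule)."""
    return frequency_vector(retrieved, vocabulary) + similarity_vector(retrieved, vocabulary, sim)


# Hypothetical vocabulary (nucleus, cytoplasm, plasma membrane) and similarities.
VOCAB = ["GO:0005634", "GO:0005737", "GO:0005886"]
PAIRS = {frozenset({"GO:0005634", "GO:0005737"}): 0.3,
         frozenset({"GO:0005737", "GO:0005886"}): 0.2}


def toy_sim(a, b):
    return 1.0 if a == b else PAIRS.get(frozenset({a, b}), 0.0)


# GO terms retrieved from a protein's BLAST homologues (hypothetical).
retrieved = ["GO:0005634", "GO:0005634", "GO:0005737"]
print(fusion_vector(retrieved, VOCAB, toy_sim))  # [2, 1, 0, 1.0, 1.0, 0.2]
```

    The similarity half gives plasma membrane a non-zero feature even though it was never retrieved, which is the kind of inter-term information that pure occurrence counts miss.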

  5. Epistemic Function and Ontology of Analog and Digital Images

    Directory of Open Access Journals (Sweden)

    Aleksandra Łukaszewicz Alcaraz

    2016-01-01

    Full Text Available The important epistemic function of photographic images is their active role in construction and reconstruction of our beliefs concerning the world and human identity, since we often consider photographs as presenting reality or even the Real itself. Because photography can convince people of how different social and ethnic groups and even they themselves look, documentary projects and the dissemination of photographic practices supported the transition from disciplinary society to the present-day society of control. While both analog and digital images are formed from the same basic matter, the ways in which this matter appears are distinctive. In the case of analog photography, we deal with physical and chemical matter, whereas with digital images we face electronic matter. Because digital photography allows endless modification of the image, we can no longer believe in the truthfulness of digital images.

  6. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer.

    Directory of Open Access Journals (Sweden)

    Malin Lando

    2009-11-01

    Full Text Available Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities for both early- and advanced-stage cervical cancers.
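
    Overrepresentation of a GO process among affected genes is commonly assessed with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the following is a generic sketch with hypothetical counts.

```python
from math import comb


def overrepresentation_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes measured (population)
    K: population genes annotated to the GO process
    n: genes affected by the alterations (study set)
    k: affected genes annotated to the process
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: all 3 affected genes fall in a process covering 4 of 10 genes.
print(overrepresentation_p(N=10, K=4, n=3, k=3))  # ≈ 0.0333 (= 4/120)
```

    When many GO processes are tested at once, the resulting p-values still need a multiple-testing correction before a process is called significantly overrepresented.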

  7. Ontology Sparse Vector Learning Algorithm for Ontology Similarity Measuring and Ontology Mapping via ADAL Technology

    Science.gov (United States)

    Gao, Wei; Zhu, Linli; Wang, Kaiyun

    2015-12-01

    Ontology, a model of knowledge representation and storage, has had extensive applications in pharmaceutics, social science, chemistry and biology. In the age of “big data”, the constructed concepts are often represented as higher-dimensional data by scholars, and thus the sparse learning techniques are introduced into ontology algorithms. In this paper, based on the alternating direction augmented Lagrangian method, we present an ontology optimization algorithm for ontological sparse vector learning, and a fast version of such ontology technologies. The optimal sparse vector is obtained by an iterative procedure, and the ontology function is then obtained from the sparse vector. Four simulation experiments show that our ontological sparse vector learning model has a higher precision ratio on plant ontology, humanoid robotics ontology, biology ontology and physics education ontology data for similarity measuring and ontology mapping applications.
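
    The alternating direction augmented Lagrangian idea can be illustrated with the standard scaled-form ADMM iteration for a lasso problem, minimize ½‖Ax − b‖² + λ‖x‖₁, whose soft-thresholding step is what produces exact zeros in the sparse vector. This is a generic sketch on a hypothetical toy problem; the paper's ontology-specific objective, data and fast variant are not reproduced.

```python
def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def soft_threshold(x, k):
    """Proximal operator of k*|x|: shrinks x toward zero, exactly zero inside [-k, k]."""
    if x > k:
        return x - k
    if x < -k:
        return x + k
    return 0.0


def admm_lasso(A, b, lam, rho=1.0, iters=500):
    """Scaled-form ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1 (split x = z)."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # The x-update always solves (A^T A + rho*I) x = A^T b + rho*(z - u).
    M = [[AtA[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        x = solve(M, [Atb[i] + rho * (z[i] - u[i]) for i in range(n)])
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z  # z carries the exact zeros of the sparse solution


# Hypothetical toy problem with A = I: the lasso solution is soft_threshold(b, lam).
print(admm_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))  # ≈ [2.0, 0.0]
```

    Returning `z` rather than `x` is a deliberate choice: `x` only approaches zero asymptotically, while the thresholded copy `z` is exactly sparse at every iteration.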

  8. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

    Science.gov (United States)

    Diehl, Alexander D; Meehan, Terrence F; Bradford, Yvonne M; Brush, Matthew H; Dahdul, Wasila M; Dougall, David S; He, Yongqun; Osumi-Sutherland, David; Ruttenberg, Alan; Sarntivijai, Sirarat; Van Slyke, Ceri E; Vasilevsky, Nicole A; Haendel, Melissa A; Blake, Judith A; Mungall, Christopher J

    2016-07-04

    The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. Recent work on the CL has focused on extending the representation of various cell types and on developing new modules, both in the CL itself and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the

  9. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat

    2017-09-27

    Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as from a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and uses the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Critical Assessment of Functional Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
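
    One simple way to make per-term scores respect GO dependencies is to raise each term's score to at least the maximum score of its descendants, following the true-path rule (a protein annotated to a term is also annotated to that term's ancestors). The abstract does not describe how DeepGO encodes these dependencies inside its network, so this post-hoc pass over a hypothetical toy hierarchy is an illustration, not DeepGO's implementation.

```python
def propagate_max(scores, children):
    """Raise each term's score to the max over itself and all its descendants.

    scores:   {term: predicted probability}
    children: {term: list of child terms} describing a DAG
    """
    memo = {}

    def visit(term):
        if term not in memo:
            best = scores.get(term, 0.0)
            for child in children.get(term, ()):
                best = max(best, visit(child))
            memo[term] = best
        return memo[term]

    terms = set(scores) | set(children)
    for kids in children.values():
        terms.update(kids)
    return {term: visit(term) for term in terms}


# Hypothetical toy hierarchy: root -> mid -> leaf, root -> other.
children = {"root": ["mid", "other"], "mid": ["leaf"]}
raw = {"root": 0.3, "mid": 0.2, "leaf": 0.8, "other": 0.1}
print(sorted(propagate_max(raw, children).items()))
# [('leaf', 0.8), ('mid', 0.8), ('other', 0.1), ('root', 0.8)]
```

    Memoization means each term is evaluated once, so the pass stays linear in the size of the hierarchy even when terms have several parents.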

  10. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

    Science.gov (United States)

    Zhang, Shu-Bo; Lai, Jian-Huang

    2016-07-15

    Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of Gene Ontology (GO) provides a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to the biological entities under consideration and with the structure of the GO graph. However, previous work in this field has mainly focused on the upper part of the graph and has seldom considered the lower part. In this study, we aim to explore information from the lower part of the GO graph for better semantic similarity. We propose a framework to quantify the similarity measure beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measures on the public platform CESSM and on protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it efficiently quantifies the probability that two terms co-occur with their common descendant; and (3) our framework can effectively capture the similarity measure beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/. Copyright © 2016 Elsevier B.V. All rights reserved.
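
    The first ingredient of this approach, finding the common descendants of two ancestral terms, is a reachability query on the GO DAG. The toy child map below is hypothetical, and the information-sharing and co-occurrence probability terms of the authors' framework are not implemented.

```python
def descendants(term, children):
    """All terms reachable from `term` by following child edges."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def common_descendants(t1, t2, children):
    return descendants(t1, children) & descendants(t2, children)


# Hypothetical DAG: A and B share descendants D and F.
children = {"A": ["C", "D"], "B": ["D", "E"], "D": ["F"]}
print(sorted(common_descendants("A", "B", children)))  # ['D', 'F']
```

    An empty result means the two terms have no shared specialization, in which case a "beneath" score would have nothing to contribute.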

  11. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics

    Science.gov (United States)

    Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth

    2018-01-01

    Abstract The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578

  12. Exact Score Distribution Computation for Similarity Searches in Ontologies

    Science.gov (United States)

    Schulz, Marcel H.; Köhler, Sebastian; Bauer, Sebastian; Vingron, Martin; Robinson, Peter N.

    Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., protein function prediction with the Gene Ontology. In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik’s definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology.
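
    For reference, Resnik's measure scores two terms by the information content IC(t) = −log p(t) of their most informative common ancestor, where p(t) is the fraction of annotations at t or any of its descendants. The minimal sketch below uses a hypothetical toy DAG and annotation counts, assumes a single root and annotated query terms, and computes single pairwise scores only, not the exact score distributions that are the paper's contribution.

```python
import math


def ancestors_inclusive(term, parents):
    """The term itself plus every ancestor reachable via parent edges."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(direct_counts, parents):
    """IC(t) = -log p(t), with p(t) from counts propagated up to every ancestor."""
    totals = {}
    for term, count in direct_counts.items():
        for anc in ancestors_inclusive(term, parents):
            totals[anc] = totals.get(anc, 0) + count
    root_total = max(totals.values())  # assumes a single root
    return {t: -math.log(c / root_total) for t, c in totals.items() if c > 0}


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors_inclusive(t1, parents) & ancestors_inclusive(t2, parents)
    return max(ic[t] for t in common)


# Hypothetical toy DAG (child -> parents) and direct annotation counts.
parents = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"]}
counts = {"C": 2, "D": 1, "E": 1}
ic = information_content(counts, parents)
print(round(resnik("D", "E", parents, ic), 3))  # 0.693 (= log 2, via ancestor B)
```

    Because every term's IC is a fixed number once the annotation corpus is known, the set of possible Resnik scores is finite, which is what makes exact score-distribution computation feasible in the first place.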

  13. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major-effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al. to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design, we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation, we identify an excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.

  14. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.

    Science.gov (United States)

    Cheng, Liang; Jiang, Yue; Ju, Hong; Sun, Jie; Peng, Jiajie; Zhou, Meng; Hu, Yang

    2018-01-19

    Since the establishment of the first biomedical ontology, the Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Over 300 ontologies have now been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because it can identify novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Although similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method took advantage of a gene functional interaction network (GFIN) to explore such inter-ontology similarities of terms. However, it only used gene interactions and failed to make full use of the connectivity among gene nodes of the network. In addition, all existing methods are designed specifically for GO, and their performance on the extended ontology community remains unknown. We propose a method, InfAcrOnt, to infer similarities between terms across ontologies utilizing the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network with a random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on both the human and yeast benchmark datasets, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results with prior knowledge on pairwise DO-HPO terms and pairwise DO-GO terms show high correlations. The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies on the benchmark set.
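
    Information flow by random walk is commonly modeled as a random walk with restart (RWR): probability mass repeatedly spreads to neighbours and, with a fixed probability, jumps back to the seed nodes. The sketch below runs a generic RWR over a hypothetical toy term-gene-gene network; InfAcrOnt's actual edge weighting, restart probability and normalization are not reproduced, and every node is assumed to have at least one neighbour so that no probability mass leaks.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seeds`.

    adj: {node: list of neighbours}; edges are listed in both directions and
         every node is assumed to have at least one neighbour.
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            # Each node passes its non-restart mass evenly to its neighbours.
            share = (1.0 - restart) * p[v] / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        converged = sum(abs(nxt[v] - p[v]) for v in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical term-gene-gene network: term T annotates g1 and g2, term U annotates g3,
# and genes interact g1-g2 and g2-g3.
adj = {
    "T": ["g1", "g2"], "U": ["g3"],
    "g1": ["T", "g2"], "g2": ["T", "g1", "g3"], "g3": ["g2", "U"],
}
scores = random_walk_with_restart(adj, seeds={"T"})
print({v: round(s, 3) for v, s in scores.items()})
```

    Reading `scores["U"]` as the similarity of term T to term U shows the appeal of the approach: U shares no gene with T, yet it still receives mass through the gene-gene interaction g2-g3.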

  15. Ontology Repositories

    OpenAIRE

    Hartmann, J.; Palma, R.; Gómez-Pérez, A.

    2009-01-01

    The growing use and application of ontologies in recent years has led to increased interest among researchers and practitioners in the development of ontologies, either from scratch or by reusing existing ones. ...

  16. Quantum ontologies

    International Nuclear Information System (INIS)

    Stapp, H.P.

    1988-12-01

    Quantum ontologies are conceptions of the constitution of the universe that are compatible with quantum theory. The ontological orientation is contrasted to the pragmatic orientation of science, and reasons are given for considering quantum ontologies both within science, and in broader contexts. The principal quantum ontologies are described and evaluated. Invited paper at conference: Bell's Theorem, Quantum Theory, and Conceptions of the Universe, George Mason University, October 20-21, 1988. 16 refs

  17. The Identification and the Functional Validation of Eye Development and Regeneration Genes in Schmidtea Mediterranea

    OpenAIRE

    Calvo Lozano, Beatriz

    2015-01-01

    Discovering the master genes necessary to build the eye in an invertebrate model such as S. mediterranea could help us to understand numerous retinopathies and age-related degeneration of the human eye. The aim of this study was to select and determine the functional activity of genes involved in the regeneration and development of the S. mediterranea eye. Gene ontology was the tool used to select the genes, while RNA interference and RNA hybridization provided the first approach towards esta...

  18. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database.

    Science.gov (United States)

    Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2016-01-01

    The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species (rat, human and mouse) are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu. © The Author(s) 2016. Published by Oxford University Press.

  19. Ontology Evaluation

    OpenAIRE

    Vrandecic, Zdenko

    2010-01-01

    Ontology evaluation is the task of measuring the quality of an ontology. It enables us to answer the following main question: how can the quality of an ontology for the Web be assessed? In this thesis, a theoretical framework and several methods breathing life into the framework are presented. The application to the above scenarios is explored, and the theoretical foundations are thoroughly grounded in the practical usage of the emerging Semantic Web.

  20. Transcriptome data and gene ontology analysis in human macrophages ingesting modified lipoproteins in the presence or absence of complement protein C1q

    Directory of Open Access Journals (Sweden)

    Minh-Minh Ho

    2016-12-01

    Full Text Available We characterized the transcriptional effects of complement opsonization on foam cell formation in human monocyte-derived macrophages (HMDM). RNA-sequencing was used to identify the pathways modulated by complement protein C1q during HMDM ingestion of the atherogenic lipoproteins oxidized low density lipoprotein (oxLDL) and acetylated low density lipoprotein (acLDL). All raw data were submitted to the MIAME-compliant database Gene Expression Omnibus (accession number GEO: GSE80442; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80442). Data presented here include Venn diagram overviews of up- and down-regulated genes for each condition tested, gene ontology analyses of biological processes, molecular functions and cellular components and KEGG pathway analysis. Further investigation of the pathways modulated by C1q in HMDM during ingestion of atherogenic lipoproteins and their functional relevance are described in “Macrophage molecular signaling and inflammatory responses during ingestion of atherogenic lipoproteins are modulated by complement protein C1q” (M.M. Ho, A. Manughian-Peter, W.R. Spivia, A. Taylor, D.A. Fraser, 2016) [1].
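The abstract does not name the statistical test behind its gene ontology analyses. One common choice for term enrichment, shown here only as an assumed illustration, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): it asks how surprising it is to see k annotated genes in a list of n genes when K of the N background genes carry the annotation. All counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated to the GO term
    n: genes in the list of interest (e.g. up-regulated genes)
    k: genes in that list annotated to the GO term
    """
    upper = min(n, K)
    # Sum the probabilities of every outcome at least as extreme as k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 8 of 100 listed genes carry a term that annotates
# 200 of 20,000 background genes (about 1 would be expected by chance).
p = enrichment_p_value(k=8, n=100, K=200, N=20000)
print(f"p = {p:.2e}")
```

In practice such p-values are computed for many GO terms at once, so a multiple-testing correction is normally applied before calling a term enriched.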

  1. Discriminative local subspaces in gene expression data for effective gene function prediction.

    Science.gov (United States)

    Puelma, Tomas; Gutiérrez, Rodrigo A; Soto, Alvaro

    2012-09-01

    Massive amounts of genome-wide gene expression data have become available, motivating the development of computational approaches that leverage this information to predict gene function. Among successful approaches, supervised machine learning methods, such as Support Vector Machines (SVMs), have shown superior prediction accuracy. However, these methods lack the simple biological intuition provided by co-expression networks (CNs), limiting their practical usefulness. In this work, we present Discriminative Local Subspaces (DLS), a novel method that combines supervised machine learning and co-expression techniques with the goal of systematically predicting genes involved in specific biological processes of interest. Unlike traditional CNs, DLS uses the knowledge available in Gene Ontology (GO) to generate informative training sets that guide the discovery of expression signatures: expression patterns that are discriminative for genes involved in the biological process of interest. By linking genes co-expressed with these signatures, DLS is able to construct a discriminative CN that links both known and previously uncharacterized genes for the selected biological process. This article focuses on the algorithm behind DLS and shows its predictive power using an Arabidopsis thaliana dataset and a representative set of 101 GO terms from the Biological Process Ontology. Our results show that DLS achieves a higher average accuracy than both SVMs and CNs. Thus, DLS is able to provide the prediction accuracy of supervised learning methods while maintaining the intuitive understanding of CNs. A MATLAB® implementation of DLS is available at http://virtualplant.bio.puc.cl/cgi-bin/Lab/tools.cgi.
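DLS is contrasted with traditional co-expression networks (CNs). The sketch below shows only that baseline idea, not DLS itself: two genes are linked when the Pearson correlation of their expression profiles reaches a threshold. The gene names, expression values and the 0.9 cutoff are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined for constant (zero-variance) profiles


def coexpression_edges(expr, threshold=0.9):
    """Link gene pairs whose profiles correlate at or above `threshold`."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across five conditions.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.0, 8.2, 10.5],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for g1, g2, r in coexpression_edges(expr):
    print(g1, g2, round(r, 3))
```

Only the positively correlated pair `geneA`/`geneB` passes the cutoff here; `geneC` is anti-correlated with both and stays unlinked.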

  2. Ontological dependency

    NARCIS (Netherlands)

    Stamper, R.K.

    1996-01-01

    Successful ontological analysis depends upon having the right underlying theory. The work described here, which explored how to understand organisations as systems of social norms, found that the familiar objectivist position did not work and eventually replaced it with a radically subjectivist ontology.

  3. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk of developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this 'witness' model of stress and RNA-Seq technology, the current study aims to characterize the genetic effects of psychological stress. After the mice witnessed the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together, these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  4. Using Network Extracted Ontologies to Identify Novel Genes with Roles in Appressorium Development in the Rice Blast Fungus Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Ryan M. Ames

    2017-01-01

    Full Text Available Magnaporthe oryzae is the causal agent of rice blast disease, the most important infection of rice worldwide. Half the world’s population depends on rice for its primary caloric intake and, as such, rice blast poses a serious threat to food security. The stages of M. oryzae infection are well defined, with the formation of an appressorium, a cell type that allows penetration of the plant cuticle, particularly well studied. However, many of the key pathways and genes involved in this disease stage are yet to be identified. In this study, I have used network-extracted ontologies (NeXOs), hierarchical structures inferred from RNA-Seq data, to identify pathways involved in appressorium development, which in turn highlights novel genes with potential roles in this process. This study illustrates the use of NeXOs for pathway identification from large-scale genomics data and also identifies novel genes with potential roles in disease. The methods presented here will be useful to study disease processes in other pathogenic species and these data represent predictions of novel targets for intervention in M. oryzae.

  5. PDON: Parkinson's disease ontology for representation and modeling of the Parkinson's disease knowledge domain.

    Science.gov (United States)

    Younesi, Erfan; Malhotra, Ashutosh; Gündel, Michaela; Scordis, Phil; Kodamullil, Alpha Tom; Page, Matt; Müller, Bernd; Springstubbe, Stephan; Wüllner, Ullrich; Scheller, Dieter; Hofmann-Apitius, Martin

    2015-09-22

    Despite the unprecedented and increasing amount of data, relatively little progress has been made in molecular characterization of mechanisms underlying Parkinson's disease. In the area of Parkinson's research, there is a pressing need to integrate various pieces of information into a meaningful context of presumed disease mechanism(s). Disease ontologies provide a novel means for organizing, integrating, and standardizing the knowledge domains specific to disease in a compact, formalized and computer-readable form and serve as a reference for knowledge exchange or systems modeling of disease mechanism. The Parkinson's disease ontology was built according to the life cycle of ontology building. Structural, functional, and expert evaluation of the ontology was performed to ensure the quality and usability of the ontology. A novelty metric has been introduced to measure the gain of new knowledge using the ontology. Finally, a cause-and-effect model was built around PINK1 and two gene expression studies from the Gene Expression Omnibus database were re-annotated to demonstrate the usability of the ontology. The Parkinson's disease ontology with a subclass-based taxonomic hierarchy covers the broad spectrum of major biomedical concepts from molecular to clinical features of the disease, and also reflects different views on disease features held by molecular biologists, clinicians and drug developers. The current version of the ontology contains 632 concepts, which are organized under nine views. The structural evaluation showed the balanced dispersion of concept classes throughout the ontology. The functional evaluation demonstrated that the ontology-driven literature search could gain novel knowledge not present in the reference Parkinson's knowledge map. The ontology was able to answer specific questions related to Parkinson's when evaluated by experts. Finally, the added value of the Parkinson's disease ontology is demonstrated by ontology-driven modeling of PINK1.

  6. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but only minimally annotated computationally for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with the Disease Ontology and GeneRIF dramatically increases the coverage of the disease annotation of the human genome. PMID:19594883
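The validation relies on recall (the share of reference annotations recovered) and precision (the share of predicted annotations that are correct). A minimal Python sketch of both measures, plus their harmonic mean (F1), computed over sets of gene-disease pairs; the pairs are hypothetical placeholders, not data from the study.

```python
def precision_recall_f1(predicted, reference):
    """Compare predicted gene-disease pairs against a reference set."""
    tp = len(predicted & reference)  # pairs found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gene-disease pairs.
predicted = {("GENE1", "disease X"), ("GENE2", "disease X"),
             ("GENE3", "disease Y"), ("GENE4", "disease Z")}
reference = {("GENE1", "disease X"), ("GENE2", "disease X"),
             ("GENE5", "disease Y")}
print(precision_recall_f1(predicted, reference))
```

Applied to the reported rates, F1 would be roughly 0.94 for the GeneRIF-based annotation (recall 0.91, precision 0.97) and roughly 0.36 for OMIM (recall 0.22, precision 0.98); these F1 values are derived here and are not stated in the abstract.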

  7. Identification of protein features encoded by alternative exons using Exon Ontology.

    Science.gov (United States)

    Tranchevent, Léon-Charles; Aubé, Fabien; Dulaurier, Louis; Benoit-Pilven, Clara; Rey, Amandine; Poret, Arnaud; Chautard, Emilie; Mortada, Hussein; Desmet, François-Olivier; Chakrama, Fatima Zahra; Moreno-Garcia, Maira Alejandra; Goillot, Evelyne; Janczarski, Stéphane; Mortreux, Franck; Bourgeois, Cyril F; Auboeuf, Didier

    2017-06-01

    Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named "Exon Ontology," based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology helps to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows users to gain quick insight into the protein features encoded by alternative exons and to investigate whether coregulated exons contain the same biological information. © 2017 Tranchevent et al.; Published by Cold Spring Harbor Laboratory Press.

  8. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    Science.gov (United States)

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

    In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

  9. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    Directory of Open Access Journals (Sweden)

    Ashutosh Malhotra

    Full Text Available In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

  10. MetaGO: Predicting Gene Ontology of non-homologous proteins through low-resolution protein structure prediction and protein-protein network mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotation, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's-homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598 for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's-homology-based network mapping and from structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence-homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
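The abstract says MetaGO combines the confidence scores of its components through logistic regression but gives no features or weights. The following is a generic sketch of that kind of score fusion, not the MetaGO code: it assumes three hypothetical per-candidate scores (sequence, structure, network) with toy labels, and fits a plain logistic regression by batch gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one training example.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def combined_confidence(scores, w, b):
    """Fuse per-source confidence scores into one probability."""
    return sigmoid(sum(wj * sj for wj, sj in zip(w, scores)) + b)


# Hypothetical training data: [sequence, structure, network] scores per
# candidate GO annotation, labelled 1 if the annotation is correct.
X = [[0.9, 0.7, 0.8], [0.8, 0.9, 0.6], [0.7, 0.8, 0.9],
     [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(round(combined_confidence([0.9, 0.8, 0.9], w, b), 3))
```

The appeal of this design is that the learned weights express how much each evidence source should be trusted, instead of hand-tuning how the scores are merged.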

  11. The Use of Gene Ontology Term and KEGG Pathway Enrichment for Analysis of Drug Half-Life.

    Directory of Open Access Journals (Sweden)

    Yu-Hang Zhang

    Full Text Available A drug's biological half-life is defined as the time required for the human body to metabolize or eliminate 50% of the initial drug dosage. Correctly measuring the half-life of a given drug is helpful for the safe and accurate usage of the drug. In this study, we investigated which gene ontology (GO) terms and biological pathways were highly related to the determination of drug half-life. The investigated drugs, with known half-lives, were analyzed based on their enrichment scores for associated GO terms and KEGG pathways. These scores indicate which GO terms or KEGG pathways the drug targets. The feature selection method, minimum redundancy maximum relevance, was used to analyze these GO terms and KEGG pathways and to identify important GO terms and pathways, such as sodium-independent organic anion transmembrane transporter activity (GO:0015347), monoamine transmembrane transporter activity (GO:0008504), negative regulation of synaptic transmission (GO:0050805), neuroactive ligand-receptor interaction (hsa04080), serotonergic synapse (hsa04726), and linoleic acid metabolism (hsa00591), among others. This analysis confirmed our results and may show evidence for a new method in studying drug half-lives and building effective computational methods for the prediction of drug half-lives.
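The definition above (time to eliminate 50% of the dose) becomes simple arithmetic if one assumes first-order elimination, an assumption the abstract itself does not make: the concentration then decays as C(t) = C0 * exp(-k * t), so the half-life is ln 2 / k. A small Python sketch with hypothetical measurements:

```python
import math


def elimination_rate(c1, c2, dt):
    """First-order rate constant k from two concentrations dt apart."""
    return math.log(c1 / c2) / dt


def half_life(k):
    """t_1/2 = ln 2 / k under first-order elimination."""
    return math.log(2) / k


def fraction_remaining(t, t_half):
    """Fraction of the initial dose left after time t: (1/2)^(t / t_half)."""
    return 0.5 ** (t / t_half)


# Hypothetical measurements: plasma level falls from 100 to 25 units in 4 h.
k = elimination_rate(100.0, 25.0, 4.0)
t_half = half_life(k)
print(round(t_half, 6))  # 2.0 (hours)
```

Falling to a quarter of the starting level in 4 hours means two halvings, hence a 2-hour half-life; drugs with nonlinear (for example saturable) elimination do not follow this formula.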

  12. SUGOI: automated ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2015-04-01

    Full Text Available A foundational ontology can solve interoperability issues among the domain ontologies aligned to it. However, several foundational ontologies have been developed, hence such interoperability issues exist among domain ontologies. The novel SUGOI tool...

  13. Inferring ontology graph structures using OWL reasoning.

    Science.gov (United States)

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
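Onto2Graph relies on an automated OWL reasoner working over all axioms, which cannot be reproduced in a few lines. As a much narrower, hypothetical illustration of relations that are "implied but not asserted", the sketch below computes only the is-a edges that follow from the transitivity of subclass links in a toy hierarchy. The class names are placeholders, and the hierarchy is assumed to be acyclic.

```python
def implied_is_a(asserted):
    """Return (class, ancestor) pairs entailed by transitivity but not asserted.

    asserted: dict mapping each class to the set of its direct superclasses.
    Assumes the hierarchy is acyclic (the recursion would not end otherwise).
    """
    memo = {}

    def ancestors(cls):
        # All superclasses reachable from cls, cached per class.
        if cls not in memo:
            result = set()
            for parent in asserted.get(cls, ()):
                result.add(parent)
                result |= ancestors(parent)
            memo[cls] = result
        return memo[cls]

    implied = set()
    for cls, parents in asserted.items():
        for anc in ancestors(cls):
            if anc not in parents:
                implied.add((cls, anc))
    return implied


# Hypothetical toy hierarchy: B is_a A, C is_a B, D is_a C and A.
asserted = {"B": {"A"}, "C": {"B"}, "D": {"C", "A"}}
print(sorted(implied_is_a(asserted)))  # [('C', 'A'), ('D', 'B')]
```

A real reasoner would additionally derive edges from existential restrictions, equivalence axioms and other constructs, which is where Onto2Graph goes well beyond this sketch.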

  14. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  15. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

      This thesis entitled "Function analysis of unknown genes" presents the use of proteome analysis for the characterisation of yeast (Saccharomyces cerevisiae) genes and their products (proteins especially those of unknown function). This study illustrates that proteome analysis can be used...... be obtained using proteome analysis. Chapter 1 and 2 provide the basic theoretical aspects of proteome analysis, its principles, the main techniques involved and their use in the studies of the molecular biology of yeast cells. Chapter 3 presents the methods and tools involved in proteome analysis and used...... presents a comparison of the proteomes of three yeast wild type strains CEN.PK2, FY1679 and W303 that are widely used in function analysis projects and proves that FY1679 and W303 strains are more similar to each other than to the CEN.PK2 strain. This study identifies 62 proteins that are differentially...

  16. The Ontology for Biomedical Investigations.

    Science.gov (United States)

    Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias; Brush, Matthew H; Bug, Bill; Chibucos, Marcus C; Clancy, Kevin; Courtot, Mélanie; Derom, Dirk; Dumontier, Michel; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Gibson, Frank; Gonzalez-Beltran, Alejandra; Haendel, Melissa A; He, Yongqun; Heiskanen, Mervi; Hernandez-Boussard, Tina; Jensen, Mark; Lin, Yu; Lister, Allyson L; Lord, Phillip; Malone, James; Manduchi, Elisabetta; McGee, Monnie; Morrison, Norman; Overton, James A; Parkinson, Helen; Peters, Bjoern; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H; Schober, Daniel; Smith, Barry; Soldatova, Larisa N; Stoeckert, Christian J; Taylor, Chris F; Torniai, Carlo; Turner, Jessica A; Vita, Randi; Whetzel, Patricia L; Zheng, Jie

    2016-01-01

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed

  17. The Ontology for Biomedical Investigations.

    Directory of Open Access Journals (Sweden)

    Anita Bandrowski

    Full Text Available The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions.
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being

  18. Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology.

    Directory of Open Access Journals (Sweden)

    Lars Malmström

    2007-04-01

    Full Text Available Saccharomyces cerevisiae is one of the best-studied model organisms, yet the three-dimensional structure and molecular function of many yeast proteins remain unknown. Yeast proteins were parsed into 14,934 domains, and those lacking sequence similarity to proteins of known structure were folded using the Rosetta de novo structure prediction method on the World Community Grid. This structural data was integrated with process, component, and function annotations from the Saccharomyces Genome Database to assign yeast protein domains to SCOP superfamilies using a simple Bayesian approach. We have predicted the structure of 3,338 putative domains and assigned SCOP superfamily annotations to 581 of them. We have also assigned structural annotations to 7,094 predicted domains based on fold recognition and homology modeling methods. The domain predictions and structural information are available in an online database at http://rd.plos.org/10.1371_journal.pbio.0050076_01.

  19. SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments.

    Science.gov (United States)

    Hayes, Wayne B; Mamano, Nil

    2017-12-07

    Gene Ontology (GO) terms are frequently used to score alignments between protein-protein interaction (PPI) networks. Methods exist to measure GO similarity between proteins in isolation, but proteins in a network alignment are not isolated: each pairing is dependent on every other via the alignment itself. Existing measures fail to take into account the frequency of GO terms across networks, instead imposing arbitrary rules on when to allow GO terms. Here we develop NetGO, a new measure that naturally weighs infrequent, informative GO terms more heavily than frequent, less informative GO terms, without arbitrary cutoffs, instead downweighting GO terms according to their frequency in the networks being aligned. This is a global measure applicable only to alignments, independent of pairwise GO measures, in the same sense that the edge-based EC or S3 scores are global measures of topological similarity independent of pairwise topological similarities. We demonstrate the superiority of NetGO in alignments of predetermined quality and show that NetGO correlates with alignment quality better than any existing GO-based alignment measures. We also demonstrate that NetGO provides a measure of taxonomic similarity between species, consistent with existing taxonomic measures, a feature not shared with existing GO-based network alignment measures. Finally, we re-score alignments produced by almost a dozen aligners from a previous study and show that NetGO does a better job at separating good alignments from bad ones. Availability: NetGO is available as part of SANA. Contact: whayes@uci.edu. © The Author (2017). Published by Oxford University Press. All rights reserved.
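The abstract describes scoring a whole alignment so that rare GO terms count for more than common ones, without a cutoff. The exact NetGO formula is not given here, so the sketch below uses a simplified, hypothetical inverse-frequency weighting to illustrate the idea: every shared GO term in an aligned pair contributes 1 / (number of proteins in both networks annotated with it). The function name and the toy data are illustrative assumptions, not part of SANA.

```python
from collections import Counter


def inverse_frequency_go_score(alignment, annotations):
    """Score a network alignment by the GO terms its aligned pairs share.

    Simplified, hypothetical weighting (not the published NetGO formula):
    each shared term contributes 1 / (number of proteins, across both
    networks, annotated with that term), so rare terms count for more.

    alignment   -- dict mapping proteins of network 1 to proteins of network 2
    annotations -- dict mapping every protein in both networks to a set of GO IDs
    """
    # Frequency of each GO term over all proteins in both networks.
    freq = Counter(term for terms in annotations.values() for term in terms)
    score = 0.0
    for p, q in alignment.items():
        shared = annotations.get(p, set()) & annotations.get(q, set())
        # No cutoff: frequent terms are down-weighted rather than discarded.
        score += sum(1.0 / freq[t] for t in shared)
    return score


# Toy example: GO:A is rare (2 proteins), GO:B is common (4 proteins).
annotations = {
    "x1": {"GO:A", "GO:B"}, "x2": {"GO:B"},  # network 1
    "y1": {"GO:A", "GO:B"}, "y2": {"GO:B"},  # network 2
}
good = inverse_frequency_go_score({"x1": "y1", "x2": "y2"}, annotations)
bad = inverse_frequency_go_score({"x1": "y2", "x2": "y1"}, annotations)
print(good, bad)  # 1.0 0.5
```

Because the score is computed over the alignment as a whole, swapping two pairings changes which rare terms are matched; here the alignment that pairs the two GO:A proteins scores higher.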

  20. Stromal Gene Expression and Function in Primary Breast Tumors that Metastasize to Bone Cancer

    Science.gov (United States)

    2006-07-01

    significantly altered gene ontology categories were 7 Description Common Function stefin A1 stfa1 cystatin, cathepsin inhibitor breast cancer ... cancer. Breast Cancer Res, 7: 33-36, 2005. 5. Alcaraz, J., Nelson, C. M., and Bissell, M. J. Biomechanical approaches for studying integration of... analysis of fibroblastic stromagenesis in breast cancer progression. J Mammary Gland Biol Neoplasia, 9: 311-324, 2004. 7. Kuperwasser, C., Chavarria, T

  1. Rotavirus gene structure and function.

    OpenAIRE

    Estes, M K; Cohen, J

    1989-01-01

    Knowledge of the structure and function of the genes and proteins of the rotaviruses has expanded rapidly. Information obtained in the last 5 years has revealed unexpected and unique molecular properties of rotavirus proteins of general interest to virologists, biochemists, and cell biologists. Rotaviruses share some features of replication with reoviruses, yet antigenic and molecular properties of the outer capsid proteins, VP4 (a protein whose cleavage is required for infectivity, possibly ...

  2. Building ontologies with basic formal ontology

    CERN Document Server

    Arp, Robert; Spear, Andrew D.

    2015-01-01

    In the era of "big data," science is increasingly information driven, and the potential for computers to store, manage, and integrate massive amounts of data has given rise to such new disciplinary fields as biomedical informatics. Applied ontology offers a strategy for the organization of scientific information in computer-tractable form, drawing on concepts not only from computer and information science but also from linguistics, logic, and philosophy. This book provides an introduction to the field of applied ontology that is of particular relevance to biomedicine, covering theoretical components of ontologies, best practices for ontology design, and examples of biomedical ontologies in use. After defining an ontology as a representation of the types of entities in a given domain, the book distinguishes between different kinds of ontologies and taxonomies, and shows how applied ontology draws on more traditional ideas from metaphysics. It presents the core features of the Basic Formal Ontology (BFO), now u...

  3. Ontological Surprises

    DEFF Research Database (Denmark)

    Leahu, Lucian

    2016-01-01

    This paper investigates how we might rethink design as the technological crafting of human-machine relations in the context of a machine learning technique called neural networks. It analyzes Google’s Inceptionism project, which uses neural networks for image recognition. The surprising output...... a hybrid approach where machine learning algorithms are used to identify objects as well as connections between them; finally, it argues for remaining open to ontological surprises in machine learning as they may enable the crafting of different relations with and through technologies....

  4. Functional characterization of endogenous siRNA target genes in Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Heikkinen Liisa

    2008-06-01

    Full Text Available Abstract Background Small interfering RNA (siRNA) molecules mediate sequence-specific silencing in RNA interference (RNAi), a gene regulatory phenomenon observed in almost all organisms. Large-scale sequencing of small RNA libraries obtained from C. elegans has revealed that a broad spectrum of siRNAs is endogenously transcribed from genomic sequences. The biological role and molecular diversity of C. elegans endogenous siRNA (endo-siRNA) molecules, nonetheless, remain poorly understood. In order to gain insight into their biological function, we annotated two large libraries of endo-siRNA sequences, identified their cognate targets, and performed gene ontology analysis to identify enriched functional categories. Results Systematic trends in categorization of target genes according to the specific length of siRNA sequences were observed: 18- to 22-mer siRNAs were associated with genes required for embryonic development; 23-mers were associated uniquely with post-embryonic development; 24–26-mers were associated with phosphorus metabolism or protein modification. Moreover, we observe that some argonaute-related genes associate with siRNAs with multiple reads. Sequence frequency graphs suggest that different lengths of siRNAs share similarities in overall sequence structure: the 5' end begins with G, while the body is dominated by U and C. Conclusion These results suggest that the lengths of endogenous siRNA molecules are consequential to their biological functions, since the gene ontology categories for their cognate mRNA targets vary depending upon their lengths.

  5. Building a biomedical ontology recommender web service

    Directory of Open Access Journals (Sweden)

    Jonquet Clement

    2010-06-01

    Full Text Available Abstract Background Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. Methods We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first one is coverage, or the ontologies that provide most terms covering the input text. The second is connectivity, or the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. Results We compare and contrast our Recommender by an exhaustive functional comparison to previously published efforts. We evaluate and discuss the results of several recommendation heuristics in the context of three real-world use cases. The best recommendation heuristics, rated ‘very relevant’ by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.

  6. GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition.

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2013-04-21

    Prediction of protein subcellular localization is an important yet challenging problem. Recently, several computational methods based on Gene Ontology (GO) have been proposed to tackle this problem and have demonstrated superiority over methods based on other features. Existing GO-based methods, however, do not fully use the GO information. This paper proposes an efficient GO method called GOASVM that exploits the information from the GO term frequencies and distant homologs to represent a protein in the general form of Chou's pseudo-amino acid composition. The method first selects a subset of relevant GO terms to form a GO vector space. Then for each protein, the method uses the accession number (AC) of the protein or the ACs of its homologs to find the number of occurrences of the selected GO terms in the Gene Ontology annotation (GOA) database as a means to construct GO vectors for support vector machines (SVMs) classification. With the advantages of GO term frequencies and a new strategy to incorporate useful homologous information, GOASVM can achieve a prediction accuracy of 72.2% on a new independent test set comprising novel proteins that were added to Swiss-Prot six years later than the creation date of the training set. GOASVM and Supplementary materials are available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/GOASVM.html. Copyright © 2013 Elsevier Ltd. All rights reserved.
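The vector construction described above (count how often each selected GO term occurs for a protein's accession in GOA, falling back on homologs) can be sketched as follows. This is a minimal illustration under stated assumptions: GOA is modelled as a plain list of (accession, GO ID) pairs, the homolog fallback rule is assumed, the function name is hypothetical, and the SVM classification step is omitted. The GO IDs used are real (nucleus, cytoplasm, mitochondrion), but the records are toy data.

```python
from collections import Counter


def go_frequency_vector(accession, goa_records, selected_terms, homologs=()):
    """Term-frequency GO vector for one protein (GOASVM-style sketch).

    goa_records    -- list of (accession, go_id) pairs, one per annotation
                      line; a simplified stand-in for the GOA database
    selected_terms -- ordered list of GO IDs that defines the vector space
    homologs       -- accessions of homologs, used only when the protein itself
                      has no annotations (an assumed fallback rule)
    """
    def counts_for(accessions):
        wanted = set(accessions)
        return Counter(go for ac, go in goa_records if ac in wanted)

    counts = counts_for([accession])
    if not counts and homologs:
        counts = counts_for(homologs)
    # A Counter yields 0 for terms it has not seen.
    return [counts[t] for t in selected_terms]


records = [
    ("P1", "GO:0005634"), ("P1", "GO:0005634"),  # same term, two annotation lines
    ("P1", "GO:0005737"), ("P2", "GO:0005739"),
]
terms = ["GO:0005634", "GO:0005737", "GO:0005739"]
print(go_frequency_vector("P1", records, terms))  # [2, 1, 0]
print(go_frequency_vector("P3", records, terms, homologs=["P2"]))  # [0, 0, 1]
```

Keeping raw counts rather than 0/1 presence is the "term frequency" part: a term annotated repeatedly (for example, with different evidence) gets a larger weight in the vector passed to the classifier.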

  7. Detecting Functional Structures in E. coli Gene Networks from Expression Data

    Science.gov (United States)

    Chen, Tianlong; Opitz, Madeleine; Bassler, Kevin E.

    The rapidly growing amount of available gene expression data for many organisms makes the development of robust systematic methods for determining the structure and function of regulatory networks from that data an important goal. Recently, methods that use the context likelihood of relatedness to infer a network and then use modularity-maximizing community detection algorithms on the inferred network to find the functional structure were shown to be effective. Improvements of these methods will be presented and applied to systematically study Escherichia coli expression data. First, robust functionally related communities of genes are identified, and then the structure of the more closely related genes within those communities is determined. Results will be compared with gene ontology terms and the RegulonDB database. Predictions of a number of significant new regulatory relations are found. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.

  8. Sample prep for proteomics of breast cancer: proteomics and gene ontology reveal dramatic differences in protein solubilization preferences of radioimmunoprecipitation assay and urea lysis buffers

    Directory of Open Access Journals (Sweden)

    Ngoka Lambert CM

    2008-10-01

    Full Text Available Abstract Background An important step in the proteomics of solid tumors, including breast cancer, consists of efficiently extracting most of the proteins in the tumor specimen. For this purpose, Radio-Immunoprecipitation Assay (RIPA) buffer is widely employed. RIPA buffer's rapid and highly efficient cell lysis and good solubilization of a wide range of proteins is further augmented by its compatibility with protease and phosphatase inhibitors, ability to minimize non-specific protein binding leading to a lower background in immunoprecipitation, and its suitability for protein quantitation. Results In this work, the insoluble matter left after RIPA buffer extraction of proteins from breast tumors is subjected to another extraction step, using a urea-based buffer. It is shown that RIPA and urea lysis buffers fractionate breast tissue proteins primarily on the basis of molecular weights. The average molecular weight of proteins that dissolve exclusively in urea buffer is up to 60% higher than in RIPA. Gene Ontology (GO) and Directed Acyclic Graphs (DAG) are used to map the collective biological and biophysical attributes of the RIPA and urea proteomes. The Cellular Component and Molecular Function annotations reveal protein solubilization preferences of the buffers, especially the compartmentalization and functional distributions. It is shown that nearly all extracellular matrix proteins (ECM) in the breast tumors and matched normal tissues are found, nearly exclusively, in the urea fraction, while they are mostly insoluble in RIPA buffer. Additionally, it is demonstrated that cytoskeletal and extracellular region proteins are more soluble in urea than in RIPA, whereas for nuclear, cytoplasmic and mitochondrial proteins, RIPA buffer is preferred. Extracellular matrix proteins are highly implicated in cancer, including their proteinase-mediated degradation and remodelling, tumor development, progression, adhesion and metastasis. Thus, if they are not

  9. Markov Chain Ontology Analysis (MCOA)

    Directory of Open Access Journals (Sweden)

    Frost H

    2012-02-01

    Full Text Available Abstract Background Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. Conclusion A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing

  10. Markov Chain Ontology Analysis (MCOA).

    Science.gov (United States)

    Frost, H Robert; McCray, Alexa T

    2012-02-03

    Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
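The eigenvector values mentioned above are, for an ergodic chain, the stationary distribution: the left eigenvector of the transition matrix with eigenvalue 1. A minimal sketch of computing it by power iteration is shown below. The PageRank-style damping used to make the chain ergodic is an assumption standing in for MCOA's own "adjusted" matrix, and the three-node chain is toy data.

```python
def stationary_distribution(P, damping=0.85, tol=1e-12, max_iter=10_000):
    """Stationary distribution of a finite Markov chain by power iteration.

    P is a row-stochastic transition matrix (list of lists). Blending it with
    a uniform random jump (PageRank-style damping) makes the chain ergodic;
    this particular adjustment is an assumption, not necessarily MCOA's.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [
            (1 - damping) / n + damping * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy chain: node 1 (say, a GO class) links two annotated genes (nodes 0, 2).
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
pi = stationary_distribution(P)
print([round(x, 4) for x in pi])  # [0.2568, 0.4865, 0.2568]
```

Ranking nodes by their entry in `pi` gives an importance ordering; the hub node linking both genes receives the largest share of the stationary mass.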

  11. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    Directory of Open Access Journals (Sweden)

    Lijing Xu

    2011-04-01

    Full Text Available High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.

  12. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    Science.gov (United States)

    Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin

    2011-04-14

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
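The abstract states that LPv comes from a Fisher's exact test on LSI-derived gene-to-gene similarities, but not how the 2x2 table is built. The sketch below assumes one plausible construction: count gene pairs inside the set versus a background sample whose similarity clears a threshold, then take the one-sided (enrichment) Fisher p-value from the hypergeometric tail. The threshold, the background sample and both function names are hypothetical; the similarity values are toy data, not LSI output.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value (enrichment) for the table
        [[a, b],
         [c, d]]
    i.e. P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1)) / denom


def literature_cohesion_pvalue(sims_in_set, sims_background, threshold):
    """Hypothetical LPv-style score: are gene pairs inside the set more often
    above a similarity threshold than pairs in a background sample?"""
    a = sum(s >= threshold for s in sims_in_set)      # similar pairs in set
    b = len(sims_in_set) - a                          # dissimilar pairs in set
    c = sum(s >= threshold for s in sims_background)  # similar background pairs
    d = len(sims_background) - c                      # dissimilar background pairs
    return fisher_exact_greater(a, b, c, d)


p = literature_cohesion_pvalue(
    sims_in_set=[0.9, 0.8, 0.7, 0.2],
    sims_background=[0.1, 0.3, 0.2, 0.6, 0.1, 0.05],
    threshold=0.5,
)
print(round(p, 4))  # 0.119
```

Using exact integer binomial coefficients avoids floating-point underflow for small tables; for large gene sets a library routine working in log space would be the more practical choice.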

  13. GeneMANIA: Fast gene network construction and function prediction for Cytoscape [v1; ref status: indexed, http://f1000r.es/3rv]

    Directory of Open Access Journals (Sweden)

    Jason Montojo

    2014-07-01

    Full Text Available The GeneMANIA Cytoscape app enables users to construct a composite gene-gene functional interaction network from a gene list. The resulting network includes the genes most related to the original list, and functional annotations from Gene Ontology. The edges are annotated with details about the publication or data source the interactions were derived from. The app leverages GeneMANIA’s database of 1800+ networks, containing over 500 million interactions spanning 8 organisms: A. thaliana, C. elegans, D. melanogaster, D. rerio, H. sapiens, M. musculus, R. norvegicus, and S. cerevisiae. Users may also import their own organisms, networks, and expression profiles. The app is compatible with Cytoscape versions 2 and 3.

  14. OntologyWidget – a reusable, embeddable widget for easily locating ontology terms

    Directory of Open Access Journals (Sweden)

    Skene JH Pate

    2007-09-01

    Widget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple HTML description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website [1], as well as several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from http://smd.stanford.edu/ontologyWidget/.

  15. Genes2FANs: connecting genes through functional association networks

    Science.gov (United States)

    2012-01-01

    Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web-based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting disease genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in

  16. An overview of health-care images: a proposal for building an ontology

    Directory of Open Access Journals (Sweden)

    Virginia Bentes Pinto

    2011-10-01

    Full Text Available Health-care images are of great importance for confirming whether or not a disease is present, allowing greater precision in the diagnosis and treatment of pathologies. They are rich sources of information and therefore require informational organization. This article fits into that context and presents the results of a study whose aim was to plan and build an image ontology for the field of nephrology, intended for the indexing representation and retrieval of images in electronic environments. The corpus of this study consists of patients' electronic medical records. The ontology of the verbal text was built with the Protégé software (Stanford University) and imported into the Active Media Software - Ontology Based Annotation system (University of Sheffield) to build the image ontology. The results show that it is possible to build ontologies of verbal and non-verbal texts by combining these two programs.

  17. Drug target ontology to classify and integrate drug discovery data.

    Science.gov (United States)

    Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande; Turner, John Paul; Vidovic, Dusica; Forlin, Michele; Koleti, Amar; Nguyen, Dac-Trung; Jensen, Lars Juhl; Guha, Rajarshi; Mathias, Stephen L; Ursu, Oleg; Stathias, Vasileios; Duan, Jianbin; Nabizadeh, Nooshin; Chung, Caty; Mader, Christopher; Visser, Ubbo; Yang, Jeremy J; Bologa, Cristian G; Oprea, Tudor I; Schürer, Stephan C

    2017-11-09

    model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO BioPortal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.

  18. Margin based ontology sparse vector learning algorithm and applied in biology science.

    Science.gov (United States)

    Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad

    2017-01-01

    In biology, ontology applications involve a large amount of genetic information and chemical information about molecular structure, so each ontology concept carries a great deal of information. In mathematical terms, the vector that represents an ontology concept is therefore often very high-dimensional, which places higher demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using marginal likelihood and marginal distributions, we present an optimization strategy for a margin-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to the Gene Ontology and the Plant Ontology to verify its efficiency.
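    The abstract does not spell out its margin-based optimization, so the sketch below swaps in a standard, different technique for learning a sparse weight vector over high-dimensional concept vectors: L1-regularized least squares (the lasso) solved by iterative soft-thresholding (ISTA). It is not the authors' algorithm; the data, dimensions and regularization weight `lam` are hypothetical and only show how an L1 penalty drives most coordinates to exactly zero.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Minimise 0.5 * ||X w - y||^2 + lam * ||w||_1 by iterative soft-thresholding."""
    # Step size 1/L, where L is the largest eigenvalue of X^T X
    # (the Lipschitz constant of the least-squares gradient).
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Hypothetical toy data: 200 "concepts" described by 50-dimensional vectors,
# of which only three dimensions actually matter.
rng = np.random.default_rng(0)
n_concepts, dim = 200, 50
X = rng.normal(size=(n_concepts, dim))
true_w = np.zeros(dim)
true_w[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.01 * rng.normal(size=n_concepts)

w = ista_lasso(X, y, lam=5.0)
print("non-zero coordinates:", np.flatnonzero(np.abs(w) > 1e-6))
```

    Larger values of `lam` give sparser vectors at the cost of more shrinkage on the retained weights.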

  19. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

    Science.gov (United States)

    He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

    2018-02-20

    Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as
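    The abstract reports that functional gene richness/diversity fell as uranium rose but does not state which indices were used. As an illustration only, the sketch below computes two common choices from probe signal intensities: richness (number of detected genes) and the Shannon index H' = -sum(p_i ln p_i), treating each gene's share of the total signal as p_i. The well profiles and gene labels are hypothetical toy values, not data from the study.

```python
import math

def richness_and_shannon(signals):
    """Return (richness, Shannon H') for a {gene: signal intensity} profile.

    Genes with zero signal are treated as not detected.
    """
    detected = [s for s in signals.values() if s > 0]
    if not detected:
        return 0, 0.0
    total = sum(detected)
    h = -sum((s / total) * math.log(s / total) for s in detected)
    return len(detected), h

# Hypothetical profiles: an even, gene-rich well and an uneven, gene-poor well.
well_even = {"gene_A": 120.0, "gene_B": 95.0, "gene_C": 80.0, "gene_D": 60.0}
well_skewed = {"gene_A": 300.0, "gene_B": 0.0, "gene_C": 20.0, "gene_D": 0.0}

for name, profile in [("even", well_even), ("skewed", well_skewed)]:
    richness, h = richness_and_shannon(profile)
    print(f"{name}: richness={richness}, H'={h:.3f}")
```

    Both indices drop when fewer genes are detected or when the signal is dominated by a few genes, which is the pattern the abstract describes for high-uranium wells.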

  20. An ontology approach to comparative phenomics in plants.

    Science.gov (United States)

    Oellrich, Anika; Walls, Ramona L; Cannon, Ethalinda Ks; Cannon, Steven B; Cooper, Laurel; Gardiner, Jack; Gkoutos, Georgios V; Harper, Lisa; He, Mingze; Hoehndorf, Robert; Jaiswal, Pankaj; Kalberer, Scott R; Lloyd, John P; Meinke, David; Menda, Naama; Moore, Laura; Nelson, Rex T; Pujar, Anuradha; Lawrence, Carolyn J; Huala, Eva

    2015-01-01

    Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of

  1. An ontology approach to comparative phenomics in plants

    KAUST Repository

    Oellrich, Anika

    2015-02-25

    Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics

  2. Functional classification of genes using semantic distance and fuzzy clustering approach: evaluation with reference sets and overlap analysis.

    Science.gov (United States)

    Devignes, Marie-Dominique; Benabderrahmane, Sidahmed; Smaïl-Tabbone, Malika; Napoli, Amedeo; Poch, Olivier

    2012-01-01

    Functional classification aims at grouping genes according to their molecular function or the biological process they participate in. Evaluating the validity of such unsupervised gene classification remains a challenge given the variety of distance measures and classification algorithms that can be used. We evaluate here functional classification of genes with the help of reference sets: KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathways and Pfam clans. These sets represent ground truth for any distance based on GO (Gene Ontology) biological process and molecular function annotations respectively. Overlaps between clusters and reference sets are estimated by the F-score method. We test our previously described IntelliGO semantic distance with hierarchical and fuzzy C-means clustering and we compare results with the state-of-the-art DAVID (Database for Annotation Visualisation and Integrated Discovery) functional classification method. Finally, study of best matching clusters to reference sets leads us to propose a set-difference method for discovering missing information.
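    A minimal sketch of the cluster-versus-reference-set evaluation described above, assuming the usual F-score definition (precision = overlap / cluster size, recall = overlap / reference-set size) and taking, for each reference set, the best-matching cluster. Clusters are plain Python sets, so a gene can sit in several clusters as in fuzzy clustering. The `missing_genes` helper is a hypothetical reading of the set-difference idea (reference genes absent from the best cluster); the abstract does not give the authors' exact procedure, and all gene and pathway names are invented.

```python
def f_score(cluster, reference):
    """Harmonic mean of precision and recall of a cluster against a reference set."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def best_matches(clusters, references):
    """For each reference set, return (best cluster name, its F-score)."""
    matches = {}
    for ref_name, ref_genes in references.items():
        best = max(clusters, key=lambda c: f_score(clusters[c], ref_genes))
        matches[ref_name] = (best, f_score(clusters[best], ref_genes))
    return matches

def missing_genes(cluster, reference):
    """Hypothetical set-difference step: reference genes the cluster failed to capture."""
    return reference - cluster

# Hypothetical toy data; g3 belongs to both clusters (fuzzy membership).
clusters = {"c1": {"g1", "g2", "g3"}, "c2": {"g3", "g4"}}
references = {"pathway_X": {"g1", "g2"}, "pathway_Y": {"g3", "g4", "g5"}}

matches = best_matches(clusters, references)
for ref, (cl, f) in matches.items():
    print(ref, cl, round(f, 3), sorted(missing_genes(clusters[cl], references[ref])))
```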

  3. Transcriptome Analysis of Porcine PBMCs Reveals the Immune Cascade Response and Gene Ontology Terms Related to Cell Death and Fibrosis in the Progression of Liver Failure

    Directory of Open Access Journals (Sweden)

    YiMin Zhang

    2018-01-01

    Full Text Available Background. The key gene sets involved in the progression of acute liver failure (ALF), which has a high mortality rate, remain unclear. This study aims to gain a deeper understanding of the transcriptional response of peripheral blood mononuclear cells (PBMCs) following ALF. Methods. ALF was induced by D-galactosamine (D-gal) in a porcine model. PBMCs were separated at time zero (baseline group), 36 h (failure group), and 60 h (dying group) after D-gal injection. Transcriptional profiling was performed using RNA sequencing and analysed using DAVID bioinformatics resources. Results. Compared with the baseline group, 816 and 1,845 differentially expressed genes (DEGs) were identified in the failure and dying groups, respectively. Five gene ontology (GO) term clusters (107 GO terms) were enriched in the failure group and two (154 GO terms) in the dying group. These GO clusters were primarily immune-related, including genes regulating the inflammasome complex and toll-like receptor signalling pathways. Specifically, GO terms related to cell death, including apoptosis, pyroptosis, and autophagy, and those related to fibrosis, coagulation dysfunction, and hepatic encephalopathy were enriched. Seven Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (cytokine-cytokine receptor interaction, hematopoietic cell lineage, lysosome, rheumatoid arthritis, malaria, phagosome, and pertussis) were mapped for DEGs in the failure group. All seven of these KEGG pathways were among the 19 KEGG pathways mapped in the dying group. Conclusion. We found that the dramatic PBMC transcriptome changes triggered by ALF progression were predominantly related to immune responses. The enriched GO terms related to cell death, fibrosis, and other processes, as indicated by PBMC transcriptome analysis, seem to be useful in elucidating potential key gene sets in the progression of ALF. A better understanding of these gene sets might be of preventive or
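    GO term enrichment of a DEG list is commonly scored with a one-sided hypergeometric test: given N background genes, K of which carry a GO term, how surprising is it to find at least k annotated genes among the n DEGs? DAVID itself reports a modified Fisher exact "EASE score", so the plain test below is a simplified stand-in, not DAVID's computation. Only the 816 DEG count comes from the abstract; the background size and term counts are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    k: DEGs annotated with the term, n: total DEGs,
    K: background genes annotated with the term, N: background size.
    """
    if k <= 0:
        return 1.0
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical: 816 DEGs drawn from an assumed 20,000-gene background; the term
# annotates 200 background genes, 30 of which are DEGs (about 8 expected by chance).
p = enrichment_p_value(k=30, n=816, K=200, N=20000)
print(f"p = {p:.3g}")
```

    Exact integer binomials keep the tail sum accurate; in practice the p-values across all tested terms would then be corrected for multiple testing.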

  4. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2016-11-11

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated, if not preferred, in gene function and human disease studies.
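    The guilt-by-association idea can be sketched in a few lines: connect genes whose expression profiles are strongly correlated, then score each gene by the fraction of its network neighbours already annotated with a function. The correlation threshold and the neighbour-voting score are assumptions chosen for illustration; they are not the network construction or prediction method behind Gene2Net, and the expression matrix is a hypothetical toy (rows are genes, columns are samples, which could equally be mRNA or protein abundances).

```python
import numpy as np

def coexpression_network(expr, threshold):
    """Adjacency matrix linking genes whose Pearson correlation exceeds `threshold`."""
    corr = np.corrcoef(expr)  # rows of `expr` are genes
    adj = corr > threshold
    np.fill_diagonal(adj, False)  # no self-edges
    return adj

def gba_scores(adj, annotated):
    """Neighbour voting: fraction of each gene's neighbours annotated with the function."""
    labels = np.zeros(adj.shape[0])
    labels[list(annotated)] = 1.0
    degree = adj.sum(axis=1)
    votes = adj.astype(float) @ labels
    return np.divide(votes, degree, out=np.zeros_like(votes), where=degree > 0)

# Hypothetical 4 genes x 6 samples: genes 0-2 co-vary, gene 3 does not.
expr = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 5, 7, 9, 11, 13],
    [1, 2, 3, 4, 6, 5],
    [6, 1, 5, 2, 4, 3],
], dtype=float)

adj = coexpression_network(expr, threshold=0.8)
scores = gba_scores(adj, annotated={0, 1})  # genes 0 and 1 carry the function
print(scores)  # gene 2, whose neighbours are both annotated, gets the top score
```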

  5. Knowledge Management Framework for Emerging Infectious Diseases Preparedness and Response: Design and Development of Public Health Document Ontology.

    Science.gov (United States)

    Zhang, Zhizun; Gonzalez, Mila C; Morse, Stephen S; Venkatasubramanian, Venkat

    2017-10-11

    There are increasing concerns about our preparedness and timely coordinated response across the globe to cope with emerging infectious diseases (EIDs). This poses practical challenges that require exploiting novel knowledge management approaches effectively. This work aims to develop an ontology-driven knowledge management framework that addresses the existing challenges in sharing and reusing public health knowledge. We propose a systems engineering-inspired ontology-driven knowledge management approach. It decomposes public health knowledge into concepts and relations and organizes the elements of knowledge according to their teleological functions. Both knowledge and semantic rules are stored in an ontology and retrieved to answer queries regarding EID preparedness and response. A hybrid concept extraction was implemented in this work. The quality of the ontology was evaluated using the formal evaluation method Ontology Quality Evaluation Framework. Our approach is a potentially effective methodology for managing public health knowledge. Accuracy and comprehensiveness of the ontology can be improved as more knowledge is stored. In the future, a survey will be conducted to collect queries from public health practitioners. The reasoning capacity of the ontology will be evaluated using the queries and hypothetical outbreaks. We emphasize the importance of developing a knowledge sharing standard like the Gene Ontology for the public health domain. ©Zhizun Zhang, Mila C Gonzalez, Stephen S Morse, Venkat Venkatasubramanian. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 11.10.2017.

  6. The foundational ontology library ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    Full Text Available One purpose of a foundational ontology is to resolve interoperability issues among domain ontologies; foundational ontologies are also used for ontology-driven conceptual data modelling. Multiple foundational ontologies have been developed in recent years, and most of them...

  7. Logical development of the cell ontology.

    Science.gov (United States)

    Meehan, Terrence F; Masci, Anna Maria; Abdulla, Amina; Cowell, Lindsay G; Blake, Judith A; Mungall, Christopher J; Diehl, Alexander D

    2011-01-05

    The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL. Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types. Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.
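    A genus-differentia definition states a class as "a genus plus some distinguishing restrictions", which is what lets a reasoner place classes automatically. The toy below mimics that with a structural subsumption check: class A falls under class B if A's genus is B's genus or below it, and A carries every restriction B requires. This is a deliberately tiny stand-in for a real OWL reasoner (it ignores relation hierarchies and subsumption between fillers); the class names, genus hierarchy and relation label are illustrative assumptions, not axioms from the actual CL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    """Genus-differentia definition: a parent class plus (relation, filler) restrictions."""
    genus: str
    differentia: frozenset

# Hypothetical asserted genus hierarchy (child -> parent); not taken from the real CL.
GENUS_PARENT = {"lymphocyte": "leukocyte", "leukocyte": "cell"}

def genus_ancestors(cls):
    """cls plus all of its asserted ancestors."""
    chain = [cls]
    while cls in GENUS_PARENT:
        cls = GENUS_PARENT[cls]
        chain.append(cls)
    return chain

def subsumes(general, specific):
    """Structural check: `specific` falls under `general` if its genus is the same as,
    or below, general's genus and it carries every restriction general requires."""
    return (general.genus in genus_ancestors(specific.genus)
            and general.differentia <= specific.differentia)

def infer_subclasses(defs):
    """All (child, parent) pairs entailed by the structural check."""
    return [(a, b) for a in defs for b in defs
            if a != b and subsumes(defs[b], defs[a])]

MEMBRANE = "has_plasma_membrane_part"  # relation label used here for illustration
DEFS = {
    "T_cell": Definition("lymphocyte", frozenset({(MEMBRANE, "TCR_complex")})),
    "CD4_T_cell": Definition("lymphocyte", frozenset({(MEMBRANE, "TCR_complex"), (MEMBRANE, "CD4")})),
    "CD4_positive_leukocyte": Definition("leukocyte", frozenset({(MEMBRANE, "CD4")})),
}

for child, parent in infer_subclasses(DEFS):
    print(f"{child} SubClassOf {parent}")
```

    Neither inferred link is asserted anywhere; both follow from the definitions alone, which is the property that lets reasoners detect missing or inconsistent manual placements.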

  8. Logical Development of the Cell Ontology

    Directory of Open Access Journals (Sweden)

    Blake Judith A

    2011-01-01

    Full Text Available Abstract Background The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL. Results Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types. Conclusion Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.

  9. XML, Ontologies, and Their Clinical Applications.

    Science.gov (United States)

    Yu, Chunjiang; Shen, Bairong

    2016-01-01

    The development of information technology has resulted in its penetration into every area of clinical research. Various clinical systems have been developed, which produce increasing volumes of clinical data. However, saving, exchanging, querying, and exploiting these data are challenging issues. The development of Extensible Markup Language (XML) has allowed the generation of flexible information formats to facilitate the electronic sharing of structured data via networks, and it has been used widely for clinical data processing. In particular, XML is very useful in the fields of data standardization, data exchange, and data integration. Moreover, ontologies have been attracting increased attention in various clinical fields in recent years. An ontology is the basic level of a knowledge representation scheme, and various ontology repositories have been developed, such as Gene Ontology and BioPortal. The creation of these standardized repositories greatly facilitates clinical research in related fields. In this chapter, we discuss the basic concepts of XML and ontologies, as well as their clinical applications.
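    To make the point about XML as a flexible exchange format concrete, the sketch below builds and parses a small patient record with Python's standard `xml.etree.ElementTree`. The element and attribute names (`patientRecord`, `observation`, `code`, `unit`) are hypothetical and do not follow any clinical standard such as HL7 CDA; the example only shows how structured values survive a round trip through XML text.

```python
import xml.etree.ElementTree as ET

def build_record(patient_id, observations):
    """Serialise (code, value, unit) observations into a hypothetical XML record."""
    root = ET.Element("patientRecord", attrib={"id": patient_id})
    obs_list = ET.SubElement(root, "observations")
    for code, value, unit in observations:
        obs = ET.SubElement(obs_list, "observation", attrib={"code": code, "unit": unit})
        obs.text = str(value)
    return ET.tostring(root, encoding="unicode")

def parse_record(xml_text):
    """Read the record back into {code: (value, unit)}."""
    root = ET.fromstring(xml_text)
    return {obs.get("code"): (float(obs.text), obs.get("unit"))
            for obs in root.iter("observation")}

xml_text = build_record("P001", [("creatinine", 1.2, "mg/dL"), ("potassium", 4.1, "mmol/L")])
print(xml_text)
print(parse_record(xml_text))
```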

  10. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for the semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes analysing them and the knowledge graphs they represent computationally expensive and time-consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scale and the preservation/precision of results when analysing ontologies. This paper presents a first effort to examine this idea by studying the relationship between scaling biomedical ontologies at different levels and the resulting semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, the latter two of which belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that, with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, Jiang-Conrath and Lin performed more reliably and stably than the other two measures in this experiment, as shown by (a) consistently indicating that Yeast and Candida are more similar to each other than to Plant at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
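    The four measures named above all rest on information content (IC): a term's probability p(c) is the fraction of annotations made to it or any descendant, and IC(c) = -ln p(c). The sketch below uses the standard textbook definitions: Resnik = IC of the most informative common ancestor (MICA); Lin = 2·IC(MICA) / (IC(c1) + IC(c2)); Jiang-Conrath distance = IC(c1) + IC(c2) - 2·IC(MICA); SimRel = Lin · (1 - p(MICA)). The DAG and annotation counts are hypothetical, and this is not the authors' code or their GO-slim scaling procedure.

```python
import math

# Hypothetical toy DAG (term -> parents) and direct annotation counts.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"], "B1": ["B"]}
DIRECT = {"root": 0, "A": 2, "B": 4, "A1": 3, "A2": 1, "B1": 2}

def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result

def prob(term):
    """Fraction of all annotations made to `term` or any of its descendants."""
    total = sum(DIRECT.values())
    covered = sum(n for t, n in DIRECT.items() if term in ancestors(t))
    return covered / total

def ic(term):
    return -math.log(prob(term))

def mica(t1, t2):
    """Most informative common ancestor."""
    return max(ancestors(t1) & ancestors(t2), key=ic)

def resnik(t1, t2):
    return ic(mica(t1, t2))

def lin(t1, t2):
    denom = ic(t1) + ic(t2)
    return 1.0 if denom == 0 else 2 * resnik(t1, t2) / denom

def jiang_conrath_distance(t1, t2):
    return ic(t1) + ic(t2) - 2 * resnik(t1, t2)

def sim_rel(t1, t2):
    return lin(t1, t2) * (1 - prob(mica(t1, t2)))

for pair in [("A1", "A2"), ("A1", "B1")]:
    print(pair, round(resnik(*pair), 3), round(lin(*pair), 3),
          round(jiang_conrath_distance(*pair), 3), round(sim_rel(*pair), 3))
```

    Jiang-Conrath is a distance; when a similarity is needed, 1 / (1 + distance) is one common (assumed) conversion.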

  11. Central auditory function of deafness genes.

    Science.gov (United States)

    Willaredt, Marc A; Ebbers, Lena; Nothwang, Hans Gerd

    2014-06-01

    The highly variable benefit of hearing devices is a serious challenge in auditory rehabilitation. Various factors contribute to this phenomenon, such as the diversity of ear defects, differing extents of auditory nerve hypoplasia, the age at intervention, and cognitive abilities. Recent analyses indicate that, in addition, central auditory functions of deafness genes have to be considered in this context. Since reduced neuronal activity acts as the common denominator in deafness, it is widely assumed that peripheral deafness influences development and function of the central auditory system in a stereotypical manner. However, functional characterization of transgenic mice with mutated deafness genes demonstrated gene-specific abnormalities in the central auditory system as well. A frequent function of deafness genes in the central auditory system is supported by a genome-wide expression study that revealed significant enrichment of these genes in the transcriptome of the auditory brainstem compared to the entire brain. Here, we summarize current knowledge of the diverse central auditory functions of deafness genes. We furthermore propose the intimately interwoven gene regulatory networks governing development of the otic placode and the hindbrain as a mechanistic explanation for the widespread expression of these genes beyond the cochlea. We conclude that better knowledge of central auditory dysfunction caused by genetic alterations in deafness genes is required. In combination with improved genetic diagnostics now becoming available through novel sequencing technologies, this information will likely contribute to better outcome prediction of hearing devices. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. Semantic similarity in biomedical ontologies.

    Directory of Open Access Journals (Sweden)

    Catia Pesquita

    2009-07-01

    Full Text Available In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.

  13. Semantic similarity in biomedical ontologies.

    Science.gov (United States)

    Pesquita, Catia; Faria, Daniel; Falcão, André O; Lord, Phillip; Couto, Francisco M

    2009-07-01

    In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
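    The pairwise versus groupwise distinction can be illustrated with two groupwise measures that compare whole annotation sets at once: simUI, the Jaccard index of the two genes' ancestor-closed term sets, and simGIC, the same ratio with each term weighted by its information content (IC). The sketch below assumes that reading of the two measures; the DAG, the term names and the IC values are hypothetical and are given directly rather than estimated from an annotation corpus.

```python
# Hypothetical toy DAG (term -> parents) with assumed information-content values.
PARENTS = {
    "root": [], "binding": ["root"], "catalytic": ["root"],
    "protein_binding": ["binding"], "dna_binding": ["binding"], "kinase": ["catalytic"],
}
IC = {"root": 0.0, "binding": 0.5, "catalytic": 0.7,
      "protein_binding": 1.2, "dna_binding": 1.5, "kinase": 2.0}

def closure(terms):
    """Annotation terms plus all of their ancestors (true-path propagation)."""
    result, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(PARENTS[t])
    return result

def sim_ui(terms1, terms2):
    """Unweighted groupwise similarity: |shared terms| / |all terms|."""
    a, b = closure(terms1), closure(terms2)
    return len(a & b) / len(a | b)

def sim_gic(terms1, terms2):
    """IC-weighted Jaccard: summed IC of shared terms over summed IC of all terms."""
    a, b = closure(terms1), closure(terms2)
    union_ic = sum(IC[t] for t in a | b)
    return sum(IC[t] for t in a & b) / union_ic if union_ic else 0.0

gene1 = {"protein_binding", "kinase"}
gene2 = {"dna_binding"}
print(f"simUI={sim_ui(gene1, gene2):.3f}  simGIC={sim_gic(gene1, gene2):.3f}")
```

    Because shared terms here are general (low IC), simGIC scores the pair much lower than simUI does, which is the motivation for weighting by information content.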

  14. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing metadata sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and metadata taxonomies, should be based on ontologies.

  15. Interaction between leptin and leptin receptor in gastric carcinoma: Gene ontology analysis

    Directory of Open Access Journals (Sweden)

    V. Wiwanitkit

    2007-04-01

    Full Text Available Gastric carcinoma is a rare but important malignancy. A link between leptin, a cytokine that is elevated in obese individuals, and cancer development has been proposed, and leptin and its receptor may play a positive role in the progression of gastric cancer. However, the exact mechanism resulting from the interaction between leptin and the leptin receptor has never been clarified. Here, the author used a new gene ontology technology to predict the molecular function and biological process resulting from the interaction between leptin and the leptin receptor. Compared with leptin and the leptin receptor alone, the leptin-leptin receptor complex has the same function and biological process as the leptin receptor. This can confirm that the leptin receptor has a significant suppressive effect on the expression of leptin. Loss of hormone activity and disturbance of the normal leptin cell-signaling pathway can be seen. Blocking the receptor might be a rational therapeutic strategy.

  16. Mapping between the OBO and OWL ontology languages.

    Science.gov (United States)

    Tirmizi, Syed Hamid; Aitken, Stuart; Moreira, Dilvan A; Mungall, Chris; Sequeda, Juan; Shah, Nigam H; Miranker, Daniel P

    2011-03-07

    Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiments, taxonomies etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is to transform OBO ontologies to OWL. We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a layer cake similar to that of the Semantic Web. Thus we were able to decompose the problem into two parts. Most OBO constructs have easy and obvious equivalents among OWL constructs. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source. Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language-independent manner.
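
    To make the "easy and obvious equivalence" concrete, here is a minimal Python sketch that turns one OBO [Term] stanza into OWL functional-syntax axioms. It assumes only the id, name and is_a tags and the OBO PURL convention for IRIs; it is not the authors' Java implementation and it skips every construct the abstract says needs deeper consideration. The stanza is modelled on the GO entry for mitochondrion and is purely illustrative.

```python
# Minimal OBO -> OWL sketch: handles only id, name and is_a (an assumption;
# real OBO stanzas carry many more tags than this).
OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_id_to_iri(obo_id: str) -> str:
    """Turn a prefixed OBO ID such as GO:0005739 into an OBO PURL IRI."""
    prefix, local = obo_id.split(":", 1)
    return f"<{OBO_PURL}{prefix}_{local}>"


def term_stanza_to_owl(stanza: str) -> list:
    """Translate one [Term] stanza into OWL functional-syntax axioms."""
    tags = {}
    parents = []
    for line in stanza.strip().splitlines():
        if line.startswith("[") or ":" not in line:
            continue  # skip the stanza header and lines without a tag
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "is_a":
            parents.append(value)
        else:
            tags[tag] = value
    cls = obo_id_to_iri(tags["id"])
    axioms = [f"Declaration(Class({cls}))"]
    if "name" in tags:
        # OBO name maps to an rdfs:label annotation
        axioms.append(f'AnnotationAssertion(rdfs:label {cls} "{tags["name"]}")')
    for parent in parents:
        # OBO is_a maps to an OWL SubClassOf axiom
        axioms.append(f"SubClassOf({cls} {obo_id_to_iri(parent)})")
    return axioms


EXAMPLE = """[Term]
id: GO:0005739
name: mitochondrion
is_a: GO:0043231 ! intracellular membrane-bounded organelle
"""

axioms = term_stanza_to_owl(EXAMPLE)
for axiom in axioms:
    print(axiom)
```

    Because each output axiom records exactly one input tag, this simple subset can be read back into the same stanza, which is the lossless-roundtrip property the abstract claims for the full mapping.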

  17. An ontology for sensor networks

    Science.gov (United States)

    Compton, Michael; Neuhaus, Holger; Bermudez, Luis; Cox, Simon

    2010-05-01

    Sensors and networks of sensors are important ways of monitoring and digitizing reality. As the number and size of sensor networks grow, so too does the amount of data collected. Users of such networks typically need to discover the sensors and data that fit their needs without necessarily understanding the complexities of the network itself. The burden on users is eased if the network and its data are expressed in terms of concepts familiar to the users and their job functions, rather than in terms of the network or how it was designed. Furthermore, the task of collecting and combining data from multiple sensor networks is made easier if metadata about the data and the networks are stored in formats and conceptual models that are amenable to machine reasoning and inference. While the OGC's (Open Geospatial Consortium) SWE (Sensor Web Enablement) standards provide for the description of and access to data and metadata for sensors, they do not provide facilities for abstraction, categorization, and reasoning consistent with standard technologies. Once sensors and networks are described using rich semantics (that is, by using logic to describe the sensors, the domain of interest, and the measurements), reasoning and classification can be used to analyse and categorise data, relate measurements with similar information content, and manage, query and task sensors. This will enable types of automated processing and logical assurance built on OGC standards. The W3C SSN-XG (Semantic Sensor Networks Incubator Group) is producing a generic ontology to describe sensors, their environment and the measurements they make. The ontology provides definitions for the structure of sensors and observations, leaving the details of the observed domain unspecified. This allows abstract representations of real world entities, which are not observed directly but through their observable qualities. Domain semantics, units of measurement, time and time series, and location and mobility
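
    The benefit of describing sensors by the qualities they observe can be illustrated with a small Python sketch: a query for a general property also returns sensors that observe any more specific property. The property names and the hierarchy are hypothetical, not SSN vocabulary, and a hand-coded transitive lookup stands in for a real OWL reasoner.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical hierarchy of observable properties (child -> parent);
# these names are illustrative and are not taken from the SSN ontology.
PROPERTY_PARENT = {
    "AirTemperature": "Temperature",
    "WaterTemperature": "Temperature",
    "Temperature": "PhysicalQuality",
    "RelativeHumidity": "PhysicalQuality",
}


@dataclass(frozen=True)
class Sensor:
    sensor_id: str
    observes: str  # name of the observed property (hypothetical field)


def is_subproperty(prop: str, ancestor: str) -> bool:
    """True if prop equals ancestor or lies below it in the hierarchy."""
    current: Optional[str] = prop
    while current is not None:
        if current == ancestor:
            return True
        current = PROPERTY_PARENT.get(current)
    return False


def find_sensors(sensors, wanted: str) -> list:
    """Return IDs of sensors whose observed property is subsumed by `wanted`."""
    return [s.sensor_id for s in sensors if is_subproperty(s.observes, wanted)]


SENSORS = [
    Sensor("s1", "AirTemperature"),
    Sensor("s2", "WaterTemperature"),
    Sensor("s3", "RelativeHumidity"),
]

print(find_sensors(SENSORS, "Temperature"))
```

    The user asks for "Temperature" without knowing which concrete sensor types exist in the network, which is the kind of user-level discovery the abstract describes.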

  18. Gene, environment and cognitive function

    DEFF Research Database (Denmark)

    Xu, Chunsheng; Sun, Jianping; Duan, Haiping

    2015-01-01

    BACKGROUND: the genetic and environmental contributions to cognitive function in older people have been well addressed for Western populations using twin modelling, showing moderate to high heritability. No similar study has been conducted in the world's largest and rapidly ageing Chinese population, which lives under environmental conditions distinct from those of Western populations. OBJECTIVE: this study aims to explore the genetic and environmental impact on normal cognitive ageing in Chinese twins. DESIGN/SETTING: cognitive function was measured on 384 complete twin pairs with a median age of 50... factors accounting for 23-33% of the total variance. In contrast, all cognitive performances showed moderate to high influences of unique environmental factors. CONCLUSIONS: genetic factors and common family environment make a limited contribution to cognitive function in Chinese adults...
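
    As a rough illustration of how twin data separate genetic from environmental variance, the classic Falconer approximation derives additive genetic (A), common environment (C) and unique environment (E) shares from the monozygotic (MZ) and dizygotic (DZ) twin correlations: A = 2(r_MZ − r_DZ), C = 2r_DZ − r_MZ, E = 1 − r_MZ. This textbook shortcut is not the model the study fitted, and the correlations in the example are invented for illustration.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate ACE variance shares from twin correlations (Falconer).

    A textbook approximation only; formal twin studies fit
    structural equation models instead.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz    # common (shared family) environment share
    e2 = 1 - r_mz           # unique environment share, incl. measurement error
    return {"A": a2, "C": c2, "E": e2}


# Invented correlations for one cognitive test, not data from the study.
components = falconer_ace(r_mz=0.6, r_dz=0.4)
print({k: round(v, 2) for k, v in components.items()})
```

    By construction the three shares sum to 1; a large E with small A and C is the pattern the conclusions describe.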

  19. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Science.gov (United States)

    Zhang, Ping; Wu, Linwei; Rocha, Andrea M.; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D.; Wu, Liyou; Watson, David B.; Adams, Michael W. W.; Alm, Eric J.; Adams, Paul D.; Arkin, Adam P.

    2018-01-01

    ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide range of pH values and of uranium, nitrate, and other contaminant levels. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resisting those contaminants would increase, and thus that such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P [...]) [...] contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. PMID:29463661
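
    Functional richness and diversity from gene-array data are often summarized as the number of detected genes and the Shannon index H' = −Σ p_i ln p_i over relative signal intensities. The Python sketch below assumes that a gene counts as detected when its normalized intensity is positive; the abstract does not state the exact metrics or thresholds used, and the signal vectors are invented.

```python
import math


def richness(signal: list) -> int:
    """Number of functional genes detected (assumed: positive signal)."""
    return sum(1 for x in signal if x > 0)


def shannon_diversity(signal: list) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over detected genes."""
    total = sum(x for x in signal if x > 0)
    return -sum((x / total) * math.log(x / total) for x in signal if x > 0)


# Invented intensities for one low- and one high-contamination well.
low_contamination = [5.0, 4.0, 6.0, 5.0, 3.0, 4.0]
high_contamination = [12.0, 1.0, 0.0, 0.5, 0.0, 0.0]

for name, well in [("low", low_contamination), ("high", high_contamination)]:
    print(name, richness(well), round(shannon_diversity(well), 3))
```

    In this toy case the contaminated well has fewer detected genes and a signal dominated by one population, so both richness and H' drop, mirroring the reported trend with uranium.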

  20. Differences among cell-structure ontologies: FMA, GO, & CCO.

    Science.gov (United States)

    Au, Alan P; Li, Xiang; Gennari, John H

    2006-01-01

    When different groups create models or ontologies of the same knowledge domain, this creates challenges for knowledge sharing. To identify these challenges, we compare cellular structure as modeled by the Foundational Model of Anatomy (FMA), the Gene Ontology (GO), and the Cell Component Ontology (CCO). These ontologies all model the physical anatomy of a cell, and we expected them to be similar in scope. However, we discovered that the actual differences among them are substantial. These differences represent variations based on theory-driven vs. emergent construction, as well as differences in how small application ontologies like the CCO are created from reference ontologies. In this paper, we provide a description and analysis of these differences. By studying differences in language, granularity, breadth of coverage, and model organization, we hope to gain a better understanding of how to map between related ontologies.

  1. Array2BIO: from microarray expression data to functional annotation of co-regulated genes

    Directory of Open Access Journals (Sweden)

    Rasley Amy

    2006-06-01

    Full Text Available Abstract Background There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web-based utility. Results Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns. Conclusion We have developed Array2BIO, a web-based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available at http://array2bio.dcode.org.
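
    The two multiple-testing corrections the abstract names can be sketched in a few lines of Python. This is the standard Benjamini-Hochberg step-up procedure and the Bonferroni adjustment, written from their textbook definitions; it is not Array2BIO's code, and the p-values are invented.

```python
def bonferroni(p_values: list) -> list:
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list) -> list:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # invented p-values for four genes
print(benjamini_hochberg(raw))
print(bonferroni(raw))
```

    At a 0.05 threshold all four genes survive BH, while Bonferroni keeps only two, which is why tools typically offer both: BH controls the false discovery rate, Bonferroni the stricter family-wise error rate.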

  2. Sugarcane genes related to mitochondrial function

    Directory of Open Access Journals (Sweden)

    Fonseca Ghislaine V.

    2001-01-01

    Full Text Available Mitochondria function as metabolic powerhouses by generating energy through oxidative phosphorylation and have become the focus of renewed interest due to progress in understanding the subtleties of their biogenesis and the discovery of the important roles which these organelles play in senescence, cell death and the assembly of iron-sulfur (Fe/S) centers. Using proteins from the yeast Saccharomyces cerevisiae, Homo sapiens and Arabidopsis thaliana we searched the sugarcane expressed sequence tag (SUCEST) database for the presence of expressed sequence tags (ESTs) with similarity to nuclear genes related to mitochondrial functions. Starting with 869 protein sequences, we searched for sugarcane EST counterparts to these proteins using the basic local alignment search tool TBLASTN similarity searching program run against 260,781 sugarcane ESTs contained in 81,223 clusters. We were able to recover 367 clusters likely to represent sugarcane orthologues of the corresponding genes from S. cerevisiae, H. sapiens and A. thaliana with E-value ≤ 10⁻¹⁰. Gene products belonging to all functional categories related to mitochondrial functions were found and this allowed us to produce an overview of the nuclear genes required for sugarcane mitochondrial biogenesis and function as well as providing a starting point for detailed analysis of sugarcane gene structure and physiology.
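
    The E-value cut-off step can be sketched in Python against BLAST+ tabular output (-outfmt 6, whose 12 default columns put the E-value at column 11). The sketch keeps, for each query protein, the best subject with E ≤ 10⁻¹⁰. The protein and cluster names are hypothetical, and the abstract does not say how the authors parsed or ranked their hits.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # threshold quoted in the abstract


def best_hits(tabular: str, cutoff: float = EVALUE_CUTOFF) -> dict:
    """Keep the lowest-E-value subject per query from BLAST -outfmt 6 text.

    Columns: qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore (E-value is index 10).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return best


# Hypothetical TBLASTN hits of query proteins against EST clusters.
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["protA", "cluster_12", "55.0", "200", "80", "2", "1", "200", "3", "602", "1e-40", "150"],
    ["protA", "cluster_7", "40.0", "180", "100", "3", "1", "180", "5", "545", "3e-12", "90"],
    ["protB", "cluster_3", "30.0", "90", "60", "1", "1", "90", "2", "272", "2e-05", "45"],
])

hits = best_hits(EXAMPLE)
print(hits)
```

    Here protB is dropped because its only hit misses the threshold, and protA is assigned to its strongest cluster.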

  3. A comprehensive functional analysis of tissue specificity of human gene expression

    Directory of Open Access Journals (Sweden)

    Guryanov Alexey

    2008-11-01

    Full Text Available Abstract Background In recent years, the maturation of microarray technology has allowed the genome-wide analysis of gene expression patterns to identify tissue-specific and ubiquitously expressed ('housekeeping' genes. We have performed a functional and topological analysis of housekeeping and tissue-specific networks to identify universally necessary biological processes, and those unique to or characteristic of particular tissues. Results We measured whole genome expression in 31 human tissues, identifying 2374 housekeeping genes expressed in all tissues, and genes uniquely expressed in each tissue. Comprehensive functional analysis showed that the housekeeping set is substantially larger than previously thought, and is enriched with vital processes such as oxidative phosphorylation, ubiquitin-dependent proteolysis, translation and energy metabolism. Network topology of the housekeeping network was characterized by higher connectivity and shorter paths between the proteins than the global network. Ontology enrichment scoring and network topology of tissue-specific genes were consistent with each tissue's function and expression patterns clustered together in accordance with tissue origin. Tissue-specific genes were twice as likely as housekeeping genes to be drug targets, allowing the identification of tissue 'signature networks' that will facilitate the discovery of new therapeutic targets and biomarkers of tissue-targeted diseases. Conclusion A comprehensive functional analysis of housekeeping and tissue-specific genes showed that the biological function of housekeeping and tissue-specific genes was consistent with tissue origin. Network analysis revealed that tissue-specific networks have distinct network properties related to each tissue's function. Tissue 'signature networks' promise to be a rich source of targets and biomarkers for disease treatment and diagnosis.
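
    Once each gene has a present/absent call per tissue, the housekeeping and tissue-specific sets reduce to set operations: housekeeping genes are expressed in every tissue, and tissue-specific genes in exactly one. The Python sketch below assumes those binary calls already exist. The study's expression thresholds are not given in the abstract, and the toy tissues and genes are illustrative.

```python
def classify_genes(expressed: dict):
    """Split genes into housekeeping (all tissues) and tissue-specific (one).

    `expressed` maps tissue name -> set of genes called expressed there;
    how those calls are made is an assumption outside this sketch.
    """
    tissues = list(expressed)
    housekeeping = set.intersection(*(expressed[t] for t in tissues))
    specific = {}
    for t in tissues:
        # genes seen in any other tissue disqualify tissue specificity
        others = set().union(*(expressed[o] for o in tissues if o != t))
        specific[t] = expressed[t] - others
    return housekeeping, specific


# Illustrative calls for three tissues.
EXPRESSED = {
    "liver": {"ALB", "GAPDH", "ACTB"},
    "brain": {"GFAP", "GAPDH", "ACTB"},
    "muscle": {"MYH7", "GAPDH", "ACTB"},
}

hk, spec = classify_genes(EXPRESSED)
print(sorted(hk), {t: sorted(g) for t, g in spec.items()})
```

    The resulting gene lists are what would then feed the ontology enrichment and network analyses the abstract describes.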

  4. FunGeneNet: a web tool to estimate enrichment of functional interactions in experimental gene sets.

    Science.gov (United States)

    Tiys, Evgeny S; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2018-02-09

    Estimation of functional connectivity in gene sets derived from genome-wide or other biological experiments is one of the essential tasks of bioinformatics. A promising approach for solving this problem is to compare gene networks built using experimental gene sets with random networks. One of the resources that make such an analysis possible is CrossTalkZ, which uses the FunCoup database. However, existing methods, including CrossTalkZ, do not take into account individual types of interactions, such as protein/protein interactions, expression regulation, transport regulation, catalytic reactions, etc., but rather work with generalized types characterizing the existence of any connection between network members. We developed the online tool FunGeneNet, which utilizes the ANDSystem and STRING to reconstruct gene networks using experimental gene sets and to estimate their difference from random networks. To compare the reconstructed networks with random ones, the node permutation algorithm implemented in CrossTalkZ was taken as a basis. To study the applicability of FunGeneNet, a functional connectivity analysis was conducted on networks constructed for gene sets involved in Gene Ontology biological processes. We showed that the method's sensitivity exceeds 0.8 at a specificity of 0.95. We found that the significance level of the difference between gene networks of biological processes and random networks is determined by the type of connections considered between objects. At the same time, the highest reliability is achieved for the generalized form of connections that takes into account all the individual types of connections. Using the thyroid cancer networks and the apoptosis network as examples, it is demonstrated that key participants in these processes are involved in the interactions of those types by which these networks differ from random ones. FunGeneNet is a web tool aimed at proving the functionality of networks in a wide range of sizes of
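
    The core idea of comparing an experimental gene set with random ones can be sketched as a permutation test in Python: count interactions inside the set, then see how often random gene sets of the same size reach that count. This simplified version draws random sets uniformly and does not reproduce the CrossTalkZ node-permutation algorithm that FunGeneNet builds on. The toy network is invented. Restricting `edges` to one interaction type would give the per-type analysis the abstract describes.

```python
import random
from itertools import combinations


def count_internal_edges(genes: set, edges: set) -> int:
    """Number of interactions with both endpoints inside `genes`."""
    return sum(1 for e in edges if e <= genes)


def permutation_p_value(gene_set, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value for internal connectivity vs. random same-size sets."""
    rng = random.Random(seed)
    observed = count_internal_edges(set(gene_set), edges)
    pool = sorted(all_genes)
    k = len(gene_set)
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(pool, k))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one avoids p = 0


# Toy network: a fully connected module A-D plus a sparse chain G0-G15.
GENES = ["A", "B", "C", "D"] + [f"G{i}" for i in range(16)]
EDGES = {frozenset(pair) for pair in combinations("ABCD", 2)}
EDGES |= {frozenset((f"G{i}", f"G{i + 1}")) for i in range(15)}

observed, p = permutation_p_value({"A", "B", "C", "D"}, GENES, EDGES, seed=1)
print(observed, p)
```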

  5. Integrating phenotype ontologies with PhenomeNET

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2017-12-19

    Background Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details being captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. Results Here, we apply the PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. Conclusions PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.
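
    The lexical side of ontology alignment, which the abstract combines with automated reasoning, can be as simple as matching normalized class labels. The Python sketch below uses hypothetical class IDs and labels; it also shows the limitation the abstract points to, since a relation such as hepatomegaly being a kind of abnormal liver morphology is invisible to label matching.

```python
import re


def normalize(label: str) -> str:
    """Lowercase, treat hyphens/underscores as spaces, collapse whitespace."""
    label = re.sub(r"[-_]", " ", label.lower())
    return " ".join(label.split())


def lexical_matches(onto_a: dict, onto_b: dict) -> set:
    """Pairs of class IDs whose normalized labels are identical."""
    index = {}
    for cid, label in onto_b.items():
        index.setdefault(normalize(label), []).append(cid)
    return {(a, b) for a, label in onto_a.items()
            for b in index.get(normalize(label), [])}


# Hypothetical classes from two phenotype ontologies (ID -> label).
ONTO_A = {"A:1": "Increased body weight", "A:2": "Abnormal liver morphology"}
ONTO_B = {"B:7": "increased  body-weight", "B:9": "hepatomegaly"}

print(lexical_matches(ONTO_A, ONTO_B))
```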

  6. Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development

    Directory of Open Access Journals (Sweden)

    Mungall Christopher J

    2010-10-01

    Full Text Available Abstract Background The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation. Results We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology. Conclusions Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.
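
    One kind of taxon constraint, a term usable only within a given taxon, can be checked with a short Python sketch: walk up each annotated organism's lineage and flag annotations whose lineage never reaches the required taxon. The taxonomy fragment and gene names are hypothetical, and unlike the GO system the sketch does not propagate constraints to descendant GO terms.

```python
# Hypothetical taxonomy fragment (child -> parent), NCBI-style names.
TAXON_PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Mammalia": "Eukaryota",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Escherichia coli": "Bacteria",
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
}

# "Only in taxon" constraints mirroring the examples in the abstract.
ONLY_IN_TAXON = {
    "lactation": "Mammalia",
    "mitochondrion": "Eukaryota",
}


def lineage(taxon):
    """Yield the taxon and all of its ancestors."""
    while taxon is not None:
        yield taxon
        taxon = TAXON_PARENT.get(taxon)


def violations(annotations):
    """Return (gene, term, taxon) triples that break an only-in-taxon rule."""
    bad = []
    for gene, term, taxon in annotations:
        required = ONLY_IN_TAXON.get(term)
        if required is not None and required not in lineage(taxon):
            bad.append((gene, term, taxon))
    return bad


# Hypothetical annotations: (gene, GO term name, organism).
ANNOTATIONS = [
    ("geneX", "mitochondrion", "Escherichia coli"),
    ("geneY", "lactation", "Mus musculus"),
    ("geneZ", "lactation", "Saccharomyces cerevisiae"),
]

print(violations(ANNOTATIONS))
```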

  7. Constructive Ontology Engineering

    Science.gov (United States)

    Sousan, William L.

    2010-01-01

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…

  8. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product...

  9. An Updated Functional Annotation of Protein-Coding Genes in the Cucumber Genome

    Directory of Open Access Journals (Sweden)

    Hongtao Song

    2018-03-01

    Full Text Available Background: Although the cucumber reference genome and its annotation were published several years ago, the functional annotation of predicted genes, particularly protein-coding genes, still requires further improvement. In general, accurately determining orthologous relationships between genes allows for better and more robust functional assignments of predicted genes. As one of the most reliable strategies, the determination of collinearity information may facilitate reliable orthology inferences among genes from multiple related genomes. Currently, the identification of collinear segments has mainly been based on conservation of gene order and orientation. Over the course of plant genome evolution, various evolutionary events have disrupted or distorted the order of genes along chromosomes, making it difficult to use those genes as genome-wide markers for plant genome comparisons. Results: Using the localized LASTZ/MULTIZ analysis pipeline, we aligned 15 genomes, including cucumber and other related angiosperm plants, and identified a set of genomic segments that are short in length, stable in structure, uniform in distribution and highly conserved across all 15 plants. Compared with protein-coding genes, these conserved segments were more suitable for use as genomic markers for detecting collinear segments among distantly divergent plants. Guided by this set of identified collinear genomic segments, we inferred 94,486 orthologous protein-coding gene pairs (OPPs) between cucumber and 14 other angiosperm species, which were used as proxies for transferring functional terms to cucumber genes from the annotations of the other 14 genomes. In total, 10,885 protein-coding genes were assigned Gene Ontology (GO) terms, nearly 1,300 more than the results collected in the Uniprot-proteomic database. Our results showed that annotation accuracy was improved compared with other existing approaches. Conclusions: In this study, we provided an
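
    Using orthologous gene pairs as proxies for annotation transfer can be sketched, in its simplest form, as giving each cucumber gene the union of GO terms carried by its annotated orthologs. The gene names below are hypothetical. The union rule is an assumption for illustration, since the abstract does not say whether the authors filtered or weighted the transferred terms.

```python
from collections import defaultdict


def transfer_go_terms(ortholog_pairs, source_annotations):
    """Give each target gene the union of GO terms of its annotated orthologs.

    ortholog_pairs: iterable of (target_gene, source_gene) tuples.
    source_annotations: dict source_gene -> set of GO IDs.
    """
    transferred = defaultdict(set)
    for target_gene, source_gene in ortholog_pairs:
        transferred[target_gene] |= source_annotations.get(source_gene, set())
    # drop target genes whose orthologs carried no annotation at all
    return {gene: terms for gene, terms in transferred.items() if terms}


# Hypothetical pairs: cucumber gene -> ortholog in another angiosperm.
PAIRS = [("cuc1", "ath1"), ("cuc1", "osa1"), ("cuc2", "ath2")]
ANNOTATIONS = {
    "ath1": {"GO:0003677"},  # DNA binding
    "osa1": {"GO:0005634"},  # nucleus
}

result = transfer_go_terms(PAIRS, ANNOTATIONS)
print(result)
```

    Here cuc1 collects terms from two species, while cuc2 stays unannotated because its only ortholog has no GO terms.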

  10. Conceptual querying through ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2009-01-01

    We present here an approach to conceptual querying where the aim is, given a collection of textual database objects or documents, to target an abstraction of the entire database content in terms of the concepts appearing in documents, rather than the documents in the collection. The approach is motivated by an obvious need for users to survey huge volumes of objects in query answers. An ontology formalism and a special notion of "instantiated ontology" are introduced. The latter is a structure reflecting the content in the document collection in that it is a restriction of a general world knowledge ontology to the concepts instantiated in the collection. The notion of ontology-based similarity is briefly described, language constructs for direct navigation and retrieval of concepts in the ontology are discussed, and approaches to conceptual summarization are presented.
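
    An "instantiated ontology" as described above can be sketched in Python as the document concepts plus all of their ancestors in the world ontology, with is-a links restricted to that set. The toy ontology is hypothetical. The similarity function (Jaccard overlap of ancestor sets) is one simple choice picked for illustration, not necessarily the measure the authors describe.

```python
# Hypothetical world-knowledge ontology: concept -> set of is-a parents.
WORLD = {
    "entity": set(),
    "animal": {"entity"},
    "dog": {"animal"},
    "cat": {"animal"},
    "poodle": {"dog"},
    "vehicle": {"entity"},
    "car": {"vehicle"},
}


def instantiate(world: dict, concepts_in_docs) -> dict:
    """Restrict `world` to the document concepts and all their ancestors."""
    keep, stack = set(), list(concepts_in_docs)
    while stack:
        c = stack.pop()
        if c not in keep:
            keep.add(c)
            stack.extend(world[c])
    return {c: world[c] & keep for c in keep}


def ancestors(onto: dict, concept: str) -> set:
    """The concept itself plus everything reachable through is-a links."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(onto[c])
    return seen


def similarity(onto: dict, a: str, b: str) -> float:
    """Jaccard overlap of ancestor sets (an illustrative measure)."""
    sa, sb = ancestors(onto, a), ancestors(onto, b)
    return len(sa & sb) / len(sa | sb)


inst = instantiate(WORLD, {"poodle", "cat"})  # concepts found in documents
print(sorted(inst), similarity(inst, "poodle", "cat"))
```

    Concepts absent from the collection, such as "vehicle" and "car", drop out, so the instantiated ontology summarizes only what the documents actually talk about.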

  11. Nuclear Nonproliferation Ontology Assessment Team Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Strasburg, Jana D.; Hohimer, Ryan E.

    2012-01-01

    Final Report for the NA22 Simulations, Algorithm and Modeling (SAM) Ontology Assessment Team's efforts from FY09-FY11. The Ontology Assessment Team began in May 2009 and concluded in September 2011. During this two-year time frame, the Ontology Assessment Team had two objectives: (1) assessing the utility of knowledge representation and semantic technologies for addressing nuclear nonproliferation challenges; and (2) developing ontological support tools that would provide a framework for integrating across the Simulation, Algorithm and Modeling (SAM) program. The SAM Program was going through a large assessment and strategic planning effort during this time and, as a result, the relative importance of these two objectives changed, altering the focus of the Ontology Assessment Team. In the end, the team conducted an assessment of the state of the art, created an annotated bibliography, and developed a series of ontological support tools, demonstrations and presentations. A total of more than 35 individuals from 12 different research institutions participated in the Ontology Assessment Team. These included subject matter experts in several nuclear nonproliferation-related domains as well as experts in semantic technologies. Despite the diverse backgrounds and perspectives, the Ontology Assessment Team functioned very well together, and aspects of this collaboration could serve as a model for future inter-laboratory collaborations and working groups. While the team encountered several challenges and learned many lessons along the way, the Ontology Assessment effort was ultimately a success that led to several multi-lab research projects and opened up a new area of scientific exploration within the Office of Nuclear Nonproliferation and Verification.

  12. Practical ontologies for information professionals

    CERN Document Server

    AUTHOR|(CDS)2071712

    2016-01-01

    Practical Ontologies for Information Professionals provides an introduction to ontologies and their development, an essential tool for fighting back against information overload. The development of robust and widely used ontologies is an increasingly important tool in the fight against information overload. The publishing and sharing of explicit explanations for a wide variety of conceptualizations, in a machine-readable format, has the power to both improve information retrieval and identify new knowledge. This new book provides an accessible introduction to the following:
    * What is an ontology? Defining the concept and why it is increasingly important to the information professional
    * Ontologies and the semantic web
    * Existing ontologies, such as SKOS, OWL, FOAF, schema.org, and the DBpedia Ontology
    * Adopting and building ontologies, showing how to avoid repetition of work and how to build a simple ontology with Protégé
    * Interrogating semantic web ontologies
    * The future of ontologies and the role of the ...

  13. Diverse gene functions in a soil mobilome

    DEFF Research Database (Denmark)

    Luo, Wenting; Xu, Zhuofei; Riber, Leise

    2016-01-01

    , the soil mobilome sampled from a well-characterized field in Hygum, Denmark. Soil bacterial cells were obtained by Nycodenz extraction, total DNA was purified by removing sheared chromosomal DNA using exonuclease digestion, and the remaining circular DNA was amplified with the phi29 polymerase and finally...... sequenced. The soil mobilome represented a wide range of known bacterial gene functions and highlighted the enrichment of plasmids, transposable elements and phages when compared to a well-characterized soil metagenome that, on the other hand, was dominated by basic biosynthesis and metabolism functions....... Approximately one eighth of the gene set encoded plasmid-intrinsic traits, including replication, conjugation, mobilization and stability, based on Pfam database analysis. Resistance determinants toward aminoglycosides, beta-lactams and glycopeptides as well as multi-drug functions indicated that a substantial...

  14. Ribosomal RNA gene functioning in avian oogenesis.

    Science.gov (United States)

    Koshel, Elena; Galkina, Svetlana; Saifitdinova, Alsu; Dyomin, Alexandr; Deryusheva, Svetlana; Gaginskaya, Elena

    2016-12-01

    Despite long-term exploration into ribosomal RNA gene functioning during the oogenesis of various organisms, many intriguing problems remain unsolved. In this review, we describe nucleolus organizer region (NOR) activity in avian oocytes. Whereas oocytes from an adult avian ovary never reveal the formation of the nucleolus in the germinal vesicle (GV), an ovary from juvenile birds possesses both nucleolus-containing and non-nucleolus-containing oocytes. The evolutionary diversity of oocyte NOR functioning and the potential non-rRNA-related functions of the nucleolus in oocytes are also discussed.

  15. Annotating breast cancer microarray samples using ontologies

    Science.gov (United States)

    Liu, Hongfang; Li, Xin; Yoon, Victoria; Clarke, Robert

    2008-01-01

    As the most common cancer among women, breast cancer results from the accumulation of mutations in essential genes. Recent advances in high-throughput gene expression microarray technology have inspired researchers to use the technology to assist breast cancer diagnosis, prognosis, and treatment prediction. However, the high dimensionality of microarray experiments and public access to data from many experiments have caused inconsistencies, which initiated the development of controlled terminologies and ontologies for annotating microarray experiments, such as the standard Microarray Gene Expression Data (MGED) Ontology (MO). In this paper, we developed BCM-CO, an ontology tailored specifically for indexing clinical annotations of breast cancer microarray samples from the NCI Thesaurus. Our research showed that the coverage of the NCI Thesaurus is very limited with respect to i) terms used by researchers to describe breast cancer histology (covering 22 out of 48 histology terms); ii) breast cancer cell lines (covering one out of 12 cell lines); and iii) classes corresponding to breast cancer grading and staging. By incorporating a wider range of those terms into BCM-CO, we were able to index breast cancer microarray samples from GEO using BCM-CO and the MGED Ontology, and developed a prototype system with a web interface that allows the retrieval of microarray data based on the ontology annotations. PMID:18999108

  16. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms

    Science.gov (United States)

    2015-01-01

    The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, existing work on ontology integration is largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extended the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between the Gene Ontology and the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms. PMID:26550571

  17. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms

    Directory of Open Access Journals (Sweden)

    Yang Xiang

    2015-01-01

    Full Text Available The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, existing work on ontology integration is largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extended the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between the Gene Ontology and the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.

  18. Ontological foundations for evolutionary economics: A Darwinian social ontology

    NARCIS (Netherlands)

    Stoelhorst, J.W.

    2008-01-01

    The purpose of this paper is to further the project of generalized Darwinism by developing a social ontology on the basis of a combined commitment to ontological continuity and ontological commonality. Three issues that are central to the development of a social ontology are addressed: (1) the

  19. Perspectives on ontology learning

    CERN Document Server

    Lehmann, J

    2014-01-01

    Perspectives on Ontology Learning brings together researchers and practitioners from different communities (natural language processing, machine learning, and the semantic web) in order to give an interdisciplinary overview of recent advances in ontology learning. Starting with a comprehensive introduction to the theoretical foundations of ontology learning methods, the edited volume presents the state of the art in automated knowledge acquisition and maintenance. It outlines future challenges in this area with a special focus on technologies suitable for pushing the boundaries beyond the c

  20. Appreciating ontological struggles

    DEFF Research Database (Denmark)

    Danholt, Peter

    Appreciating ontological struggles Peter Danholt, ass. prof., Information studies, Aarhus University In the west – most of us – take for granted that we inhabit a common world, which we share with 6 billion other human beings and multiple other living beings, animals and plants. As Annemarie Mol...... a condition in the world, but as the playing out of an ontological struggle, we become able to appreciate the situation and the treatment differently and in a manner that reconsiders treatment and disease in novel ways. Importantly, when the encounter is conceived of as an ontological struggle it becomes ever...

  1. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on its approaches, the tools it used, and its results for mining the literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  2. Relating protein functional diversity to cell type number identifies genes that determine dynamic aspects of chromatin organisation as potential contributors to organismal complexity.

    Science.gov (United States)

    Lopes Cardoso, Daniela; Sharpe, Colin

    2017-01-01

    Organismal complexity broadly relates to the number of different cell types within an organism and generally increases across a phylogeny. Whilst gene expression will underpin organismal complexity, it has long been clear that a simple count of gene number is not a sufficient explanation. In this paper, we use open-access information from the Ensembl databases to quantify the functional diversity of human genes that are broadly involved in transcription. Functional diversity is described in terms of the numbers of paralogues, protein isoforms and structural domains for each gene. The change in functional diversity is then calculated for up to nine orthologues from the nematode worm to human and correlated to the change in cell-type number. Those with the highest correlation are subject to gene ontology term enrichment and interaction analyses. We found that a range of genes that encode proteins associated with dynamic changes to chromatin are good candidates to contribute to organismal complexity.

  3. The design ontology

    DEFF Research Database (Denmark)

    Storga, Mario; Andreasen, Mogens Myrup; Marjanovic, Dorian

    2010-01-01

    The article presents research into the nature, building and practical role of a Design Ontology as a potential framework for more efficient product development (PD) data-, information- and knowledge- description, -explanation, -understanding and -reusing. In the methodology for development...... of the ontology two steps could be identified: empirical research and computer implementation. Empirical research has included domain documentation analysis (Genetic Design Model System developed by Mortensen 1999), identification of the key concepts and relations between them, and categorisation of the concepts...... and relations into taxonomies. As an epistemological foundation for the concepts formalisation, the Suggested Upper Merged Ontology (SUMO), proposed by IEEE, was reused. As the result of the previously described process, the ontology content has been categorised into six main subcategories divided between...

  4. Ontology of fractures

    Science.gov (United States)

    Zhong, Jian; Aydin, Atilla; McGuinness, Deborah L.

    2009-03-01

    Fractures are fundamental structures in the Earth's crust and they can impact many societal and industrial activities including oil and gas exploration and production, aquifer management, CO2 sequestration, waste isolation, the stabilization of engineering structures, and assessing natural hazards (earthquakes, volcanoes, and landslides). Therefore, an ontology which organizes the concepts of fractures could help facilitate a sound education within, and communication among, the highly diverse professional and academic community interested in the problems cited above. We developed a process-based ontology that makes explicit specifications about fractures, their properties, and the deformation mechanisms which lead to their formation and evolution. Our ontology emphasizes the relationships among concepts such as the factors that influence the mechanism(s) responsible for the formation and evolution of specific fracture types. Our ontology is a valuable resource with potential applications in a number of fields utilizing recent advances in Information Technology, specifically for digital data and information in computers, grids, and Web services.

  5. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Directory of Open Access Journals (Sweden)

    Saber Jelokhani-Niaraki

    2015-03-01

    Full Text Available During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.

  6. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes.

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-03-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.

  7. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-01-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data. PMID:25873847

  8. MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions.

    Science.gov (United States)

    Blank, Carrine E; Cui, Hong; Moore, Lisa R; Walls, Ramona L

    2016-01-01

    MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature. MicrO currently has ~14,550 classes (~2,550 of which are new, the remainder being microbiologically-relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at http://purl.obolibrary.org/obo/MicrO.owl and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO's Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology. By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we

  9. Manufacturing ontology through templates

    Directory of Open Access Journals (Sweden)

    Diciuc Vlad

    2017-01-01

    Full Text Available The manufacturing industry holds a large volume of high-value know-how, much of it residing with key persons in the company. Passing on this know-how is the basis of manufacturing ontology. Among other methods, such as advanced filtering and algorithm-based decision making, one way of handling manufacturing ontology is via templates. The current paper tackles this approach, highlights its advantages, and concludes with some recommendations.

  10. "Force," ontology, and language

    Science.gov (United States)

    Brookes, David T.; Etkina, Eugenia

    2009-06-01

    We introduce a linguistic framework through which one can interpret systematically students’ understanding of and reasoning about force and motion. Some researchers have suggested that students have robust misconceptions or alternative frameworks grounded in everyday experience. Others have pointed out the inconsistency of students’ responses and presented a phenomenological explanation for what is observed, namely, knowledge in pieces. We wish to present a view that builds on and unifies aspects of this prior research. Our argument is that many students’ difficulties with force and motion are primarily due to a combination of linguistic and ontological difficulties. It is possible that students are primarily engaged in trying to define and categorize the meaning of the term “force” as spoken about by physicists. We found that this process of negotiation of meaning is remarkably similar to that engaged in by physicists in history. In this paper we will describe a study of the historical record that reveals an analogous process of meaning negotiation, spanning multiple centuries. Using methods from cognitive linguistics and systemic functional grammar, we will present an analysis of the force and motion literature, focusing on prior studies with interview data. We will then discuss the implications of our findings for physics instruction.

  11. The Cognitive Paradigm Ontology: Design and Application

    Science.gov (United States)

    Laird, Angela R.

    2013-01-01

    We present the basic structure of the Cognitive Paradigm Ontology (CogPO) for human behavioral experiments. While the experimental psychology and cognitive neuroscience literature may refer to certain behavioral tasks by name (e.g., the Stroop paradigm or the Sternberg paradigm) or by function (a working memory task, a visual attention task), these paradigms can vary tremendously in the stimuli that are presented to the subject, the response expected from the subject, and the instructions given to the subject. Drawing from the taxonomy developed and used by the BrainMap project (www.brainmap.org) for almost two decades to describe key components of published functional imaging results, we have developed an ontology capable of representing certain characteristics of the cognitive paradigms used in the fMRI and PET literature. The Cognitive Paradigm Ontology is being developed to be compliant with the Basic Formal Ontology (BFO), and to harmonize where possible with larger ontologies such as RadLex, NeuroLex, or the Ontology of Biomedical Investigations (OBI). The key components of CogPO include the representation of experimental conditions focused on the stimuli presented, the instructions given, and the responses requested. The use of alternate and even competitive terminologies can often impede scientific discoveries. Categorization of paradigms according to stimulus, response, and instruction has been shown to allow advanced data retrieval techniques by searching for similarities and contrasts across multiple paradigm levels. The goal of CogPO is to develop, evaluate, and distribute a domain ontology of cognitive paradigms for application and use in the functional neuroimaging community. PMID:21643732

  12. Ontological visualization of protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Hill David P

    2005-02-01

    Full Text Available Abstract Background Cellular processes require the interaction of many proteins across several cellular compartments. Determining the collective network of such interactions is an important aspect of understanding the role and regulation of individual proteins. The Gene Ontology (GO) is used by model organism databases and other bioinformatics resources to provide functional annotation of proteins. The annotation process provides a mechanism to document the binding of one protein with another. We have constructed protein interaction networks for mouse proteins utilizing the information encoded in the GO annotations. The work reported here presents a methodology for integrating and visualizing information on protein-protein interactions. Results GO annotation at Mouse Genome Informatics (MGI) captures 1318 curated, documented interactions. These include 129 binary interactions and 125 interactions involving three or more gene products. Three networks involve over 30 partners, the largest involving 109 proteins. Several tools are available at MGI to visualize and analyze these data. Conclusions Curators at the MGI database annotate protein-protein interaction data from experimental reports in the literature. Integration of these data with the other types of data curated at MGI places protein binding data into the larger context of mouse biology and facilitates the generation of new biological hypotheses based on physical interactions among gene products.
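
    The network construction described in this abstract can be sketched as a connected-components problem. This is a minimal illustration, not MGI's implementation: it assumes each annotated interaction is available as a hypothetical record listing the gene products that bind one another (two for a binary interaction, three or more for a complex), and it merges records that share a gene product using union-find. The function names and toy identifiers are hypothetical.

```python
from collections import defaultdict


def build_networks(interactions):
    """Group gene products into networks (connected components).

    `interactions`: iterable of records, each a collection of gene product
    identifiers annotated as binding one another (hypothetical input format).
    Returns a list of sets, largest network first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for record in interactions:
        members = list(record)
        for m in members:
            find(m)  # register every member, even in a one-record network
        for other in members[1:]:
            union(members[0], other)  # all partners in a record are linked

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


def count_interaction_types(interactions):
    """Count binary interactions and those with three or more gene products."""
    records = [set(r) for r in interactions]
    binary = sum(1 for r in records if len(r) == 2)
    multi = sum(1 for r in records if len(r) >= 3)
    return binary, multi
```

    For example, with toy records `[("A", "B"), ("B", "C"), ("D", "E", "F"), ("G", "H")]`, `[len(n) for n in build_networks(records)]` gives `[3, 3, 2]`: the two binary records sharing "B" collapse into one three-member network.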

  13. Ontology-based validation and identification of regulatory phenotypes

    KAUST Repository

    Kulmanov, Maxat

    2018-01-31

    Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support for validating the mutual consistency of functional and phenotype annotations, as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations. Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. Our method can also be applied to the rule-based prediction of phenotypes from functions. We show that the predicted phenotypes can be utilized for identification of protein-protein interactions and gene-disease associations. Based on experimental functional annotations, we predict phenotypes for 1,986 genes in mouse and 7,301 genes in human for which no experimental phenotypes have yet been determined.

  14. Open Biomedical Ontology-based Medline exploration

    Science.gov (United States)

    Xuan, Weijian; Dai, Manhong; Mirel, Barbara; Song, Jean; Athey, Brian; Watson, Stanley J; Meng, Fan

    2009-01-01

    Background Effective Medline database exploration is critical for the understanding of high throughput experimental results and the development of novel hypotheses about the mechanisms underlying the targeted biological processes. While existing solutions enhance Medline exploration through different approaches such as document clustering, network presentations of underlying conceptual relationships and the mapping of search results to MeSH and Gene Ontology trees, we believe the use of multiple ontologies from the Open Biomedical Ontology can greatly help researchers to explore literature from different perspectives as well as to quickly locate the most relevant Medline records for further investigation. Results We developed an ontology-based interactive Medline exploration solution called PubOnto to enable the interactive exploration and filtering of search results through the use of multiple ontologies from the OBO foundry. The PubOnto program is a rich internet application based on the FLEX platform. It contains a number of interactive tools, visualization capabilities, an open service architecture, and a customizable user interface. It is freely accessible at: . PMID:19426463

  15. Multifunctional crop trait ontology for breeders' data: field book, annotation, data discovery and semantic enrichment of the literature.

    Science.gov (United States)

    Shrestha, Rosemary; Arnaud, Elizabeth; Mauleon, Ramil; Senger, Martin; Davenport, Guy F; Hancock, David; Morrison, Norman; Bruskiewich, Richard; McLaren, Graham

    2010-01-01

    Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders. These databases provide comparative phenotypic and genotypic information that can help elucidate functional aspects of plant and agricultural biology. To facilitate data sharing within and between these databases and the retrieval of information, the crop ontology (CO) database was designed to provide controlled vocabulary sets for several economically important plant species. Existing public ontologies and equivalent catalogues of concepts covering the range of crop science information and descriptors for crops and crop-related traits were collected from breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit tool. All terms within an ontology were assigned a globally unique CO term identifier. The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and maize are also included. In addition, multi-crop passport terms are included as controlled vocabularies for sharing information on germplasm. Two web-based online resources were built to make these COs available to the scientific community: the 'CO Lookup Service' for browsing the CO; and the 'Crops Terminizer', an ontology text mark-up tool. The controlled vocabularies of the CO are being used to curate several CGIAR centres' agronomic databases. The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative

  16. FUNNEL-GSEA: FUNctioNal ELastic-net regression in time-course gene set enrichment analysis.

    Science.gov (United States)

    Zhang, Yun; Topham, David J; Thakar, Juilee; Qiu, Xing

    2017-07-01

    Gene set enrichment analyses (GSEAs) are widely used in genomic research to identify underlying biological mechanisms (defined by the gene sets), such as Gene Ontology terms and molecular pathways. There are two caveats in the currently available methods: (i) they are typically designed for group comparisons or regression analyses, which do not utilize temporal information efficiently in time-series of transcriptomics measurements; and (ii) genes overlapping in multiple molecular pathways are considered multiple times in hypothesis testing. We propose an inferential framework for GSEA based on functional data analysis, which utilizes the temporal information based on functional principal component analysis, and disentangles the effects of overlapping genes by a functional extension of the elastic-net regression. Furthermore, the hypothesis testing for the gene sets is performed by an extension of the Mann-Whitney U test which is based on weighted rank sums computed from correlated observations. By using both simulated datasets and a large-scale time-course gene expression dataset on human influenza infection, we demonstrate that our method has uniformly better receiver operating characteristic curves, and identifies more pathways relevant to immune response to human influenza infection than the competing approaches. The methods are implemented in R package FUNNEL, freely and publicly available at: https://github.com/yunzhang813/FUNNEL-GSEA-R-Package . xing_qiu@urmc.rochester.edu or juilee_thakar@urmc.rochester.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
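
    The core of a weighted rank-sum statistic can be sketched as follows. This is a simplified illustration, not the FUNNEL implementation: the per-gene statistics and the fractional membership weights (standing in for the elastic-net step that splits overlapping genes between sets) are hypothetical inputs, and the sketch omits the variance correction for correlated observations that the paper's test includes.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def weighted_rank_sum(gene_stats, set_weights):
    """Weighted rank sum for one gene set, plus its expectation under the null.

    gene_stats: dict gene -> per-gene statistic for every gene tested.
    set_weights: dict gene -> weight in [0, 1] for genes in the set
    (hypothetical fractional membership for genes shared by several sets).
    """
    genes = list(gene_stats)
    ranks = dict(zip(genes, average_ranks([gene_stats[g] for g in genes])))
    n = len(genes)
    observed = sum(w * ranks[g] for g, w in set_weights.items())
    # If ranks were assigned at random, each gene's expected rank is (n + 1) / 2
    expected = sum(set_weights.values()) * (n + 1) / 2
    return observed, expected
```

    An observed sum well above the expectation suggests the set's genes rank high overall; turning that gap into a p-value requires a variance estimate, which is where the paper's handling of correlated observations comes in.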

  17. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.

    Science.gov (United States)

    Smith, Barry; Ashburner, Michael; Rosse, Cornelius; Bard, Jonathan; Bug, William; Ceusters, Werner; Goldberg, Louis J; Eilbeck, Karen; Ireland, Amelia; Mungall, Christopher J; Leontis, Neocles; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H; Shah, Nigam; Whetzel, Patricia L; Lewis, Suzanna

    2007-11-01

    The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.

  18. DeMO: An Ontology for Discrete-event Modeling and Simulation

    Science.gov (United States)

    Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

    2011-01-01

    Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

  19. Margin based ontology sparse vector learning algorithm and applied in biology science

    Directory of Open Access Journals (Sweden)

    Wei Gao

    2017-01-01

    Full Text Available In biology, ontology applications involve a large amount of genetic information and chemical information about molecular structure, so each ontology concept carries a great deal of information. Therefore, in mathematical notation, the vector corresponding to an ontology concept often has a very high dimension, which places higher demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using marginal likelihood and marginal distribution, we present an optimization strategy for a margin-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to the gene ontology and plant ontology to verify its efficiency.

  20. Ontology: ambiguity and accuracy

    Directory of Open Access Journals (Sweden)

    Marcelo Schiessl

    2012-08-01

    Full Text Available Ambiguity is a major obstacle to information retrieval and the subject of a great deal of research in Information Science. Ontologies have been studied in order to solve problems related to ambiguity. Paradoxically, the term “ontology” is itself ambiguous, and it is understood according to its use by each community. Philosophy and Computer Science seem to show the most pronounced difference in the sense of the term. The former holds undisputed tradition and authority. The latter, despite being quite recent, holds an informal but pragmatic sense. Information Science ranges from philosophical to computational approaches in order to build organized collections that balance users’ needs against the available information. The semantic web requires automation of the information cycle and demands studies related to ontologies. Consequently, revisiting the relevant approaches to the study of ontologies can provide researchers with useful ideas that combine philosophical rigor with the convenience provided by computers.

  1. Ontological engineering versus metaphysics

    Science.gov (United States)

    Tataj, Emanuel; Tomanek, Roman; Mulawka, Jan

    2011-10-01

    It has been recognized that ontologies underpin a semantic version of the World Wide Web and are found in knowledge-based systems. A recent survey of this field also suggests that practical artificial intelligence systems may be motivated by this research. Strong artificial intelligence, as well as the concept of homo computer, can also benefit from their use. The main objective of this contribution is to present and review already created ontologies and to identify the main advantages that such an approach brings to knowledge management systems. We would like to present what ontological engineering borrows from metaphysics and what feedback it can provide to natural language processing, simulations and modelling. Potential topics for further development from a philosophical point of view are also highlighted.

  2. A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

    Science.gov (United States)

    Feng, Shou; Fu, Ping; Zheng, Wenbin

    2018-03-01

    Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When local approaches are used to solve this problem, a method for processing the preliminary results is usually needed. This paper proposes a novel preliminary-results processing method called the nodes interaction method. The method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. In its first phase, it exploits label dependency and considers the hierarchical interaction between nodes when making decisions based on a Bayesian network. In the second phase, it further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also improves HMC performance on gene function prediction based on the Gene Ontology (GO), whose hierarchy is a directed acyclic graph and is therefore more difficult to handle than a tree. Experimental results on eight benchmark yeast data sets annotated with GO show promising performance compared with state-of-the-art methods.
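    The hierarchy constraint can be illustrated with a much simpler baseline than the nodes interaction method. The sketch below assumes that each GO term already has an independent probability from a local classifier and that the DAG is given as a term-to-parents map (both hypothetical inputs). It caps every term's score at the minimum of its parents' corrected scores, one common way to make predictions obey the rule that a gene annotated to a term is also annotated to all of that term's ancestors. It is not the Bayesian-network procedure described in the abstract.

```python
from typing import Dict, List


def enforce_hierarchy(scores: Dict[str, float],
                      parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Return scores in which no term exceeds any of its parents.

    scores:  raw per-term probabilities from independent local classifiers
    parents: DAG edges as term -> list of direct parents (roots map to [])
    """
    fixed: Dict[str, float] = {}

    def visit(term: str) -> float:
        # Memoised depth-first walk towards the roots.
        if term not in fixed:
            value = scores[term]
            for parent in parents.get(term, []):
                value = min(value, visit(parent))
            fixed[term] = value
        return fixed[term]

    for term in scores:
        visit(term)
    return fixed


raw = {"root": 0.9, "a": 0.6, "b": 0.8, "c": 0.95}
dag = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
print(enforce_hierarchy(raw, dag))  # {'root': 0.9, 'a': 0.6, 'b': 0.8, 'c': 0.6}
```

    Taking the minimum over all parents, rather than over a single chosen parent, is what a DAG requires: a term with several parents must respect every one of them.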

  3. Prioritising lexical patterns to increase axiomatisation in biomedical ontologies. The role of localisation and modularity.

    Science.gov (United States)

    Quesada-Martínez, M; Fernández-Breis, J T; Stevens, R; Mikroyannidi, E

    2015-01-01

    This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". In previous work, we defined methods for extracting lexical patterns from labels as an initial step towards semi-automatic ontology enrichment. Our previous findings revealed that many biomedical ontologies could benefit from enrichment methods that use lexical patterns as a starting point. Here, we aim to identify which lexical patterns are appropriate for ontology enrichment, using metrics to prioritise the patterns. We propose metrics for suggesting which lexical regularities should be the starting point for enriching complex ontologies. Our method determines the relevance of a lexical pattern by measuring its locality in the ontology, that is, the distance between the classes associated with the pattern, and the distribution of the pattern in a certain module of the ontology. The methods have been applied to four significant biomedical ontologies, including the Gene Ontology and SNOMED CT. The metrics provide information about the engineering of the ontologies and the relevance of the patterns. Our method enables the suggestion of links between classes that are not made explicit in the ontology. We propose a prioritisation of the lexical patterns found in the analysed ontologies. The locality and distribution of lexical patterns offer insights into the further engineering of the ontology. Developers can use this information to improve the axiomatisation of their ontologies.
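    Locality, as defined in the abstract, is the distance between the classes that share a lexical pattern. A minimal sketch follows, assuming plain substring matching on labels, an undirected view of the is-a hierarchy, and mean pairwise shortest-path length as the distance. All three are assumptions, and the authors' metric may be normalised or defined differently. The function names `shortest_path` and `pattern_locality` are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional


def shortest_path(adj: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search on an undirected graph; None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def pattern_locality(labels: Dict[str, str],
                     is_a: Dict[str, List[str]],
                     pattern: str) -> Optional[float]:
    """Mean pairwise is-a distance between classes whose label contains pattern.

    Smaller values mean the pattern is more localised in the hierarchy.
    """
    # Build an undirected view of the subclass graph.
    adj: Dict[str, List[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            adj.setdefault(child, []).append(parent)
            adj.setdefault(parent, []).append(child)
    matches = sorted(c for c, label in labels.items() if pattern in label)
    dists = []
    for a, b in combinations(matches, 2):
        d = shortest_path(adj, a, b)
        if d is not None:
            dists.append(d)
    return sum(dists) / len(dists) if dists else None


labels = {"A": "heart", "B": "heart valve", "C": "heart muscle", "D": "lung"}
is_a = {"A": ["R"], "B": ["A"], "C": ["A"], "D": ["R"]}
print(pattern_locality(labels, is_a, "heart"))  # (1 + 1 + 2) / 3
```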

  4. Learning expressive ontologies

    CERN Document Server

    Völker, J

    2009-01-01

    This publication advances the state of the art in ontology learning by presenting a set of novel approaches to the semi-automatic acquisition, refinement and evaluation of logically complex axiomatizations. It is motivated by the fact that the realization of the semantic web envisioned by Tim Berners-Lee is still hampered by the lack of ontological resources, while at the same time more and more applications of semantic technologies emerge from fast-growing areas such as e-business or life sciences. Such knowledge-intensive applications, requiring large scale reasoning over complex domai

  5. Process attributes in bio-ontologies

    Directory of Open Access Journals (Sweden)

    Andrade André Q

    2012-08-01

    Full Text Available Abstract Background Biomedical processes can provide essential information about the (mal)functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has recently been a contentious issue in bio-ontologies, and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency. Results We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity. Conclusions We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages, so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.

  6. Knowledge Representation in Patient Safety Reporting: An Ontological Approach

    Directory of Open Access Journals (Sweden)

    Liang Chen

    2016-10-01

    Full Text Available Purpose: The current development of patient safety reporting systems is criticized for loss of information and low data quality due to the lack of a unified domain knowledge base and of text processing functionality. To improve patient safety reporting, the present paper suggests an ontological representation of patient safety knowledge. Design/methodology/approach: We propose a framework for constructing an ontological knowledge base of patient safety. The present paper describes our design, implementation, and evaluation of the ontology at its initial stage. Findings: We describe the design and initial outcomes of the ontology implementation. The evaluation results, obtained with a self-developed survey instrument, demonstrate the clinical validity of the ontology. Research limitations: The proposed ontology was developed and evaluated using a small number of information sources. Presently, US data are used, but they are not essential for the ultimate structure of the ontology. Practical implications: The goal of improving patient safety can be aided by investigating patient safety reports and providing actionable knowledge to clinical practitioners. As such, constructing a domain-specific ontology for patient safety reports serves as a cornerstone for information collection and text mining methods. Originality/value: The use of ontologies provides an abstracted representation of semantic information and enables a wealth of applications in a reporting system. Therefore, constructing such a knowledge base is recognized as a high priority in health care.

  7. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes

    Directory of Open Access Journals (Sweden)

    Datta Somnath

    2006-08-01

    Full Text Available Abstract Background Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see whether the genes in the same clusters can be functionally correlated. While past successes of such analyses have been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus Pearson's correlation coefficient as a measure of dissimilarity), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary, since the clusters also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species. Results In this paper, we introduce two performance measures for evaluating the results of a clustering algorithm in its ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it measures how biologically homogeneous the clusters are. It can be used to quantify the performance of a given clustering algorithm such as UPGMA in grouping genes for a particular data set, and also to compare the performance of a number of competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI). For a given clustering algorithm and an expression data set, it measures the consistency of the clustering
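    The BHI can be read as the average, over clusters, of the fraction of annotated gene pairs within a cluster that share at least one functional class. The sketch below follows that reading. The input formats and the choice to skip clusters with fewer than two annotated genes are assumptions, not details given in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Set


def biological_homogeneity_index(clusters: List[List[str]],
                                 classes: Dict[str, Set[str]]) -> float:
    """Mean within-cluster fraction of annotated gene pairs sharing a class.

    clusters: each cluster is a list of gene identifiers
    classes:  gene -> set of functional classes (e.g. GO terms);
              genes with no annotation are simply absent
    """
    per_cluster = []
    for cluster in clusters:
        annotated = [g for g in cluster if classes.get(g)]
        if len(annotated) < 2:
            continue  # assumption: too few annotated genes to score this cluster
        pairs = list(combinations(annotated, 2))
        shared = sum(1 for x, y in pairs if classes[x] & classes[y])
        per_cluster.append(shared / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


clusters = [["g1", "g2", "g3"], ["g4", "g5"]]
classes = {"g1": {"A"}, "g2": {"A", "B"}, "g3": {"C"}, "g4": {"D"}, "g5": {"D"}}
print(biological_homogeneity_index(clusters, classes))  # (1/3 + 1) / 2
```

    Values near 1 indicate clusters whose annotated genes largely share functions; comparing the index across algorithms on the same data set is the use the abstract describes.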

  8. Core Semantics for Public Ontologies

    National Research Council Canada - National Science Library

    Suni, Niranjan

    2005-01-01

    ... (schemas or ontologies) with respect to objects. The DARPA Agent Markup Language (DAML) through the use of ontologies provides a very powerful way to describe objects and their relationships to other objects...

  9. A Method for Building Personalized Ontology Summaries

    OpenAIRE

    Queiroz-Sousa, Paulo Orlando; Salgado, Ana Carolina; Pires, Carlos Eduardo

    2013-01-01

    In the context of ontology engineering, understanding an ontology is the basis for its further development and reuse. One intuitive and effective approach to supporting ontology understanding is ontology summarization, which highlights the most important concepts of an ontology. Ontology summarization identifies an excerpt from an ontology that contains the most relevant concepts and produces an abridged ontology. In this article, we present a method for summarizing ontologies that represent ...

  10. Summarization by domain ontology navigation

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2013-01-01

    of the subject. In between these two extremes, conceptual summaries encompass selected concepts derived using background knowledge. In this paper we address an approach where conceptual summaries are provided through a conceptualization as given by an ontology. The ontology guiding the summarization can...... be a simple taxonomy or a generative domain ontology. A domain ontology can be provided by a preanalysis of a domain corpus and can be used to condense improved summaries that better reflect the conceptualization of a given domain....

  11. The First Organ-Based Ontology for Arthropods (Ontology of Arthropod Circulatory Systems - OArCS) and its Integration into a Novel Formalization Scheme for Morphological Descriptions.

    Science.gov (United States)

    Wirkner, Christian S; Göpel, Torben; Runge, Jens; Keiler, Jonas; Klussmann-Fricke, Bastian-Jesper; Huckstorf, Katarina; Scholz, Stephan; Mikó, István; J Yoder, Matthew; Richter, Stefan

    2017-09-01

    ontology. That is, descriptions in ontologies are only descriptions of individuals if they are necessary and/or sufficient representations of attributes (independently) observed and recorded for an individual. In addition, we here present for the first time an entirely new approach to formalizing phenotypic research, a semantic model for the description of a complex organ system in a highly disparate taxon, the arthropods. We demonstrate this with a formalized morphological description of the hemolymph vascular system in one specimen of the European garden spider Araneus diadematus. Our description targets five categories of descriptive statement: "position", "spatial relationships", "shape", "constituents", and "connections", as the corresponding formalizations constitute exemplary patterns useful not only when talking about the circulatory system, but also in descriptions in general. The downstream applications of computer-parsable morphological descriptions are widespread, with their core utility being the fact that they make it possible to compare collective description sets in computational time, that is, very quickly. Among other things, this facilitates the identification of phenotypic plasticity and variation when single individuals are compared, the identification of those traits which correlate between and within taxa, and the identification of links between morphological traits and genetic (using GO, Gene Ontology) or environmental (using ENVO, Environmental Ontology) factors. [Arthropoda; concept; function; hemolymph vascular system; homology; terminology.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. An ontology-based exploration of the concepts and relationships in the activities and participation component of the international classification of functioning, disability and health.

    Science.gov (United States)

    Della Mea, Vincenzo; Simoncello, Andrea

    2012-02-28

    The International Classification of Functioning, Disability and Health (ICF) is a classification of health and health-related issues, aimed at describing and measuring health and disability at both individual and population levels. Here we discuss a preliminary qualitative and quantitative analysis of the relationships used in the Activities and Participation component of ICF, and a preliminary mapping to SUMO (Suggested Upper Merged Ontology) concepts. The aim of the analysis is to identify potential logical problems within this component of ICF, and to understand whether activities and participation might be defined more formally than in the current version of ICF. In the relationship analysis, we used four predicates among those available in SUMO for processes (Patient, Instrument, Agent, and subProcess). While at the top level subsumption was used in most cases (90%), at the lower levels the percentage of other relationships rose to 41%. Chapters were heterogeneous in the relationships used, and some of the leaves of the tree seemed to represent properties or parts of the parent concept rather than subclasses. Mapping of ICF to SUMO proved partially feasible, with the activity concepts being mapped mostly (but not totally) under the IntentionalProcess concept in SUMO. On the other hand, the participation concept has not been mapped to any upper level concept. Our analysis of the relationships within ICF revealed issues related to confusion between classes and their properties, incorrect classifications, and overemphasis on subsumption, confirming what other researchers have already observed. However, it also suggested some properties for Activities that could be included in a more formal model: number of agents involved, the instrument used to carry out the activity, the object of the activity, complexity of the task, and an enumeration of relevant subtasks.

  13. An ontology-based exploration of the concepts and relationships in the activities and participation component of the international classification of functioning, disability and health

    Directory of Open Access Journals (Sweden)

    Della Mea Vincenzo

    2012-02-01

    Full Text Available Abstract Background The International Classification of Functioning, Disability and Health (ICF) is a classification of health and health-related issues, aimed at describing and measuring health and disability at both individual and population levels. Here we discuss a preliminary qualitative and quantitative analysis of the relationships used in the Activities and Participation component of ICF, and a preliminary mapping to SUMO (Suggested Upper Merged Ontology) concepts. The aim of the analysis is to identify potential logical problems within this component of ICF, and to understand whether activities and participation might be defined more formally than in the current version of ICF. Results In the relationship analysis, we used four predicates among those available in SUMO for processes (Patient, Instrument, Agent, and subProcess). While at the top level subsumption was used in most cases (90%), at the lower levels the percentage of other relationships rose to 41%. Chapters were heterogeneous in the relationships used, and some of the leaves of the tree seemed to represent properties or parts of the parent concept rather than subclasses. Mapping of ICF to SUMO proved partially feasible, with the activity concepts being mapped mostly (but not totally) under the IntentionalProcess concept in SUMO. On the other hand, the participation concept has not been mapped to any upper level concept. Conclusions Our analysis of the relationships within ICF revealed issues related to confusion between classes and their properties, incorrect classifications, and overemphasis on subsumption, confirming what other researchers have already observed. However, it also suggested some properties for Activities that could be included in a more formal model: number of agents involved, the instrument used to carry out the activity, the object of the activity, complexity of the task, and an enumeration of relevant subtasks.

  14. Age-Related Gene Expression in the Frontal Cortex Suggests Synaptic Function Changes in Specific Inhibitory Neuron Subtypes

    Directory of Open Access Journals (Sweden)

    Leon French

    2017-05-01

    Full Text Available Genome-wide expression profiling of the human brain has revealed genes that are differentially expressed across the lifespan. Characterizing these genes adds to our understanding of both normal functions and pathological conditions. Additionally, the specific cell types that contribute to the motor, sensory and cognitive declines during aging are unclear. Here we test whether age-related genes show higher expression in specific neural cell types. Our study leverages data from two sources of murine single-cell expression data and two sources of age associations from large gene expression studies of postmortem human brain. We used nonparametric gene set analysis to test for age-related enrichment of genes associated with specific cell types; we also restricted our analyses to specific gene ontology groups. Our analyses focused on a primary pair of single-cell expression data from the mouse visual cortex and age-related human postmortem gene expression information from the orbitofrontal cortex. Additional pairings that used data from the hippocampus, prefrontal cortex, somatosensory cortex and blood were used to validate and test the specificity of our findings. We found robust age-related up-regulation of genes that are highly expressed in oligodendrocytes and astrocytes, while genes highly expressed in layer 2/3 glutamatergic neurons were down-regulated with age. Genes not specific to any neural cell type were also down-regulated, possibly due to the bulk tissue source of the age-related genes. A gene ontology-driven dissection of the cell-type enriched genes highlighted the strong down-regulation of genes involved in synaptic transmission and cell-cell signaling in the Somatostatin (Sst) neuron subtype that expresses cyclin-dependent kinase 6 (Cdk6) and in the vasoactive intestinal peptide (Vip) neuron subtype expressing myosin binding protein C, slow type (Mybpc1). These findings provide new insights into cell-specific susceptibility to normal aging
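    The abstract does not specify which nonparametric gene set test was used. One common choice, shown here only as an assumption, is the area under the ROC curve for genes in a cell-type set versus all other genes, ranked by an age-association score. This is equivalent to a normalised Mann-Whitney U statistic. The function name, score dictionary and gene identifiers below are hypothetical.

```python
from typing import Dict, Set


def gene_set_auc(scores: Dict[str, float], gene_set: Set[str]) -> float:
    """AUROC of gene_set members versus all other scored genes.

    Computed as Mann-Whitney U / (n_in * n_out): 0.5 means no enrichment,
    values near 1 mean set members tend to have higher scores.
    """
    ordered = sorted(scores, key=scores.get)
    # Assign average 1-based ranks so tied scores share a rank.
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1
        i = j + 1
    inside = [g for g in scores if g in gene_set]
    n_in, n_out = len(inside), len(scores) - len(inside)
    if n_in == 0 or n_out == 0:
        raise ValueError("gene set must contain some, but not all, scored genes")
    u = sum(ranks[g] for g in inside) - n_in * (n_in + 1) / 2
    return u / (n_in * n_out)


age_scores = {"GeneA": 1.0, "GeneB": 2.0, "GeneC": 3.0, "GeneD": 4.0}
print(gene_set_auc(age_scores, {"GeneC", "GeneD"}))  # 1.0
```

    An AUC above 0.5 would indicate that the cell-type genes tend to rank higher (for example, stronger up-regulation with age). A p-value would additionally require the Mann-Whitney null distribution or a permutation test, which this sketch omits.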

  15. Using a Foundational Ontology for Reengineering a Software Enterprise Ontology

    Science.gov (United States)

    Perini Barcellos, Monalessa; de Almeida Falbo, Ricardo

    Knowledge about software organizations is highly relevant to software engineers. A common vocabulary for representing useful knowledge about the software organizations involved in software projects is important for several reasons, such as supporting knowledge reuse and allowing communication and interoperability between tools. Domain ontologies can be used to define a common vocabulary for sharing and reusing knowledge about a domain. Foundational ontologies can be used for evaluating and redesigning domain ontologies, giving them real-world semantics. This paper presents an evaluation of a Software Enterprise Ontology that was reengineered using the Unified Foundational Ontology (UFO) as a basis.

  16. Taking Critical Ontology Seriously

    DEFF Research Database (Denmark)

    Wigger, Angela; Horn, Laura

    2017-01-01

    privilege ontology over epistemology – that is, why we need to accept that social reality is constituted by complex power relations that evolve from a constant dialectical interplay of structure and agency over time, and that these power relations are revealed in both ideational and material dimensions...

  17. Dahlbeck and Pure Ontology

    Science.gov (United States)

    Mackenzie, Jim

    2016-01-01

    This article responds to Johan Dahlbeck's "Towards a pure ontology: Children's bodies and morality" ["Educational Philosophy and Theory," vol. 46 (1), 2014, pp. 8-23 (EJ1026561)]. His arguments from Nietzsche and Spinoza do not carry the weight he supposes, and the conclusions he draws from them about pedagogy would be…

  18. OWL Web Ontology Language

    NARCIS (Netherlands)

    Staab, S.; Studer, R.; Antoniou, Grigoris; Van Harmelen, Frank

    2004-01-01

    The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing

  19. A computational procedure for functional characterization of potential marker genes from molecular data: Alzheimer's as a case study

    Directory of Open Access Journals (Sweden)

    Barla Annalisa

    2011-07-01

    Full Text Available Abstract Background A molecular characterization of Alzheimer's Disease (AD) is key to identifying the altered gene sets that lead to AD progression. We rely on the assumption that candidate marker genes for a given disease belong to specific pathogenic pathways, and we aim to uncover those pathways that are stable across tissues, treatments and measurement systems. In this context, we analyzed three heterogeneous datasets, two microarray gene expression sets and one protein abundance set, applying a recently proposed feature selection method based on regularization. Results For each dataset we identified a signature that was subsequently evaluated from both the computational and the functional characterization viewpoints, by estimating the classification error and retrieving the most relevant biological knowledge from different repositories. Each signature includes genes already known to be related to AD and genes that are likely to be involved in the pathogenesis or in the disease progression. The integrated analysis revealed a meaningful overlap at the functional level. Conclusions The identification of three gene signatures showing a relevant overlap of pathways and ontologies increases the likelihood of finding potential marker genes for AD.

  20. Patient Centric Ontology for Telehealth Domain

    DEFF Research Database (Denmark)

    Jørgensen, Daniel Bjerring; Hallenborg, Kasper; Demazeau, Yves

    2015-01-01

    to the needs, habits, and personality of the patient through user modeling and context awareness. The ontology will be our foundation for user modeling of patients in the telehealth domain, and hence it is one of the initial steps toward our vision. Compared to other ontologies within the domain, ours has...... explicit focus on: 1) personality traits of the patient, which is vital for fulfillment of our vision in terms of adaptability, and 2) use of international standards to describe diseases, functioning and physiological measurement (ICD, ICF and SNOMED respectively) to promote interoperability...

  1. Mining Association Rules among Gene Functions in Clusters of Similar Gene Expression Maps.

    Science.gov (United States)

    An, Li; Obradovic, Zoran; Smith, Desmond; Bodenreider, Olivier; Megalooikonomou, Vasileios

    2009-11-01

    Association rule mining methods have recently been applied to gene expression data analysis to reveal relationships between genes and different conditions and features. However, little effort has focused on detecting the relation between gene expression maps and related gene functions. Here we describe such an approach, which mines association rules among gene functions in clusters of similar gene expression maps of the mouse brain. The experimental results show that the detected association rules make biological sense. By inspecting the obtained clusters and the genes having the gene functions of frequent itemsets, we discovered interesting clues that provide valuable insight to biological scientists. Moreover, the discovered association rules can potentially be used to predict gene functions based on the similarity of gene expression maps.
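    As a rough illustration of mining rules among gene functions, the sketch below treats the set of functions annotated to genes in one expression-map cluster as a single transaction and reports rules X -> Y with their support and confidence. It only considers single-item rules, a simplification of Apriori-style mining. The function name, thresholds and example data are hypothetical; the abstract does not state which mining algorithm or thresholds were used.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pairwise_rules(transactions: List[Set[str]],
                   min_support: float,
                   min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Return (x, y, support, confidence) for single-item rules x -> y.

    Each transaction is the set of gene functions observed in one cluster.
    support    = fraction of transactions containing both x and y
    confidence = transactions containing x and y / transactions containing x
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # test the rule in both directions
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


clusters = [{"synapse", "ion transport"},
            {"synapse", "ion transport", "signaling"},
            {"synapse"},
            {"signaling"}]
print(pairwise_rules(clusters, min_support=0.5, min_confidence=0.8))
# [('ion transport', 'synapse', 0.5, 1.0)]
```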

  2. Epistemology and ontology in core ontologies: FOLaw and LRI-Core, two core ontologies for law

    NARCIS (Netherlands)

    Breukers, J.A.P.J.; Hoekstra, R.J.

    2004-01-01

    After more than a decade of constructing ontologies for legal domains, we at the Leibniz Center for Law felt a real need to develop a core ontology for law that would enable us to reuse the common denominator of the various legal domains. In this paper we present two core ontologies for law. The

  3. Activity theories and the ontology of psychology

    DEFF Research Database (Denmark)

    Mammen, Jens Skaun; Mironenko, Irina

    2015-01-01

    to exceed some, mostly implicit, ontological restrictions in traditional AT and free it from an embracement of functionalism and mechanicism, rooted in Renaissance Physics. The analysis goes back to Aristotle’s understanding of the freely moving animal in its ecology and introduces some dualities...

  4. Benchmarking ontologies: bigger or better?

    Directory of Open Access Journals (Sweden)

    Lixia Yao

    2011-01-01

    Full Text Available A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the gap between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly, as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.

  5. Analysis of the robustness of network-based disease-gene prioritization methods reveals redundancy in the human interactome and functional diversity of disease-genes.

    Directory of Open Access Journals (Sweden)

    Emre Guney

    Full Text Available Complex biological systems usually pose a trade-off between robustness and fragility, where a small number of perturbations can substantially disrupt the system. Although biological systems are robust against changes in many external and internal conditions, even a single mutation can perturb the system substantially, giving rise to a pathophenotype. Recent advances in identifying and analyzing the sequence variations underlying human disorders help to build a systemic view of the mechanisms underlying various disease phenotypes. Network-based disease-gene prioritization methods rank the relevance of genes in a disease under the hypothesis that genes whose proteins interact with each other tend to exhibit similar phenotypes. In this study, we have tested the robustness of several network-based disease-gene prioritization methods with respect to perturbations of the system, using various disease phenotypes from the Online Mendelian Inheritance in Man database. These perturbations were introduced either in the protein-protein interaction network or in the set of known disease-gene associations. As the network-based disease-gene prioritization methods are based on the connectivity between known disease-gene associations, we have further used these methods to categorize the pathophenotypes with respect to the recoverability of hidden disease-genes. Our results suggest that, in general, disease-genes are connected through multiple paths in the human interactome. Moreover, even when these paths are disturbed, network-based prioritization can reveal hidden disease-gene associations in some pathophenotypes, such as breast cancer, cardiomyopathy, diabetes, leukemia, Parkinson disease and obesity, to a greater extent than in the rest of the pathophenotypes tested in this study. Gene Ontology (GO) analysis highlighted the role of functional diversity for such diseases.
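    The hypothesis behind network-based prioritization, that interacting proteins tend to share phenotypes, can be illustrated with random walk with restart, a widely used network-propagation technique. This is a generic sketch, not necessarily one of the methods tested in the study. The restart probability of 0.3 and the toy network are arbitrary assumptions, and the adjacency map must list every node as a key, with symmetric edges.

```python
from typing import Dict, List, Set


def random_walk_with_restart(adj: Dict[str, List[str]],
                             seeds: Set[str],
                             restart: float = 0.3,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probability of each node for walks restarting at seeds.

    adj:   undirected protein-protein interaction network, node -> neighbours
    seeds: known disease genes (must be keys of adj)
    """
    nodes = list(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                # Spread the non-restart probability mass evenly over neighbours.
                share = (1.0 - restart) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(ppi, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

    Candidate genes are then ranked by their score; genes reachable from the seeds through many short paths score highest, which is why redundant paths in the interactome make such rankings robust to removing individual edges.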

  6. Ontology-based Information Retrieval

    DEFF Research Database (Denmark)

    Styltsvig, Henrik Bulskov

    In this thesis, we present methods for introducing ontologies in information retrieval. The main hypothesis is that the inclusion of conceptual knowledge such as ontologies in the information retrieval process can contribute to the solution of major problems currently found in information...... retrieval. This utilization of ontologies poses a number of challenges. Our focus is on the use of similarity measures derived from the knowledge about relations between concepts in ontologies, the recognition of semantic information in texts and the mapping of this knowledge into the ontologies in use......, as well as how to fuse together the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To achieve the recognition of semantic knowledge in a text, shallow natural language processing is used during indexing that reveals knowledge to the level of noun...
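    As one concrete example of a similarity measure derived from relations between concepts, the sketch below computes Wu-Palmer similarity over a tree-shaped taxonomy given as a child-to-parent map. This well-known measure is chosen for illustration and is not necessarily one the thesis uses. The toy taxonomy is hypothetical, and ontologies with multiple inheritance would need a DAG-aware lowest common subsumer.

```python
from typing import Dict, List


def path_to_root(parent: Dict[str, str], concept: str) -> List[str]:
    """Concepts from `concept` up to its root, inclusive (one parent per concept)."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def wu_palmer(parent: Dict[str, str], a: str, b: str) -> float:
    """2 * depth(lcs) / (depth(a) + depth(b)), where a root has depth 1."""
    path_a, path_b = path_to_root(parent, a), path_to_root(parent, b)
    on_b = set(path_b)
    # The first ancestor of a that is also an ancestor of b is the lowest common subsumer.
    lcs = next((c for c in path_a if c in on_b), None)
    if lcs is None:
        return 0.0  # concepts sit under different roots
    depth_lcs = len(path_a) - path_a.index(lcs)
    return 2 * depth_lcs / (len(path_a) + len(path_b))


taxonomy = {"dog": "canine", "wolf": "canine", "canine": "mammal",
            "cat": "feline", "feline": "mammal", "mammal": "animal"}
print(wu_palmer(taxonomy, "dog", "wolf"))  # 0.75
print(wu_palmer(taxonomy, "dog", "cat"))   # 0.5
```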

  7. Development of an Ontology for Periodontitis.

    Science.gov (United States)

    Suzuki, Asami; Takai-Igarashi, Takako; Nakaya, Jun; Tanaka, Hiroshi

    2015-01-01

    Among clinical dentists and periodontal researchers, there is an obvious demand for a systems model capable of linking the clinical presentation of periodontitis to underlying molecular knowledge. A computer-readable representation of the processes of disease development would give periodontal researchers opportunities to elucidate the pathways and mechanisms of periodontitis. An ontology for periodontitis can serve as a model for integrating the large variety of factors relating to a complex disease such as chronic inflammation in different organs accompanied by bone remodeling and immune system disorders, a field that has recently been referred to as osteoimmunology. Terms characteristic of descriptions of the onset and progression of periodontitis were manually extracted from 194 review articles and PubMed abstracts by experts in periodontology. We specified all the relations between the extracted terms and organized them into an ontology for periodontitis. We also investigated the matching between classes of our ontology and those of Gene Ontology Biological Process. We developed an ontology for periodontitis called Periodontitis-Ontology (PeriO). The pathological progression of periodontitis is caused by complex, multi-factor interrelationships. PeriO consists of all the concepts required to represent the pathological progression and clinical treatment of periodontitis. The pathological processes were formalized with reference to Basic Formal Ontology and Relation Ontology, which accounts for participants in the processes realized by biological objects such as molecules and cells. We investigated the distinctive features of the biological processes observed in the pathological progression and medical treatment of the disease in comparison with Gene Ontology Biological Process (GO-BP) annotations. The results indicated that the distinctive features of PeriO lay in 1) the granularity and context dependency of both conceptualizations, and 2) the causality intrinsic to the pathological processes.

  8. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
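
    Weighted Gene Coexpression Network Analysis builds an unsigned network by raising absolute Pearson correlations between gene profiles to a soft-threshold power beta. The NumPy sketch below shows that adjacency step; it then replaces WGCNA's topological-overlap matrix and dynamic tree cut with plain connected components above a cutoff, a deliberate simplification, so its module labels are illustrative only. beta = 6, the cutoff of 0.5 and the toy expression data are assumed values, not the parameters used for the rice networks.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """expr is genes x samples; returns a_ij = |cor(x_i, x_j)| ** beta with a zero diagonal."""
    adj = np.abs(np.corrcoef(expr)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def modules_by_threshold(adj: np.ndarray, cutoff: float) -> list:
    """Label genes by connected components of the graph adj > cutoff
    (a simplified stand-in for TOM-based hierarchical clustering)."""
    n = adj.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] > cutoff):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: three genes follow one profile, three follow an orthogonal one.
t = np.linspace(0, 2 * np.pi, 20, endpoint=False)
rng = np.random.default_rng(0)
expr = np.vstack(
    [np.sin(t) + 0.05 * rng.normal(size=t.size) for _ in range(3)]
    + [np.cos(t) + 0.05 * rng.normal(size=t.size) for _ in range(3)]
)
labels = modules_by_threshold(soft_threshold_adjacency(expr), cutoff=0.5)
```

    Raising correlations to a power, rather than hard-thresholding them, keeps the network weighted while suppressing weak correlations, which is the design idea behind the soft threshold.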

  9. Stability Analysis of Learning Algorithms for Ontology Similarity Computation

    Directory of Open Access Journals (Sweden)

    Wei Gao

    2013-01-01

    Full Text Available Ontology, as a useful tool, is widely applied in many areas such as social science, computer science, and medical science. Ontology concept similarity calculation is the key part of the algorithms in these applications. A recent approach is to make use of similarity between vertices on ontology graphs. Instead of relying on pairwise computations, it is based on a function that maps the vertex set of an ontology graph to real numbers. Learning such a function is a ranking problem, and k-partite ranking algorithms in particular are suitable for solving some ontology problems. A ranking function maps the vertices of an ontology graph to scores and ranks the vertices by those scores. Such a function can be learned from a training sample consisting of a subset of the vertices of the ontology graph. A good ranking function makes few ranking mistakes and is stable. For ranking algorithms that are sufficiently stable, we study generalization bounds using notions of algorithmic stability. We also find that kernel-based ranking algorithms formulated as regularization schemes in reproducing kernel Hilbert spaces satisfy the stability conditions and generalize well.
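
    The stability bounds in this abstract concern how far a learned ranking function's empirical ranking error can drift from its expected error. As a concrete anchor, the sketch below computes one common form of the empirical k-partite ranking error for a scoring function over ontology vertices. The convention that level 0 is the most relevant, the half penalty for ties, and the toy vertices and scores are assumptions, not definitions taken from the paper.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence


def kpartite_ranking_error(
    score: Callable[[Hashable], float],
    levels: Sequence[Sequence[Hashable]],
) -> float:
    """Fraction of cross-level vertex pairs that `score` orders wrongly.

    levels[0] holds the most relevant vertices and levels[-1] the least, so for
    u in levels[i] and v in levels[j] with i < j we want score(u) > score(v).
    A reversed pair costs 1 and a tie costs 1/2.
    """
    mistakes = 0.0
    pairs = 0
    for i, j in combinations(range(len(levels)), 2):
        for u in levels[i]:
            for v in levels[j]:
                pairs += 1
                su, sv = score(u), score(v)
                if su < sv:
                    mistakes += 1.0
                elif su == sv:
                    mistakes += 0.5
    return mistakes / pairs if pairs else 0.0


# Hypothetical vertices of an ontology graph, grouped into three relevance levels.
levels = [["v1", "v2"], ["v3"], ["v4", "v5"]]
scores = {"v1": 0.9, "v2": 0.8, "v3": 0.5, "v4": 0.4, "v5": 0.6}
print(kpartite_ranking_error(scores.get, levels))  # 0.125 (only v3 vs v5 is reversed)
```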

  10. Completeness, supervenience and ontology

    International Nuclear Information System (INIS)

    Maudlin, Tim W E

    2007-01-01

    In 1935, Einstein, Podolsky and Rosen raised the issue of the completeness of the quantum description of a physical system. What they had in mind was whether or not the quantum description is informationally complete, in that all physical features of a system can be recovered from it. In a collapse theory such as the theory of Ghirardi, Rimini and Weber, the quantum wavefunction is informationally complete, and this has often been taken to suggest that according to that theory the wavefunction is all there is. If we distinguish the ontological completeness of a description from its informational completeness, we can see that the best interpretations of the GRW theory must postulate more physical ontology than just the wavefunction.

  11. Insider Threat Indicator Ontology

    Science.gov (United States)

    2016-05-25

    Figure 12 shows a key with the symbols used to visualize the ontology. Figure 12: Diagram Key CMU/SEI-2016-TR-007 | SOFTWARE ENGINEERING...security guard is defined as an employee who guards, patrols, or monitors premises to prevent theft, violence, or infractions of rules...malicious actions of an insider. Event Organization isVictimOrganizationOf hasWife hasSpouse This relates a married woman to her spouse. Person Person

  12. Owlready: Ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies.

    Science.gov (United States)

    Lamy, Jean-Baptiste

    2017-07-01

    Ontologies are widely used in the biomedical domain. While many tools exist for the editing, alignment or evaluation of ontologies, few solutions have been proposed for ontology programming interfaces, i.e. for accessing and modifying an ontology within a programming language. Existing query languages (such as SPARQL) and APIs (such as OWLAPI) are not as easy to use as object programming languages are. Moreover, they provide few solutions to difficulties encountered with biomedical ontologies. Our objective was to design a tool for easily accessing the entities of an OWL ontology, with high-level constructs helping with biomedical ontologies. From our experience with medical ontologies, we identified two difficulties: (1) many entities are represented by classes (rather than individuals), but the existing tools do not permit manipulating classes as easily as individuals, and (2) ontologies rely on the open-world assumption, whereas medical reasoning must consider only evidence-based medical knowledge as true. We designed a Python module for ontology-oriented programming. It allows access to the entities of an OWL ontology as if they were objects in the programming language. We propose a simple high-level syntax for managing classes and the associated "role-filler" constraints. We also propose an algorithm for performing local closed world reasoning in simple situations. We developed Owlready, a Python module for high-level access to OWL ontologies. The paper describes the architecture and the syntax of version 2 of the module. It details how we integrated the OWL ontology model with the Python object model. The paper provides examples based on Gene Ontology (GO). We also demonstrate the usefulness of Owlready in a use case focused on the automatic comparison of the contraindications of several drugs. This use case illustrates the use of the specific syntax proposed for manipulating classes and for performing local closed world reasoning. Owlready has been successfully

  13. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies.

    Science.gov (United States)

    Walls, Ramona L; Deck, John; Guralnick, Robert; Baskauf, Steve; Beaman, Reed; Blum, Stanley; Bowers, Shawn; Buttigieg, Pier Luigi; Davies, Neil; Endresen, Dag; Gandolfo, Maria Alejandra; Hanner, Robert; Janning, Alyssa; Krishtalka, Leonard; Matsunaga, Andréa; Midford, Peter; Morrison, Norman; Ó Tuama, Éamonn; Schildhauer, Mark; Smith, Barry; Stucky, Brian J; Thomer, Andrea; Wieczorek, John; Whitacre, Jamie; Wooley, John

    2014-01-01

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.

  14. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies.

    Directory of Open Access Journals (Sweden)

    Ramona L Walls

    Full Text Available The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and

  15. Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

    Science.gov (United States)

    Baskauf, Steve; Blum, Stanley; Bowers, Shawn; Davies, Neil; Endresen, Dag; Gandolfo, Maria Alejandra; Hanner, Robert; Janning, Alyssa; Krishtalka, Leonard; Matsunaga, Andréa; Midford, Peter; Tuama, Éamonn Ó.; Schildhauer, Mark; Smith, Barry; Stucky, Brian J.; Thomer, Andrea; Wieczorek, John; Whitacre, Jamie; Wooley, John

    2014-01-01

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers

  16. Identification of vernalization responsive genes in the winter wheat ...

    Indian Academy of Sciences (India)

    The Jing841-specific DEGs were screened and subjected to functional annotation using the Gene Ontology (GO) database. Vernalization responsive genes among the specific genes were selected for validation by quantitative reverse transcription polymerase chain reaction (qRT-PCR) and the expression change over the time ...

  17. Analysis of multiplex gene expression maps obtained by voxelation

    Directory of Open Access Journals (Sweden)

    Smith Desmond J

    2009-04-01

    Full Text Available Abstract Background Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays to acquire genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has studied gene functions without taking into account where in the mouse brain a gene is expressed. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results To analyze the dataset, we chose typical genes as queries and aimed at discovering groups of similar genes. Gene similarity was determined using wavelet features extracted from the left- and right-hemisphere-averaged gene expression maps, and the Euclidean distance between each pair of feature vectors. We also applied a multiple clustering approach to the gene expression maps, combined with hierarchical clustering. Within each group of similar genes and each cluster, gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find genes similar to certain target genes, we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters whose members have both very similar gene expression maps and very similar gene functions with respect to the corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in
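
    The study compares genes by the Euclidean distance between wavelet features of hemisphere-averaged expression maps. A minimal NumPy sketch of that pipeline follows. The single-level orthonormal Haar wavelet, the assumption that the two hemispheres are the left and right halves of each map's columns (with the right half mirrored before averaging), and all gene names and map sizes are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np


def haar_level1(img: np.ndarray) -> np.ndarray:
    """One level of the orthonormal 2-D Haar transform (approximation + 3 detail bands), flattened."""
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("map dimensions must be even")
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    bands = ((a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2)
    return np.concatenate([band.ravel() for band in bands])


def hemisphere_average(expr_map: np.ndarray) -> np.ndarray:
    """Average the left half with the mirrored right half (assumes the midline splits the columns evenly)."""
    half = expr_map.shape[1] // 2
    if expr_map.shape[1] != 2 * half:
        raise ValueError("map width must be even")
    return (expr_map[:, :half] + expr_map[:, half:][:, ::-1]) / 2


def features(expr_map: np.ndarray) -> np.ndarray:
    return haar_level1(hemisphere_average(expr_map))


def most_similar(query: str, maps: dict) -> list:
    """Other genes sorted by Euclidean distance between wavelet feature vectors."""
    q = features(maps[query])
    dist = {g: float(np.linalg.norm(features(m) - q)) for g, m in maps.items() if g != query}
    return sorted(dist, key=dist.get)


rng = np.random.default_rng(1)
base = rng.random((4, 8))  # hypothetical 4 x 8 voxel map
maps = {
    "geneA": base,
    "geneB": base + 0.01 * rng.random((4, 8)),
    "geneC": rng.random((4, 8)),
}
print(most_similar("geneA", maps))
```

    Because the Haar transform used here is orthonormal, Euclidean distances between full feature vectors equal distances between the averaged maps; the wavelet step only pays off once some coefficients are truncated or weighted, which this sketch leaves out.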

  18. Comparative mapping reveals similar linkage of functional genes to ...

    Indian Academy of Sciences (India)

    ...logous genes and QTL of yield-related traits by in silico mapping and population mapping in O. sativa. Our results revealed that B. napus and O. sativa shared homologous sequences of genes with similar functions, as well as consistent linkage relationships between genes and agronomic traits.

  19. Functional models for large-scale gene regulation networks: realism and fiction.

    Science.gov (United States)

    Lagomarsino, Marco Cosentino; Bassetti, Bruno; Castellani, Gastone; Remondini, Daniel

    2009-04-01

    High-throughput experiments are shedding light on the topology of large regulatory networks and at the same time on their functional states, namely the states of activation of the nodes (for example transcript or protein levels) in different conditions, times and environments. We now possess a certain amount of information about these two levels of description, stored in libraries, databases and ontologies. A current challenge is to bridge the gap between topology and function, i.e. to develop quantitative models aimed at characterizing the expression patterns of large sets of genes. However, approaches that work well for small networks become impossible to master at large scales, mainly because parameters proliferate. In this review we discuss the state of the art of large-scale functional network models, addressing the issue of what can be considered "realistic" and what the main limitations may be. We also outline some directions for future work, trying to set the goals that future models should try to achieve. Finally, we emphasize the possible benefits for understanding the biological mechanisms underlying complex multifactorial diseases, and for developing novel strategies for the description and treatment of such pathologies.

  20. A unified anatomy ontology of the vertebrate skeletal system.

    Directory of Open Access Journals (Sweden)

    Wasila M Dahdul

    Full Text Available The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  1. Standardized description of scientific evidence using the Evidence Ontology (ECO).

    Science.gov (United States)

    Chibucos, Marcus C; Mungall, Christopher J; Balakrishnan, Rama; Christie, Karen R; Huntley, Rachael P; White, Owen; Blake, Judith A; Lewis, Suzanna E; Giglio, Michelle

    2014-01-01

    The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org. © The Author(s) 2014. Published by Oxford University Press.

  2. A unified anatomy ontology of the vertebrate skeletal system.

    Science.gov (United States)

    Dahdul, Wasila M; Balhoff, James P; Blackburn, David C; Diehl, Alexander D; Haendel, Melissa A; Hall, Brian K; Lapp, Hilmar; Lundberg, John G; Mungall, Christopher J; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E; Vickaryous, Matthew K; Westerfield, Monte; Mabee, Paula M

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  3. An Ontology for Software Engineering Education

    Science.gov (United States)

    Ling, Thong Chee; Jusoh, Yusmadi Yah; Adbullah, Rusli; Alwi, Nor Hayati

    2013-01-01

    Software agents communicate using ontologies. It is important to build an ontology for a specific domain such as Software Engineering Education. Building an ontology from scratch is not only hard but also incurs much time and cost. This study aims to propose an ontology through adaptation of an existing ontology that was originally built based on a…

  4. Nonviral gene transfection nanoparticles: function and applications in the brain.

    Science.gov (United States)

    Roy, Indrajit; Stachowiak, Michal K; Bergey, Earl J

    2008-06-01

    In vivo transfer and expression of foreign genes allows for the elucidation of functions of genes in living organisms and generation of disease models in animals that more closely resemble the etiology of human diseases. Gene therapy holds promise for the cure of a number of diseases at the fundamental level. Synthetic "nonviral" materials are fast gaining popularity as safe and efficient vectors for delivering genes to target organs. Not only can nanoparticles function as efficient gene carriers, they also can simultaneously carry diagnostic probes for direct "real-time" visualization of gene transfer and downstream processes. This review focuses on the central nervous system (CNS) as the target for nonviral gene transfer, with special emphasis on organically modified silica (ORMOSIL) nanoparticles developed in our laboratory. These nanoparticles have shown robust gene transfer efficiency in brain cells in vivo and have made it possible to investigate mechanisms that control neurogenesis as well as neurodegenerative disorders.

  5. Functional analysis of the molecular interactions of TATA box-containing genes and essential genes.

    Science.gov (United States)

    Bae, Sang-Hun; Han, Hyun Wook; Moon, Jisook

    2015-01-01

    Genes can be divided into TATA-containing genes and TATA-less genes according to the presence of TATA box elements at promoter regions. TATA-containing genes tend to be stress-responsive, whereas many TATA-less genes are known to be related to cell growth or "housekeeping" functions. In a previous study, we demonstrated that there are striking differences among four gene sets defined by the presence of a TATA box (TATA-containing) and essentiality (TATA-less) with respect to the number of associated transcription factors, amino acid usage, and functional annotation. Extending this research in yeast, we identified KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways that are statistically enriched in TATA-containing or TATA-less genes and evaluated the possibility that the enriched pathways are related to stress or growth as reflected by the individual functions of the genes involved. According to their enrichment for either of these two gene sets, we sorted KEGG pathways into TATA-containing-gene-enriched pathways (TEPs) and essential-gene-enriched pathways (EEPs). As expected, genes in TEPs and EEPs showed opposite trends in terms of functional category, transcriptional regulation, codon adaptation index, and network properties, suggesting that the bipolar patterns in these pathways may also contribute to the regulation of the stress response and to cell survival. Our findings provide the novel insight that significant enrichment of TATA-containing or TATA-less genes defines pathways as stress-responsive or growth-related.
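
    The abstract reports pathways that are "statistically enriched" in each gene set without naming the test. A standard choice, assumed for this sketch, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Running it once for the TATA-containing set and once for the essential set would give two p-values per pathway that could then be compared when sorting pathways into TEPs and EEPs. The counts in the example are invented.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the set of interest (e.g. TATA-containing genes),
    k: genes in both the set of interest and the pathway.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 6000 background genes, a 40-gene pathway,
# 800 genes in the set of interest, 15 of them in the pathway.
p = enrichment_p_value(N=6000, K=40, n=800, k=15)
```

    Exact integer arithmetic via math.comb avoids the underflow that naive floating-point factorials would hit for genome-sized backgrounds.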

  6. Nursing theories as nursing ontologies.

    Science.gov (United States)

    Flaming, Don

    2004-10-01

    By understanding the constructions of knowledge that we currently label nursing theories as nursing ontologies, nurses can perceive these conceptualizations differently. Paul Ricoeur and Stephen White offer a conceptualization of ontology that differs from traditional, realist perspectives because they assume that a person's experience of a phenomenon (e.g., nursing) will change but also maintain some stability. Discussing nursing ontologies, rather than nursing theories, might increase philosophy's status in nursing and may also more accurately reflect the experience of being a nurse.

  7. Semantic Similarity in Biomedical Ontologies

    OpenAIRE

    Pesquita, Catia; Faria, Daniel; Falcão, André O.; Lord, Phillip; Couto, Francisco M.

    2009-01-01

    In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies hav...
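
    Many semantic similarity measures for biomedical ontologies rest on information content (IC). A classic node-based example is Resnik's measure: the IC of the most informative common ancestor of two terms. The sketch below assumes IC is estimated from how many annotated entities reach each term once annotations are propagated to ancestors; the GO-like term names and gene annotations are invented for illustration.

```python
import math
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], term: str) -> Set[str]:
    """The term plus all of its ancestors in the ontology DAG."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(parents: Dict[str, List[str]], annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(t) = -log p(t), where p(t) is the fraction of entities annotated to t or to a descendant of t."""
    counts: Dict[str, int] = {}
    for terms in annotations.values():
        for t in set().union(*(ancestors(parents, term) for term in terms)):
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}


def resnik(parents: Dict[str, List[str]], ic: Dict[str, float], t1: str, t2: str) -> float:
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(parents, t1) & ancestors(parents, t2)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


# Hypothetical GO-like fragment and gene annotations.
parents = {
    "binding": ["molecular_function"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
    "catalytic_activity": ["molecular_function"],
}
annotations = {
    "g1": {"protein_binding"},
    "g2": {"dna_binding"},
    "g3": {"catalytic_activity"},
    "g4": {"protein_binding"},
}
ic = information_content(parents, annotations)
```

    Two terms whose only shared ancestor is the root score 0, because every annotated entity reaches the root.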

  8. MicroRNA and gene signature of severe cutaneous drug ...

    African Journals Online (AJOL)

    Functional annotation by the DAVID web tool. DAVID stands for Database for Annotation, Visualization and Integrated Discovery. The DAVID tool was used to annotate/confirm the functions of the gene list related to Granulysin. Selection of Gene Ontology (GO) standings with an adjusted p-value less than 0.05. Network and Pathway ...
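
    The snippet selects GO terms with an adjusted p-value below 0.05 but does not say which adjustment was applied. The Benjamini-Hochberg false discovery rate adjustment, one of the corrections DAVID can report, is assumed in the sketch below; the term names and raw p-values are hypothetical.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(term_pvals: Dict[str, float], alpha: float = 0.05) -> Dict[str, float]:
    """Keep terms whose BH-adjusted p-value is below alpha."""
    terms = list(term_pvals)
    adjusted = benjamini_hochberg([term_pvals[t] for t in terms])
    return {t: a for t, a in zip(terms, adjusted) if a < alpha}


raw = {"term_a": 0.01, "term_b": 0.04, "term_c": 0.03, "term_d": 0.2}
print(significant_terms(raw))  # {'term_a': 0.04}
```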

  9. Inferring gene expression dynamics via functional regression analysis

    Directory of Open Access Journals (Sweden)

    Leng Xiaoyan

    2008-01-01

    Full Text Available Abstract Background Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene expression associated with different developmental stages to each other to study patterns of long-term developmental gene regulation. We use tools from functional data analysis to study dynamic changes by relating temporal gene expression profiles of different developmental stages to each other. Results We demonstrate that functional regression methodology can pinpoint relationships that exist between temporal gene expression profiles for different life cycle phases and incorporates dimension reduction as needed for these high-dimensional data. By applying these tools, gene expression profiles for pupa and adult phases are found to be strongly related to the profiles of the same genes obtained during the embryo phase. Moreover, one can distinguish between gene groups with positive and those with negative associations between later-life and embryonal expression profiles. Specifically, we find a positive relationship in expression for muscle development related genes, and a negative relationship for strictly maternal genes for Drosophila, using temporal gene expression profiles. Conclusion Our findings point to specific reactivation patterns of gene expression during the Drosophila life cycle which differ in characteristic ways between various gene groups. Functional regression emerges as a useful tool for relating gene expression patterns from different developmental stages, and avoids the problems with large numbers of parameters and multiple testing that affect alternative approaches.
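
    For reference, a standard function-on-function linear regression model from functional data analysis is written out below, with X(s) standing for a gene's embryo-phase expression profile and Y(t) for its later-phase (pupa or adult) profile. This is a generic form written under that assumption; the abstract does not spell out the authors' exact specification. The truncation levels K and M are where dimension reduction enters.

```latex
\begin{aligned}
E\{Y(t) \mid X\} &= \mu_Y(t) + \int_{\mathcal{S}} \beta(s,t)\,\{X(s) - \mu_X(s)\}\,ds, \\
\xi_k &= \int_{\mathcal{S}} \{X(s) - \mu_X(s)\}\,\phi_k(s)\,ds, \qquad
\zeta_m = \int_{\mathcal{T}} \{Y(t) - \mu_Y(t)\}\,\psi_m(t)\,dt, \\
\beta(s,t) &\approx \sum_{k=1}^{K} \sum_{m=1}^{M} \frac{E[\xi_k \zeta_m]}{E[\xi_k^2]}\,\phi_k(s)\,\psi_m(t).
\end{aligned}
```

    Here \phi_k and \psi_m are the functional principal components of X and Y, and \xi_k, \zeta_m are the corresponding scores. Loosely, a positive \beta(s,t) means later-phase expression at time t tends to rise with embryo-phase expression at time s, and a negative \beta(s,t) means it tends to fall.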

  10. NegGOA: negative GO annotations selection using ontology structure.

    Science.gov (United States)

    Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

    2016-10-01

    Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples, i.e., proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always provides only the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguish its true positive examples within such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and the potential for additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates higher accuracy than the compared algorithms across various evaluation metrics. In addition, NegGOA is less affected by incomplete annotations of proteins than the compared methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa. Contact: gxyu@swu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
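
    The abstract names the ingredients (ontology structure, existing annotations, potential future annotations) but not the algorithm, so the sketch below is a hypothetical simplification, not NegGOA itself. It excludes ancestors of a protein's terms (implied positives under GO's true-path rule) and their descendants (possible undiscovered refinements), then prefers candidates that rarely co-occur with the protein's terms in other proteins. Term and protein names are placeholders.

```python
from collections import defaultdict


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """`term` plus everything above it (implied positives under the true-path rule)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """Everything strictly below `term`."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def select_negatives(protein, annotations, parents, k):
    """Hypothetical negative-term selector: return the k lowest-risk candidate terms."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    all_terms = set(parents) | {p for ps in parents.values() for p in ps}
    own = annotations[protein]
    excluded = set()
    for term in own:
        excluded |= ancestors(term, parents) | descendants(term, children)
    candidates = all_terms - excluded

    def risk(term):
        # Co-occurrence with this protein's terms elsewhere hints at a missing positive.
        return sum(len(terms & own) for other, terms in annotations.items()
                   if other != protein and term in terms)

    return sorted(candidates, key=lambda t: (risk(t), t))[:k]


parents = {"b": ["root"], "c": ["root"], "d": ["b"], "e": ["c"], "f": ["c"]}
annotations = {"P1": {"b"}, "P2": {"b", "e"}, "P3": {"f"}}
print(select_negatives("P1", annotations, parents, 2))  # ['c', 'f']
```

    Term "e" is kept out of the top two because another protein carries it together with "b", which this heuristic treats as a sign that P1 might simply lack that annotation so far.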

  11. FOXA2 regulates a network of genes involved in critical functions of human intestinal epithelial cells.

    Science.gov (United States)

    Gosalia, Nehal; Yang, Rui; Kerschner, Jenny L; Harris, Ann

    2015-07-01

    The forkhead box A (FOXA) family of pioneer transcription factors is critical for the development of many endoderm-derived tissues. Their importance in regulating biological processes in the lung and liver is extensively characterized, though much less is known about their role in the intestine. Here we investigate the contribution of FOXA2 to coordinating intestinal epithelial cell function using postconfluent Caco2 cells, differentiated into an enterocyte-like model. FOXA2 binding sites genome-wide were determined by ChIP-seq and direct targets of the factor were validated by ChIP-qPCR and siRNA-mediated depletion of FOXA1/2 followed by RT-qPCR. Peaks of FOXA2 occupancy were frequent at loci contributing to gene ontology pathways of regulation of cell migration, cell motion, and plasma membrane function. Depletion of both FOXA1 and FOXA2 led to a significant reduction in the expression of multiple transmembrane proteins including ion channels and transporters, which form a network that is essential for maintaining normal ion and solute transport. One of the targets was the adenosine A2B receptor, and reduced receptor mRNA levels were associated with a functional decrease in intracellular cyclic AMP. We also observed that 30% of FOXA2 binding sites contained a GATA motif and that FOXA1/A2 depletion reduced GATA-4 but not GATA-6 protein levels. These data show that FOXA2 plays a pivotal role in regulating intestinal epithelial cell function. They also suggest that the FOXA and GATA families of transcription factors may work cooperatively to regulate gene expression genome-wide in the intestinal epithelium. Copyright © 2015 the American Physiological Society.
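
    The abstract reports that 30% of FOXA2 binding sites contained a GATA motif but does not describe how motifs were called. As a rough illustration only, assume the GATA consensus WGATAR and scan each peak sequence on both strands with a regular expression; a real analysis would more likely use position weight matrices with a background model. The peak sequences below are invented.

```python
import re

# WGATAR (W = A/T, R = A/G) and its reverse complement YTATCW (Y = C/T).
GATA_FORWARD = re.compile(r"[AT]GATA[AG]")
GATA_REVERSE = re.compile(r"[CT]TATC[AT]")


def has_gata_motif(sequence: str) -> bool:
    """True if the consensus occurs on either strand of the sequence."""
    seq = sequence.upper()
    return bool(GATA_FORWARD.search(seq) or GATA_REVERSE.search(seq))


def fraction_with_gata(peaks: list[str]) -> float:
    """Share of peak sequences carrying at least one consensus match."""
    if not peaks:
        return 0.0
    return sum(has_gata_motif(p) for p in peaks) / len(peaks)


peaks = ["CCAGATAAGG", "TTTTCCCC", "GGTTATCTGG", "ACGTACGT"]  # hypothetical peaks
print(fraction_with_gata(peaks))  # 0.5
```

    Scanning the reverse-complement pattern avoids having to reverse-complement every peak, since a motif on the minus strand appears as YTATCW when read on the plus strand.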

  12. Functional genomics complements quantitative genetics in identifying disease-gene associations.

    Directory of Open Access Journals (Sweden)

    Yuanfang Guan

    2010-11-01

    Full Text Available An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse and a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm through predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects for both Timp2- and Abcg8-deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be utilized as a complementary approach to quantitative genetics to predict disease risks. All supplementary
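
    The authors pair a mouse functional relationship network with a machine learning method. A far simpler stand-in, shown only to illustrate how a network can rank candidate genes for a phenotype, is weighted guilt-by-association: score each unlabeled gene by the share of its edge weight that connects to genes already associated with the phenotype. Gene names and edge weights are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Undirected weighted network as gene -> {neighbor: weight}."""
    network = defaultdict(dict)
    for a, b, w in edges:
        network[a][b] = w
        network[b][a] = w
    return network


def guilt_by_association(network, known_genes):
    """Rank non-seed genes by the fraction of their edge weight that touches seed genes."""
    scores = {}
    for gene, neighbors in network.items():
        if gene in known_genes:
            continue
        total = sum(neighbors.values())
        linked = sum(w for n, w in neighbors.items() if n in known_genes)
        scores[gene] = linked / total if total else 0.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


edges = [("K1", "X", 0.9), ("K2", "X", 0.8), ("X", "Y", 0.1), ("Y", "Z", 0.7), ("Y", "K1", 0.2)]
ranking = guilt_by_association(build_network(edges), known_genes={"K1", "K2"})
print([gene for gene, _ in ranking])  # ['X', 'Y', 'Z']
```

    A supervised learner such as an SVM would instead use each gene's row of network weights as a feature vector and learn which connections matter, rather than weighting all seed links equally.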

  13. Ontology authoring with Forza

    CSIR Research Space (South Africa)

    Keet, CM

    2014-11-01

    Full Text Available step to make the construction of ontologies more agile and apt to the needs of organisations and business enterprises. ... CIKM’13, October 27 - November 01 2013, San Francisco, CA, USA...

  14. Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

    Science.gov (United States)

    Armean, Irina M; Lilley, Kathryn S; Trotter, Matthew W B; Pilkington, Nicholas C V; Holden, Sean B

    2018-01-30

    Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. PPI annotations are built combinatorially using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance, with an area under the ROC curve of up to 0.97, outperforming go2ppi, a previously established prediction tool for protein-protein interactions (PPI) based on Gene Ontology (GO) annotations. The code is available at https://github.com/ima23/maxent-ppi. Contact: sbh11@cl.cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
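
    A binary Maximum Entropy classifier is equivalent to logistic regression. The sketch below assumes a hypothetical feature design loosely following the abstract: every unordered pair formed from one annotation of each protein becomes a binary feature, and the model is fitted by stochastic gradient ascent with L2 shrinkage. The annotation tokens, training pairs and hyperparameters are invented for illustration and are not taken from the paper.

```python
import math
from itertools import product


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_features(ann_a: set[str], ann_b: set[str]) -> set[str]:
    """Order-independent features: one per (annotation of A, annotation of B) pair."""
    return {"|".join(sorted(pair)) for pair in product(ann_a, ann_b)}


def train_maxent(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit binary logistic regression (MaxEnt) on sparse binary features by SGD."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            error = label - sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            bias += lr * error
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w + lr * (error - l2 * w)  # gradient step with L2 shrinkage
    return weights, bias


def predict(weights, bias, feats) -> float:
    """Estimated probability that the protein pair is co-complexed."""
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


# Hypothetical GO/InterPro-style tokens per protein.
proteins = {
    "p1": {"GO_ribosome", "IPR_rrm"},
    "p2": {"GO_ribosome", "IPR_kh"},
    "p3": {"GO_ribosome", "IPR_rrm"},
    "p4": {"GO_membrane", "IPR_abc"},
    "p5": {"GO_membrane", "IPR_7tm"},
}
training = [(("p1", "p2"), 1), (("p2", "p3"), 1), (("p1", "p4"), 0), (("p3", "p5"), 0), (("p2", "p5"), 0)]
examples = [(pair_features(proteins[a], proteins[b]), y) for (a, b), y in training]
weights, bias = train_maxent(examples)
print(predict(weights, bias, pair_features(proteins["p1"], proteins["p3"])))
```

    Sorting each pair inside the feature name makes the representation symmetric, so scoring (p1, p3) and (p3, p1) gives the same result.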

  15. ONSET: Automated foundational ontology selection and explanation

    CSIR Research Space (South Africa)

    Khan, Z

    2012-10-01

    Full Text Available It has been shown that using a foundational ontology for domain ontology development is beneficial in theory and practice. However, developers have difficulty choosing the appropriate foundational ontology and explaining why. In order to solve...

  16. When natural selection gives gene function the cold shoulder.

    Science.gov (United States)

    Cutter, Asher D; Jovelin, Richard

    2015-11-01

    It is tempting to invoke organismal selection as perpetually optimizing the function of any given gene. However, natural selection can drive genic functional change without improvement of biochemical activity, even to the extinction of gene activity. Detrimental mutations can creep in owing to linkage with other selectively favored loci. Selection can promote functional degradation, irrespective of genetic drift, when adaptation occurs by loss of gene function. Even stabilizing selection on a trait can lead to divergence of the underlying molecular constituents. Selfish genetic elements can also proliferate independent of any functional benefits to the host genome. Here we review the logic and evidence for these diverse processes acting in genome evolution. These distinct evolutionary phenomena, while operating through easily understandable mechanisms, all contribute to the seemingly counterintuitive notion that maintenance or improvement of a gene's biochemical function sometimes does not determine its evolutionary fate. © 2015 WILEY Periodicals, Inc.

  17. Towards ontology based search and knowledge sharing using domain ontologies

    DEFF Research Database (Denmark)

    Zambach, Sine

    This paper reports on work in progress. We present work on domain-specific verbs and their role as relations in domain ontologies. The domain ontology that is the focus of our research is modeled in cooperation with the Danish biotech company Novo Nordisk. Two of the main purposes of domain... ontologies for enterprises are as background for search and knowledge sharing, used for e.g. multilingual product development. Our aim is to use linguistic methods and logic to construct consistent ontologies that can be used both for search and for knowledge sharing. This focuses on identifying... verbs for relations in the ontology modeling. For this work we use frequency lists from a biomedical text corpus of different genres as well as a study of the relations used in other biomedical text mining tools. In addition, we discuss how these relations can be used in a broader perspective...

  18. Methods for transient assay of gene function in floral tissues

    Directory of Open Access Journals (Sweden)

    Pathirana Nilangani N

    2007-01-01

    Full Text Available Abstract Background There is considerable interest in rapid assays or screening systems for assigning gene function. However, analysis of gene function in the flowers of some species is restricted due to the difficulty of producing stably transformed transgenic plants. As a result, experimental approaches based on transient gene expression assays are frequently used. Biolistics has long been used for transient over-expression of genes of interest, but has not been exploited for gene silencing studies. Agrobacterium-infiltration has also been used, but the focus primarily has been on the transient transformation of leaf tissue. Results Two constructs, one expressing an inverted repeat of the Antirrhinum majus (Antirrhinum) chalcone synthase gene (CHS) and the other an inverted repeat of the Antirrhinum transcription factor gene Rosea1, were shown to effectively induce CHS and Rosea1 gene silencing, respectively, when introduced biolistically into petal tissue of Antirrhinum flowers developing in vitro. A high-throughput vector expressing the Antirrhinum CHS gene attached to an inverted repeat of the nos terminator was also shown to be effective. Silencing spread systemically to create large zones of petal tissue lacking pigmentation, with transmission of the silenced state spreading both laterally within the affected epidermal cell layer and into lower cell layers, including the epidermis of the other petal surface. Transient Agrobacterium-mediated transformation of petal tissue of tobacco and petunia flowers in situ or detached was also achieved, using expression of the reporter genes GUS and GFP to visualise transgene expression. Conclusion We demonstrate the feasibility of using biolistics-based transient RNAi, and transient transformation of petal tissue via Agrobacterium infiltration to study gene function in petals. We have also produced a vector for high throughput gene silencing studies, incorporating the option of using T-A cloning to

  19. Application of neuroanatomical ontologies for neuroimaging data annotation

    Directory of Open Access Journals (Sweden)

    Jessica A Turner

    2010-06-01

    Full Text Available The annotation of functional neuroimaging results for data sharing and reuse is particularly challenging, due to the diversity of terminologies of neuroanatomical structures and cortical parcellation schemes. To address this challenge, we extended the Foundational Model of Anatomy Ontology (FMA) to include cytoarchitectural, Brodmann area labels, and a morphological cortical labeling scheme (e.g., the part of Brodmann area 6 in the left precentral gyrus). This representation was also used to augment the neuroanatomical axis of RadLex, the ontology for clinical imaging. The resulting neuroanatomical ontology contains explicit relationships indicating which brain regions are “part of” which other regions, across cytoarchitectural and morphological labeling schemas. We annotated a large functional neuroimaging dataset with terms from the ontology and applied a reasoning engine to analyze this dataset in conjunction with the ontology, and achieved successful inferences from the most specific level (e.g., how many subjects showed activation in a sub-part of the middle frontal gyrus?) to more general levels (how many activations were found in areas connected via a known white matter tract?). In summary, we have produced a neuroanatomical ontology that harmonizes several different terminologies of neuroanatomical structures and cortical parcellation schemes. This neuroanatomical ontology is publicly available as a view of FMA at the Bioportal website at http://rest.bioontology.org/bioportal/ontologies/download/10005. The ontological encoding of anatomic knowledge can be exploited by computer reasoning engines to make inferences about neuroanatomical relationships described in imaging datasets using different terminologies. This approach could ultimately enable knowledge discovery from large, distributed fMRI studies or medical record mining.
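
    The "part of" inference described in this record amounts to following part-of links transitively. The sketch assumes a small, hand-written fragment of regions (not the actual FMA or RadLex content) and counts activations that fall in a query region or in any of its sub-parts; the activation list is invented.

```python
def containing_regions(region: str, part_of: dict[str, set[str]]) -> set[str]:
    """All regions that `region` is (transitively) part of."""
    seen: set[str] = set()
    stack = [region]
    while stack:
        current = stack.pop()
        for whole in part_of.get(current, ()):
            if whole not in seen:
                seen.add(whole)
                stack.append(whole)
    return seen


def count_activations(activations: list[str], query: str, part_of: dict[str, set[str]]) -> int:
    """Activations located in `query` itself or in any of its sub-parts."""
    return sum(1 for loc in activations
               if loc == query or query in containing_regions(loc, part_of))


# Hypothetical part-of fragment: region -> regions it is directly part of.
part_of = {
    "BA6 in left precentral gyrus": {"left precentral gyrus"},
    "left precentral gyrus": {"left frontal lobe"},
    "left middle frontal gyrus": {"left frontal lobe"},
    "left frontal lobe": {"left cerebral hemisphere"},
}
activations = ["BA6 in left precentral gyrus", "left middle frontal gyrus",
               "left precentral gyrus", "left occipital lobe"]
print(count_activations(activations, "left frontal lobe", part_of))  # 3
```

    The same query asked at a more specific level ("left precentral gyrus") counts only the two activations labeled with that gyrus or its Brodmann-area sub-part, which mirrors the specific-to-general queries in the abstract.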

  20. Human Intellectual Disability Genes Form Conserved Functional Modules in Drosophila

    Science.gov (United States)

    Oortveld, Merel A. W.; Keerthikumar, Shivakumar; Oti, Martin; Nijhof, Bonnie; Fernandes, Ana Clara; Kochinke, Korinna; Castells-Nobau, Anna; van Engelen, Eva; Ellenkamp, Thijs; Eshuis, Lilian; Galy, Anne; van Bokhoven, Hans; Habermann, Bianca; Brunner, Han G.; Zweier, Christiane; Verstreken, Patrik; Huynen, Martijn A.; Schenck, Annette

    2013-01-01

    Intellectual Disability (ID) disorders, defined by an IQ below 70, are genetically and phenotypically highly heterogeneous. Identification of common molecular pathways underlying these disorders is crucial for understanding the molecular basis of cognition and for the development of therapeutic intervention strategies. To systematically establish their functional connectivity, we used transgenic RNAi to target 270 ID gene orthologs in the Drosophila eye. Assessment of neuronal function in behavioral and electrophysiological assays and multiparametric morphological analysis identified phenotypes associated with knockdown of 180 ID gene orthologs. Most of these genotype-phenotype associations were novel. For example, we uncovered 16 genes that are required for basal neurotransmission and have not previously been implicated in this process in any system or organism. ID gene orthologs with morphological eye phenotypes, in contrast to genes without phenotypes, are relatively highly expressed in the human nervous system and are enriched for neuronal functions, suggesting that eye phenotyping can distinguish different classes of ID genes. Indeed, grouping genes by Drosophila phenotype uncovered 26 connected functional modules. Novel links between ID genes successfully predicted that MYCN, PIGV and UPF3B regulate synapse development. Beyond ID itself, genes grouped by Drosophila phenotype also show significant phenotypic similarity in humans, indicating that functional modules are conserved. The combined data indicate that ID disorders, despite their extreme genetic diversity, are caused by disruption of a limited number of highly connected functional modules. PMID:24204314
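
    The abstract does not detail how genes were grouped by phenotype. One simple, hypothetical way to turn phenotype annotations into modules is to link two genes when the Jaccard similarity of their phenotype sets reaches a threshold and then report the connected components; the gene and phenotype labels below are placeholders, not data from the screen.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def phenotype_modules(phenotypes: dict[str, set[str]], threshold: float) -> list[list[str]]:
    """Connected components (size >= 2) of the phenotype-similarity graph."""
    genes = sorted(phenotypes)
    adjacency = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if jaccard(phenotypes[g], phenotypes[h]) >= threshold:
                adjacency[g].add(h)
                adjacency[h].add(g)
    modules, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        # Depth-first search collects every gene reachable from `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            gene = stack.pop()
            component.append(gene)
            for nbr in adjacency[gene]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        if len(component) > 1:
            modules.append(sorted(component))
    return modules


phenotypes = {
    "g1": {"rough_eye", "small_eye"},
    "g2": {"rough_eye", "small_eye", "photoreceptor_loss"},
    "g3": {"neurotransmission_defect"},
    "g4": {"neurotransmission_defect", "photoreceptor_loss"},
    "g5": set(),
}
print(phenotype_modules(phenotypes, threshold=0.5))  # [['g1', 'g2'], ['g3', 'g4']]
```

    Genes without any phenotype (g5 here) never reach the threshold and are dropped as singletons.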

  1. Ontology through a Mindfulness Process

    Science.gov (United States)

    Bearance, Deborah; Holmes, Kimberley

    2015-01-01

    Traditionally, when ontology is taught in a graduate studies course on social research, there is a tendency for this concept to be examined through the process of lectures and readings. Such an approach often leaves graduate students to grapple with a personal embodiment of this concept and to comprehend how ontology can ground their research.…

  2. Tracking Changes during Ontology Evolution

    NARCIS (Netherlands)

    Noy, Natalya F.; Kunnatur, Sandhya; Klein, Michel; Musen, Mark A.

    2004-01-01

    As ontology development becomes a collaborative process, developers face the problem of maintaining versions of ontologies akin to maintaining versions of software code or versions of documents in large projects. Traditional versioning systems enable users to compare versions, examine changes, and
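
    Comparing two versions of an ontology can be illustrated with a structural diff that matches classes by identifier and reports additions, deletions and classes whose parents changed. This is an assumed, simplified representation (class name mapped to its set of superclasses); real ontology-versioning tools must also cope with renamed classes and richer axioms, which this sketch ignores. The class names are hypothetical.

```python
def diff_versions(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, list[str]]:
    """Classes added, deleted, or moved (changed superclasses) between two versions."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    moved = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "deleted": deleted, "moved": moved}


old = {"Thing": set(), "Vehicle": {"Thing"}, "Car": {"Vehicle"}, "Boat": {"Vehicle"}}
new = {"Thing": set(), "Vehicle": {"Thing"}, "Car": {"Vehicle"},
       "Watercraft": {"Vehicle"}, "Boat": {"Watercraft"}}
print(diff_versions(old, new))  # {'added': ['Watercraft'], 'deleted': [], 'moved': ['Boat']}
```

    Matching by identifier is the weak point: a renamed class would show up as one deletion plus one addition, which is why dedicated tools add heuristics for detecting renames.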

  3. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
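
    The "gene-recommender" idea can be sketched in two steps: merge several interaction networks into one weighted network, then propagate scores outward from the query genes. GeneMANIA itself derives query-specific network weights and uses its own label propagation formulation; this sketch instead assumes fixed weights and a simple iterative propagation with restart, and all genes and edges are hypothetical.

```python
def combine_networks(networks, weights):
    """Weighted sum of undirected networks given as {(gene_a, gene_b): weight}."""
    combined = {}
    for network, alpha in zip(networks, weights):
        for (a, b), value in network.items():
            key = tuple(sorted((a, b)))
            combined[key] = combined.get(key, 0.0) + alpha * value
    return combined


def propagate(combined, query, restart=0.2, iterations=50):
    """Iterative propagation with restart; returns non-query genes ranked by score.

    Assumes all edge weights are positive, so every listed gene has nonzero degree.
    """
    neighbors = {}
    for (a, b), w in combined.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    genes = sorted(neighbors)
    prior = {g: 1.0 if g in query else 0.0 for g in genes}
    score = dict(prior)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            total = sum(neighbors[g].values())
            spread = sum(w * score[n] for n, w in neighbors[g].items()) / total
            updated[g] = (1 - restart) * spread + restart * prior[g]
        score = updated
    return sorted((g for g in genes if g not in query), key=lambda g: (-score[g], g))


coexpression = {("A", "C"): 1.0, ("B", "C"): 1.0, ("D", "E"): 1.0}
interactions = {("C", "D"): 0.5, ("A", "D"): 0.2}
combined = combine_networks([coexpression, interactions], [0.5, 0.5])
print(propagate(combined, query={"A", "B"}))  # ['C', 'D', 'E']
```

    Gene C ranks first because it links directly to both query genes; E is reached only through D, so its score is diluted by one extra step.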

  4. Barley Stem Rust Resistance Genes: Structure and Function

    Directory of Open Access Journals (Sweden)

    Andris Kleinhofs

    2009-07-01

    Full Text Available Rusts are biotrophic pathogens that attack many plant species but are particularly destructive on cereal crops. The stem rusts (caused by have historically caused severe crop losses and continue to threaten production today. Barley ( L. breeders have controlled major stem rust epidemics since the 1940s with a single durable resistance gene . As new epidemics have threatened, additional resistance genes were identified to counter new rust races, such as the complex locus against races QCCJ and TTKSK. To understand how these genes work, we initiated research to clone and characterize them. The gene encodes a unique protein kinase with dual kinase domains, an active kinase, and a pseudokinase. Function of both domains is essential to confer resistance. The and genes are closely linked and function coordinately to confer resistance to several wheat ( L. stem rust races, including the race TTKSK (also called Ug99) that threatens the world's barley and wheat crops. The gene encodes typical resistance gene domains NBS, LRR, and protein kinase but is unique in that all three domains reside in a single gene, a previously unknown structure among plant disease resistance genes. The gene encodes an actin depolymerizing factor that functions in cytoskeleton rearrangement.

  5. Investigating Gene Function in Cereal Rust Fungi by Plant-Mediated Virus-Induced Gene Silencing.

    Science.gov (United States)

    Panwar, Vinay; Bakkeren, Guus

    2017-01-01

    Cereal rust fungi are destructive pathogens, threatening grain production worldwide. Targeted breeding for resistance utilizing host resistance genes has been effective. However, breakdown of resistance occurs frequently, and continued efforts are needed to understand how these fungi overcome resistance and to expand the range of available resistance genes. Whole genome sequencing, transcriptomic and proteomic studies followed by genome-wide computational and comparative analyses have identified a large repertoire of genes in rust fungi, among which are candidates predicted to code for pathogenicity and virulence factors. Some of these genes represent defence-triggering avirulence effectors. However, the functions of most genes still need to be assessed to understand the biology of these obligate biotrophic pathogens. Since genetic manipulations such as gene deletion and genetic transformation are not yet feasible in rust fungi, performing functional gene studies is challenging. Recently, host-induced gene silencing (HIGS) has emerged as a useful tool to characterize gene function in rust fungi while they infect and grow in host plants. We utilized Barley stripe mosaic virus-mediated virus-induced gene silencing (BSMV-VIGS) to induce HIGS of candidate rust fungal genes in the wheat host to determine their role in plant-fungal interactions. Here, we describe the methods for using BSMV-VIGS in wheat for functional genomics studies in cereal rust fungi.

  6. Knowledge Representation in Patient Safety Reporting: An Ontological Approach

    OpenAIRE

    Liang Chen; Yang Gong

    2016-01-01

    Purpose: The current development of patient safety reporting systems is criticized for loss of information and low data quality due to the lack of a unified domain knowledge base and text processing functionality. To improve patient safety reporting, the present paper suggests an ontological representation of patient safety knowledge. Design/methodology/approach: We propose a framework for constructing an ontological knowledge base of patient safety. The present paper describes our desig...

  7. PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants

    OpenAIRE

    Qi Liu; Changjun Ding; Yanguang Chu; Jiafei Chen; Weixi Zhang; Bingyu Zhang; Qinjun Huang; Xiaohua Su

    2016-01-01

    Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is much needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network...

  8. In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles.

    Science.gov (United States)

    Lin, Frank P Y; Coiera, Enrico; Lan, Ruiting; Sintchenko, Vitali

    2009-03-17

    In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency against phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence pattern across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes with 417 published genome sequences. Both CGP methods achieved best areas under receiver operating characteristic curve (AUC) of 0.911 in Escherichia coli K-12 (EC-K12) and 0.978 in Streptococcus agalactiae 2603 (SA-2603) genomes, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group of genome examples in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum of 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and discovery of bacterial gene-function relationships. Our rediscovery experiments also provide a set of standard tasks
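
    Statistical CGP is described as exploiting differences in gene frequency between phenotypic groups. The minimal sketch below assumes the score is simply the difference in presence frequency between phenotype-positive and phenotype-negative genomes (the paper may use a different statistic). It then computes AUC as the probability that a known pathway gene outranks a non-pathway gene. Genome and gene-family names are hypothetical.

```python
def frequency_difference(profiles: dict[str, set[str]], positive_genomes: set[str]) -> dict[str, float]:
    """Score each gene family by presence frequency in positive minus negative genomes."""
    pos = [g for g in profiles if g in positive_genomes]
    neg = [g for g in profiles if g not in positive_genomes]
    families = sorted(set().union(*profiles.values()))
    scores = {}
    for fam in families:
        freq_pos = sum(fam in profiles[g] for g in pos) / len(pos)
        freq_neg = sum(fam in profiles[g] for g in neg) / len(neg)
        scores[fam] = freq_pos - freq_neg
    return scores


def auc(scores: dict[str, float], relevant: set[str]) -> float:
    """Probability that a relevant family outscores an irrelevant one (ties count 1/2)."""
    pos = [s for fam, s in scores.items() if fam in relevant]
    neg = [s for fam, s in scores.items() if fam not in relevant]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical phylogenetic profiles: genome -> gene families present.
profiles = {
    "G1": {"fam1", "fam2", "fam3"},
    "G2": {"fam1", "fam2", "fam4"},
    "G3": {"fam3"},
    "G4": {"fam3", "fam4"},
}
scores = frequency_difference(profiles, positive_genomes={"G1", "G2"})
print(scores, auc(scores, relevant={"fam1", "fam2"}))  # AUC is 1.0 here
```

    Ranking families by this score and comparing against a known pathway is the shape of a "rediscovery" experiment: a perfect ranking places every known pathway family above every other family.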

  9. In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

    Directory of Open Access Journals (Sweden)

    Lan Ruiting

    2009-03-01

    Full Text Available Abstract Background In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Results Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency against phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence pattern across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes with 417 published genome sequences. Both CGP methods achieved best areas under receiver operating characteristic curve (AUC) of 0.911 in Escherichia coli K-12 (EC-K12) and 0.978 in Streptococcus agalactiae 2603 (SA-2603) genomes, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group of genome examples in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum of 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Conclusion Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and discovery of bacterial gene-function relationships. Our

  10. Functionally enigmatic genes: a case study of the brain ignorome.

    Directory of Open Access Journals (Sweden)

    Ashutosh K Pandey

    Full Text Available What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed--the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum--a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases--ELMOD1, TMEM88B, and DZANK1--we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes.

  11. Functionally Enigmatic Genes: A Case Study of the Brain Ignorome

    Science.gov (United States)

    Pandey, Ashutosh K.; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W.

    2014-01-01

    What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed—the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases—ELMOD1, TMEM88B, and DZANK1—we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes. PMID:24523945

  12. Understanding doublecortin-like kinase gene function through transgenesis

    NARCIS (Netherlands)

    Schenk, Geert J.

    2010-01-01

    Doublecortin (DCX) and DCX-domain-containing Doublecortin-Like Kinase (DCLK) gene splice variants function during embryonic development, where they play a role in microtubule binding. Although a role for the DCLK gene during embryogenesis is clearly established, it encodes multiple, different

  13. The ALMT Gene Family Performs Multiple Functions in Plants

    Directory of Open Access Journals (Sweden)

    Jie Liu

    2018-02-01

    Full Text Available The aluminium-activated malate transporter (ALMT) gene family is named after the first member of the family identified in wheat (Triticum aestivum L.). The product of this gene controls resistance to aluminium (Al) toxicity. ALMT genes encode transmembrane proteins that function as anion channels and perform multiple functions involving the transport of organic anions (e.g., carboxylates) and inorganic anions in cells. They share a PF11744 domain and are classified in the Fusaric acid resistance protein-like superfamily, CL0307. The proteins typically have five to seven transmembrane regions in the N-terminal half and a long hydrophilic C-terminal tail, but predictions of secondary structure vary. Although ALMT genes are widespread in plants, relatively little information is available on the roles of the other members of this family. In this review, we summarize the functions of the ALMT gene family, including Al resistance, stomatal function, mineral nutrition, microbe interactions, fruit acidity, light response and seed development.

  14. Developmentally distinct MYB genes encode functionally equivalent proteins in Arabidopsis.

    Science.gov (United States)

    Lee, M M; Schiefelbein, J

    2001-05-01

    The duplication and divergence of developmental control genes is thought to have driven morphological diversification during the evolution of multicellular organisms. To examine the molecular basis of this process, we analyzed the functional relationship between two paralogous MYB transcription factor genes, WEREWOLF (WER) and GLABROUS1 (GL1), in Arabidopsis. The WER and GL1 genes specify distinct cell types and exhibit non-overlapping expression patterns during Arabidopsis development. Nevertheless, reciprocal complementation experiments with a series of gene fusions showed that WER and GL1 encode functionally equivalent proteins, and their unique roles in plant development are entirely due to differences in their cis-regulatory sequences. Similar experiments with a distantly related MYB gene (MYB2) showed that its product cannot functionally substitute for WER or GL1. Furthermore, an analysis of the WER and GL1 proteins shows that conserved sequences correspond to specific functional domains. These results provide new insights into the evolution of the MYB gene family in Arabidopsis, and, more generally, they demonstrate that novel developmental gene function may arise solely by the modification of cis-regulatory sequences.

  15. Ontology in association rules.

    Science.gov (United States)

    Ferraz, Inhaúma Neves; Garcia, Ana Cristina Bicharra

    2013-01-01

    Data mining has emerged to address the problem of transforming data into useful knowledge. Although most data mining techniques, such as the use of association rules, can substantially reduce the search effort over large data sets, their output often exceeds the amount of information a human can manage. On the other hand, important association rules may be overlooked because of the choice of support threshold, a highly subjective parameter that is nonetheless built into most data mining techniques. This paper presents a study of the effects, in terms of precision and recall, of using a data preparation technique called SemPrune, which is built on a domain ontology. SemPrune is intended for the pre- and post-processing phases of data mining. Identifying generalization/specialization relations, as well as composition/decomposition relations, is the key to successfully applying SemPrune.
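
    The effect of the support threshold can be made concrete with a tiny example. A rule X → Y has support equal to the fraction of transactions containing both X and Y, and confidence equal to support(X ∪ Y) / support(X); any rule below the minimum support is discarded, however strong its confidence. A minimal sketch with made-up transactions: it enumerates only single-item rules and does not implement SemPrune, whose details are not given here.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]

def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def one_to_one_rules(
    transactions: List[Transaction], min_support: float, min_conf: float
) -> List[Tuple[str, str, float, float]]:
    """Mine rules {a} -> {b} whose support and confidence both clear their thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        sup_ab = support(frozenset({a, b}), transactions)
        if sup_ab < min_support:
            continue  # pruned by the (subjective) support threshold
        conf = sup_ab / support(frozenset({a}), transactions)
        if conf >= min_conf:
            rules.append((a, b, sup_ab, conf))
    return rules

# Hypothetical transactions; the rare caviar/champagne pair always co-occurs.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"caviar", "champagne"},
)]

for rule in one_to_one_rules(transactions, min_support=0.3, min_conf=0.6):
    print(rule)
```

    With `min_support=0.3` the rule caviar → champagne (confidence 1.0) is silently dropped because its support is only 0.2; lowering the threshold to 0.2 recovers it. This is the kind of overlooked rule the abstract refers to.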

  16. Building a developmental toxicity ontology.

    Science.gov (United States)

    Baker, Nancy; Boobis, Alan; Burgoon, Lyle; Carney, Edward; Currie, Richard; Fritsche, Ellen; Knudsen, Thomas; Laffont, Madeleine; Piersma, Aldert H; Poole, Alan; Schneider, Steffen; Daston, George

    2018-04-03

    As more information is generated about modes of action for developmental toxicity and more data are generated using high-throughput and high-content technologies, it is becoming necessary to organize that information. This report discusses the need for a systematic representation of knowledge about developmental toxicity (i.e., an ontology) and proposes a method to build one based on knowledge of developmental biology and mode of action/adverse outcome pathways in developmental toxicity. This report is the result of a consensus working group developing a plan to create an ontology for developmental toxicity that spans multiple levels of biological organization. It describes some of the challenges in building a developmental toxicity ontology and outlines a proposed methodology to meet those challenges. Because the ontology is built on currently available web-based resources, a review of these resources is provided. Case studies on retinoic acid, one of the best-understood morphogens and developmental toxicants, are presented as examples of how such an ontology might be developed. This report outlines an approach to construct a developmental toxicity ontology. Such an ontology will facilitate computer-based prediction of substances likely to induce human developmental toxicity. © 2018 Wiley Periodicals, Inc.

  17. Bioinformatics tools for predicting GPCR gene functions.

    Science.gov (United States)

    Suwa, Makiko

    2014-01-01

    The automatic classification of GPCRs by bioinformatics methods can provide functional information for new GPCRs across the whole 'GPCR proteome', and this information is important for the development of novel drugs. Since the GPCR proteome is classified hierarchically, general approaches to GPCR function prediction are based on hierarchical classification. Various computational tools have been developed to predict GPCR functions; rather than simple sequence searches, these tools use more powerful methods trained on learning datasets, such as alignment-free methods, statistical model methods, and machine learning methods used in protein sequence analysis. The first stage of hierarchical function prediction discriminates GPCRs from non-GPCRs, and the second stage classifies the predicted GPCR candidates into family, subfamily, and sub-subfamily levels. Further classification is then performed according to protein-protein interaction type: the type of G protein bound, the type of oligomerization partner, etc. These methods have achieved predictive accuracies of around 90%. Finally, I describe future research directions in the bioinformatic prediction of GPCR function.

  18. Ranking, selecting, and prioritising genes with desirability functions

    Directory of Open Access Journals (Sweden)

    Stanley E. Lazic

    2015-11-01

    Full Text Available In functional genomics experiments, researchers often select genes to follow up or validate from a long list of differentially expressed genes. Typically, sharp thresholds are used to bin genes into groups such as significant/non-significant or fold change above/below a cut-off value, and ad hoc criteria, such as favouring well-known genes, are also used. Binning, however, is inefficient and does not take the uncertainty of the measurements into account. Furthermore, p-values, fold-changes, and other outcomes are treated as equally important, and relevant genes may be overlooked with such an approach. Desirability functions are proposed as a way to integrate multiple selection criteria for ranking, selecting, and prioritising genes. These functions map any variable to a continuous 0–1 scale, where one is maximally desirable and zero is unacceptable. Multiple selection criteria are then combined to provide an overall desirability that is used to rank genes. In addition to p-values and fold-changes, further experimental results and information contained in databases can easily be included as criteria. The approach is demonstrated with a breast cancer microarray data set. The functions and an example data set can be found in the desiR package on CRAN (https://cran.r-project.org/web/packages/desiR/), and the development version is available on GitHub (https://github.com/stanlazic/desiR).
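
    The core idea, mapping each criterion onto a 0–1 desirability scale and combining the per-criterion values into one overall score for ranking, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: it uses simple linear ramp functions and a geometric mean for the overall score (a common choice in the desirability-function literature), the cut-offs and gene values are hypothetical, and it does not reproduce the desiR package's API or defaults.

```python
import math
from typing import Dict, List

def d_high(x: float, low: float, high: float) -> float:
    """Larger is better: 0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def d_low(x: float, low: float, high: float) -> float:
    """Smaller is better: 1 at or below `low`, 0 at or above `high`, linear in between."""
    return 1.0 - d_high(x, low, high)

def overall(ds: List[float]) -> float:
    """Geometric mean; a single unacceptable (zero) criterion makes the gene unacceptable."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Hypothetical results: p-value and absolute log2 fold change (lfc) per gene.
genes: Dict[str, Dict[str, float]] = {
    "g1": {"p": 0.001, "lfc": 2.0},
    "g2": {"p": 0.04, "lfc": 3.0},
    "g3": {"p": 0.2, "lfc": 4.0},
}

# Hypothetical cut-offs: p fully desirable at 0.001, unacceptable at 0.05;
# lfc unacceptable at 0.5, fully desirable at 3.0.
scores = {
    name: overall([d_low(v["p"], 0.001, 0.05), d_high(v["lfc"], 0.5, 3.0)])
    for name, v in genes.items()
}
ranking = sorted(scores, key=lambda g: scores[g], reverse=True)
print(ranking)
```

    Note how g3, despite the largest fold change, ranks last because its p-value falls in the unacceptable region; the geometric mean lets a single zero veto a gene, unlike an arithmetic mean.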

  19. Ontological knowledge structure of intuitive biology

    Science.gov (United States)

    Martin, Suzanne Michele

    It has become increasingly important for individuals to understand infectious disease, as there has been a tremendous rise in viral and bacterial disease. This research examines systematic misconceptions about the characteristics of viruses and bacteria among individuals previously educated in the biological sciences at a college level. Ninety pre-nursing students were administered the Knowledge Acquisition Device (KAD), which consists of 100 true/false items that included statements about the possible attributes of four entities: bacteria, virus, amoeba, and protein. Thirty pre-nursing students who incorrectly stated that viruses were alive were randomly assigned to three conditions: (1) information about the ontological nature of viruses, (2) general information about viruses, and (3) control. In the condition that addressed the ontological nature of a virus, all participants were able to classify viruses correctly as not alive; however, items that required inferences, such as 'viruses come in male and female forms' or 'viruses breed with each other to make baby viruses', were still answered incorrectly by participants in all conditions in the posttest. It appears that functional knowledge, e.g., whether a virus is alive or dead or how it is structured, is not enough for an individual to have a full and accurate understanding of viruses. Ontological information may alter functional knowledge, but underlying inferences remain systematically incorrect.

  20. Sample ontology, GOstat and ontology term enrichment - FANTOM5 | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available y to express samples in phase2.0. It is based on Cell Ontology, Disease Ontology and Pan-vertebrate Uberon Ontology. The file format is OBO. Data file File name: Ontology File URL: ftp://ftp....biosciencedbc.jp/archive/fantom5/datafiles/LATEST/extra/Ontology/ File size: 1.8 MB Simple search URL - Dat
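
    Since the sample ontology is distributed as an OBO file, a minimal reader for the fields most analyses need (term `id`, `name`, and `is_a` parents) may be useful. This is a simplified sketch that assumes well-formed `[Term]` stanzas and ignores other stanza types, qualifiers, and most tags; a complete OBO parser would be needed for real files. The sample text and the `EX:` term IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Term:
    id: str
    name: str = ""
    is_a: List[str] = field(default_factory=list)

def parse_obo_terms(text: str) -> Dict[str, Term]:
    """Collect id, name and is_a parents from [Term] stanzas; other stanzas are skipped."""
    terms: Dict[str, Term] = {}
    current: Optional[Term] = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are parsed here.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        if tag == "id":
            current = Term(id=value)
            terms[value] = current
        elif current is not None and tag == "name":
            current.name = value
        elif current is not None and tag == "is_a":
            # Drop the trailing "! comment" that usually repeats the parent's name.
            current.is_a.append(value.split("!", 1)[0].strip())
    return terms

SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: sample

[Term]
id: EX:0000002
name: tissue sample
is_a: EX:0000001 ! sample

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE_OBO)
for term in terms.values():
    print(term.id, term.name, term.is_a)
```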

  1. Biochemical mechanisms determine the functional compatibility of heterologous genes

    DEFF Research Database (Denmark)

    Porse, Andreas; Schou, Thea S.; Munck, Christian

    2018-01-01

    Elucidating the factors governing the functional compatibility of horizontally transferred genes is important to understand bacterial evolution, including the emergence and spread of antibiotic resistance, and to successfully engineer biological systems. In silico efforts and work using single-gene libraries have suggested that sequence composition is a strong barrier for the successful integration of heterologous genes. Here we sample 200 diverse genes, representing >80% of sequenced antibiotic resistance genes, to interrogate the factors governing genetic compatibility in new hosts. In contrast to previous work, we find that GC content, codon usage, and mRNA-folding energy are of minor importance for the compatibility of mechanistically diverse gene products at moderate expression. Instead, we identify the phylogenetic origin, and the dependence of a resistance mechanism on host physiology, as major...

  2. Annotation modeling with formal ontologies: Implications for informal ontologies

    Science.gov (United States)

    Lumb, L. I.; Freemantle, J. R.; Lederman, J. I.; Aldridge, K. D.

    2009-04-01

    Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium's (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL's internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus, the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

  3. Gene-environment interactions involving functional variants

    DEFF Research Database (Denmark)

    Barrdahl, Myrto; Rudolph, Anja; Hopper, John L

    2017-01-01

    epidemiological breast cancer risk factors in relation to breast cancer. Analyses were conducted on up to 58,573 subjects (26,968 cases and 31,605 controls) from the Breast Cancer Association Consortium, in one of the largest studies of its kind. Analyses were carried out separately for estrogen receptor (ER.......01. The strongest interaction result in relation to overall breast cancer risk was found between CFLAR-rs7558475 and current smoking (ORint  = 0.77, 95% CI: 0.67-0.88, pint  = 1.8 × 10(-4) ). The interaction with the strongest statistical evidence was found between 5q14-rs7707921 and alcohol consumption (ORint =1.......36, 95% CI: 1.16-1.59, pint  = 1.9 × 10(-5) ) in relation to ER- disease risk. The remaining two gene-environment interactions were also identified in relation to ER- breast cancer risk and were found between 3p21-rs6796502 and age at menarche (ORint  = 1.26, 95% CI: 1.12-1.43, pint =1.8 × 10...
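
    The reported interaction statistics can be cross-checked with a short calculation, assuming the 95% confidence interval is a Wald interval on the log-odds-ratio scale (the abstract does not state how the intervals were computed). Under that assumption the standard error is recovered as (ln upper - ln lower) / (2 × 1.96), and the two-sided p-value follows from z = ln(OR) / SE. For CFLAR-rs7558475 × current smoking (ORint = 0.77, 95% CI 0.67-0.88) this gives p of about 1.7 × 10^-4, close to the reported 1.8 × 10^-4; the small gap is consistent with rounding of the published figures. The helper name `wald_from_ci` is hypothetical.

```python
import math
from typing import Tuple

def wald_from_ci(or_point: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Back out (SE of log-OR, z statistic, two-sided p) from an OR and its 95% Wald CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(or_point) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return se, z, p

se, z, p = wald_from_ci(0.77, 0.67, 0.88)
print(f"SE={se:.4f} z={z:.2f} p={p:.2e}")
```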

  4. Functional Analysis of an ATP-Binding Cassette Transporter Gene in Botrytis cinerea by Gene Disruption

    OpenAIRE

    Masami, NAKAJIMA; Junko, SUZUKI; Takehiko, HOSAKA; Tadaaki, HIBI; Katsumi, AKUTSU; School of Agriculture, Ibaraki University; School of Agriculture, Ibaraki University; School of Agriculture, Ibaraki University; Department of Agriculture and Environmental Biology, The University of Tokyo; School of Agriculture, Ibaraki University

    2001-01-01

    The BMR1 gene encoding an ABC transporter was cloned from Botrytis cinerea. To examine the function of BMR1 in B. cinerea, we isolated BMR1-deficient mutants after gene disruption. Disruption vector pBcDF4 was constructed by replacing the BMR1-coding region with a hygromycin B phosphotransferase gene (hph) cassette. The BMR1 disruptants had an increased sensitivity to polyoxin and iprobenfos. Polyoxin and iprobenfos, structurally unrelated compounds, may therefore be substrates of BMR1.

  5. Application to visualize an ontology for the human semen analysis domain

    Directory of Open Access Journals (Sweden)

    Roberto Casañas

    2007-06-01

    Full Text Available This paper presents the design and implementation of an ontology for the domain of human semen analysis, whose objective is to represent, organize, formalize and standardize the domain knowledge so that it can be shared and reused by different groups of people and software applications. To visualize the ontology, an application based on a client/server architecture for Web environments was developed, consisting of an Administration module and a Public Access module. The first is used to maintain the ontology's website, while the second allows users to access the stored knowledge and a set of resources such as images, videos, articles related to the domain, manuals and laboratory protocols. The proposed architecture facilitates the observation and retrieval of complex knowledge structures, as well as the navigation and management of the information represented in the ontology. The approach used in designing the information retrieval mechanisms is aimed both at users unfamiliar with the domain vocabulary and at those who already know it. This functionality is of particular interest given how heterogeneous the ontology's intended audience is, including health science professionals and students, among others. The Methontology methodology was selected to develop the ontology, and the Protégé editor was used to implement it.

  6. The uniqueness of human social ontology

    Directory of Open Access Journals (Sweden)

    Anne L. C. Runehov

    2013-07-01

    Full Text Available Darwin’s theory of evolution argued that the human race evolved from the same original cell as all other animals. Biological principles such as randomness, adaptation and natural selection led to the evolution of different species, including the human species. Based on this evolutionary sameness, Donald R. Griffin (1915-2003) challenged the behaviourist claim that animal communication is merely groans of pain. This paper argues that (1) all animals are embedded in a social system; (2) however, that does not mean that all animals are social animals; and (3) human social ontology remains unique due to gene-culture co-evolution.

  7. Introducing defeasibility into OWL ontologies

    CSIR Research Space (South Africa)

    Casini, G

    2015-10-01

    Full Text Available comprehensively. A major barrier is the lack of naturally occurring ontologies with defeasible features, the ideal candidates for evaluation. Such data is unavailable due to the absence of tool support for representing defeasible features. In the past, defeasible...

  8. A Mobile Army of Ontologies

    DEFF Research Database (Denmark)

    Juul, Jesper

    2015-01-01

    can be considered most useful for different ludo-analytical questions. Are there differences between the existential statuses of virtual videogame worlds, tangible board game worlds, and the mundane? Can the experience of the player be used to explain the existence of the (video)game artifact? How do... Presentation at the Ludo-ontologies panel. Do we need ludo-ontologies, and what are they? In this event several scholars of games and videogames discuss these questions from a variety of perspectives. What different game and videogame ontologies exist and could exist, and why are they important for game and videogame research? The round table is designed to promote ludo-ontological dialogue in order to make these questions visible and debated. A series of short presentations (approximately 10 minutes each) will be followed by an intense debate through freeform dialogue. After the industrial...

  9. The effect of functional compensation among duplicate genes can ...

    Indian Academy of Sciences (India)

    Gene duplicates have the inherent property of initially being functionally redundant. This means that they can compensate for the effect of deleterious variation occurring at one or more sister sites. Here, I present data bearing on evolutionary theory that illustrates the manner in which any functional adaptation in duplicate ...

  10. The effect of functional compensation among duplicate genes can ...

    Indian Academy of Sciences (India)

    Abstract. Gene duplicates have the inherent property of initially being functionally redundant. This means that they can compensate for the effect of deleterious variation occurring at one or more sister sites. Here, I present data bearing on evolutionary theory that illustrates the manner in which any functional adaptation in ...

  11. Finding the best visualization of an ontology

    DEFF Research Database (Denmark)

    Fabritius, Christina Valentin; Madsen, Nadia Lyngaa; Clausen, Jens

    2004-01-01

    An ontology is a classification model for a given domain. In information retrieval ontologies are used to perform broad searches. An ontology can be visualized as nodes and edges. Each node represents an element and each edge a relation between a parent and a child element. Working with an ontology...... should be feasible for on-line processing and what-if analysis of ontologies....

  12. Finding the best visualization of an ontology

    DEFF Research Database (Denmark)

    Fabritius, Christina; Madsen, Nadia; Clausen, Jens

    2006-01-01

    An ontology is a classification model for a given domain. In information retrieval ontologies are used to perform broad searches. An ontology can be visualized as nodes and edges. Each node represents an element and each edge a relation between a parent and a child element. Working with an ontology...... should be feasible for on-line processing and what-if analysis of ontologies....
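
    The node-and-edge view described in this abstract maps directly onto a directed graph in which each edge points from a parent element to a child element. A minimal sketch, assuming a toy ontology with hypothetical element names: it stores the edges as an adjacency list and renders an indented tree by depth-first traversal, one of the simplest possible visualizations. It does not reproduce the visualization methods compared in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_children(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Adjacency list mapping each parent element to its child elements."""
    children: Dict[str, List[str]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children

def indented_tree(root: str, children: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Depth-first listing (assumes an acyclic ontology).

    A node with several parents appears once under each parent, which is the
    usual drawback of tree views of a multiple-inheritance ontology.
    """
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(indented_tree(child, children, depth + 1))
    return lines

# Hypothetical ontology: "amphibious vehicle" has two parents.
edges = [
    ("vehicle", "car"),
    ("vehicle", "boat"),
    ("car", "amphibious vehicle"),
    ("boat", "amphibious vehicle"),
]
lines = indented_tree("vehicle", build_children(edges))
print("\n".join(lines))
```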

  13. Enhanced Search Method for Ontology Classification

    OpenAIRE

    Je Min Kim; Soon Hyen Kwon; Young Tack Park

    2012-01-01

    The Web Ontology Language (OWL) is a W3C recommendation for publishing and sharing ontologies on the Semantic Web. To infer implicit information (classification, satisfiability and realization) from an OWL ontology, a number of OWL reasoners have been introduced. Ontology classification computes a partial ordering, or hierarchy, of the named concepts in an ontology using subsumption testing. Most of the reasoners use both top-down and bottom-up searches using subsumption testing for...
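
    The top-down search mentioned here inserts each named concept into the hierarchy by descending from the top concept and following only those children that subsume the new concept, so subsumption tests are spent only on promising branches. A simplified sketch under stated assumptions: `subsumes(a, b)` is a hypothetical oracle backed by a toy set of told superclasses rather than an OWL reasoner, concepts are inserted superclasses-first, and equivalent concepts and the bottom-up phase of a full classifier are ignored. It is not the enhanced method proposed in the paper.

```python
from typing import Callable, Dict, List, Set

TOP = "Thing"

# Hypothetical told superclasses standing in for a reasoner.
TOLD: Dict[str, Set[str]] = {
    "Animal": set(),
    "Mammal": {"Animal"},
    "FlyingAnimal": {"Animal"},
    "Dog": {"Mammal"},
    "Bat": {"Mammal", "FlyingAnimal"},
}

def ancestors(c: str) -> Set[str]:
    """All transitive told superclasses of c."""
    seen: Set[str] = set()
    stack = list(TOLD[c])
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(TOLD[a])
    return seen

def subsumes(a: str, b: str) -> bool:
    """Hypothetical subsumption oracle: True if a is TOP, equal to b, or a superclass of b."""
    return a == TOP or a == b or a in ancestors(b)

def most_specific_subsumers(c: str, children: Dict[str, List[str]],
                            test: Callable[[str, str], bool]) -> Set[str]:
    """Top-down search from TOP for the most specific classified concepts subsuming c."""
    result: Set[str] = set()
    visited: Set[str] = set()

    def search(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        subsuming = [k for k in children.get(node, []) if test(k, c)]
        if not subsuming:
            result.add(node)  # no child subsumes c, so node is a most specific subsumer
        for k in subsuming:
            search(k)

    search(TOP)
    return result

def classify(concepts: List[str], test: Callable[[str, str], bool]) -> Dict[str, Set[str]]:
    """Insert concepts one at a time (assumed ordered superclasses-first); return direct parents."""
    children: Dict[str, List[str]] = {TOP: []}
    parents: Dict[str, Set[str]] = {}
    for c in concepts:
        ps = most_specific_subsumers(c, children, test)
        parents[c] = ps
        children[c] = []
        for p in ps:
            children[p].append(c)
    return parents

parents = classify(["Animal", "Mammal", "FlyingAnimal", "Dog", "Bat"], subsumes)
print(sorted(parents["Bat"]))
```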

  14. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that the human KIAA0100 gene is a novel acute monocytic leukemia-associated antigen (MLAA) gene. However, the function of the human KIAA0100 gene has remained unknown to date. Here, we first carried out bioinformatic prediction for the human KIAA0100 gene using online software. Second, human KIAA0100 gene expression was downregulated with the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system in U937 cells, and cell proliferation and apoptosis were then evaluated in the KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that the human KIAA0100 gene is located on 17q11.2 and that the human KIAA0100 protein is located in the secretory pathway. In addition, the human KIAA0100 protein contains a signal peptide, a transmembrane region, three types of secondary structure (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). Functional characterization showed that downregulation of the human KIAA0100 gene inhibited cell proliferation and promoted apoptosis in U937 cells. To summarize, these results suggest that the human KIAA0100 gene may belong to the mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia and may be a potential target for immunotherapy against acute monocytic leukemia.

  15. Molecular and Functional Characterization of Broccoli EMBRYONIC FLOWER 2 Genes

    Science.gov (United States)

    Chen, Long-Fang O.; Lin, Chun-Hung; Lai, Ying-Mi; Huang, Jia-Yuan; Sung, Zinmay Renee

    2012-01-01

    Polycomb group (PcG) proteins regulate major developmental processes in Arabidopsis. EMBRYONIC FLOWER 2 (EMF2), the VEFS domain-containing PcG gene, regulates diverse genetic pathways and is required for vegetative development and plant survival. Despite widespread EMF2-like sequences in plants, little is known about their function other than in Arabidopsis and rice. To study the role of EMF2 in broccoli (Brassica oleracea var. italica cv. Elegance) development, we identified two broccoli EMF2 (BoEMF2) genes with sequence homology to, and an expression pattern similar to, Arabidopsis EMF2 (AtEMF2). Reducing their expression in broccoli resulted in aberrant phenotypes and gene expression patterns. BoEMF2 regulates genes involved in diverse developmental and stress programs, similar to AtEMF2 in Arabidopsis. However, BoEMF2 differs from AtEMF2 in the regulation of flower organ identity, cell proliferation and elongation, and death-related genes, which may explain the distinct phenotypes. The expression of BoEMF2.1 in the Arabidopsis emf2 mutant (Rescued emf2) partially rescued the mutant phenotype and restored the gene expression pattern to that of the wild type. Many EMF2-mediated molecular and developmental functions are conserved in broccoli and Arabidopsis. Furthermore, the restored gene expression pattern in Rescued emf2 provides insights into the molecular basis of PcG-mediated growth and development. PMID:22537758

  16. Automated discovery of functional generality of human gene expression programs.

    Directory of Open Access Journals (Sweden)

    Georg K Gerber

    2007-08-01

    Full Text Available An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-kappaB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal

  17. Engineering Ethics: Ontology and Politics

    OpenAIRE

    Conlon, Eddie

    2015-01-01

    Ontology...acts as both gatekeeper and bouncer for methodology” (Archer 1995: 22). This exploratory paper, through a focus on the relationship between structure and agency, examines the underlying social ontologies informing the teaching, and researching of the teaching, of engineering ethics. It argues that current approaches are deficient and that Critical Realism can provide the basis for a more robust and inclusive research agenda for understanding engineering practice and the teaching ...

  18. Musical Ontology: Critical, not Metaphysical

    Directory of Open Access Journals (Sweden)

    Jonathan A. Neufeld

    2014-01-01

    Full Text Available The ontology of musical works often sets the boundaries within which evaluation of musical works and performances takes place. Questions of ontology are therefore often taken to be prior to and apart from the evaluative questions considered by either performers as they present works to audiences or an audience’s critical reflection on a performance. In this paper I argue that, while the ontology of musical works may well set the boundaries of legitimate evaluation, ontological questions should not be considered as prior to or apart from critical evaluation. Rather, ontological claims are a type of critical evaluation made within musical practices. I argue that philosophers of music might learn from the debate in political philosophy about the difficulty of setting the limits of public reason in a way that remains open to a plurality of legitimate evaluative perspectives. Just as a pre-political or metaphysical identification of the boundaries of public reason fails to accommodate the fact of pluralism in contemporary democratic politics, so too does a metaphysical identification of the boundaries of legitimate evaluation of musical works and performances fail to accommodate the fact of pluralism in contemporary musical practices. I apply John Rawls’s formulation of political liberalism, arguing that musical ontology should be critical, not metaphysical.

  19. Polyploidization altered gene functions in cotton (Gossypium spp.).

    Science.gov (United States)

    Xu, Zhanyou; Yu, John Z; Cho, Jaemin; Yu, Jing; Kohel, Russell J; Percy, Richard G

    2010-12-16

    Cotton (Gossypium spp.) is an important crop plant that is widely grown to produce both natural textile fibers and cottonseed oil. Cotton fibers, the economically more important product of the cotton plant, are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has long been known that large numbers of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across the At and Dt subgenomes of tetraploid AD cottons. In the present study, the organization and evolution of the fiber development genes were investigated through the construction of an integrated genetic and physical map of fiber development genes whose functions have been verified and confirmed. A total of 535 cotton fiber development genes, including 103 fiber transcription factors, 259 fiber development genes, and 173 SSR-containing fiber ESTs, were analyzed at the subgenome level. A total of 499 fiber-related contigs were selected and assembled. Together these contigs covered about 151 Mb in physical length, or about 6.7% of the tetraploid cotton genome. Among the 499 contigs, 397 were anchored onto individual chromosomes. Results from our studies on the distribution patterns of the fiber development genes and transcription factors between the At and Dt subgenomes showed that more transcription factors were from the Dt subgenome than from At, whereas more fiber development genes were from the At subgenome than from Dt. Combining our mapping results with previous reports that more fiber QTLs were mapped in the Dt subgenome than in the At subgenome, the results suggested a new functional hypothesis for tetraploid cotton. After the merging of the two diploid Gossypium genomes, the At subgenome has provided most of the genes for fiber development, because it continues to function similarly to its fiber-producing diploid A genome ancestor. On the other hand, the Dt subgenome, with its non-fiber producing D genome ancestor

  20. An ontology-driven, diagnostic modeling system.

    Science.gov (United States)

    Haug, Peter J; Ferraro, Jeffrey P; Holmen, John; Wu, Xinzi; Mynam, Kumar; Ebert, Matthew; Dean, Nathan; Jones, Jason

    2013-06-01

    To present a system that uses knowledge stored in a medical ontology to automate the development of diagnostic decision support systems. To illustrate its function through an example focused on the development of a tool for diagnosing pneumonia. We developed a system that automates the creation of diagnostic decision-support applications. It relies on a medical ontology to direct the acquisition of clinical data from a clinical data warehouse and uses an automated analytic system to apply a sequence of machine learning algorithms that create applications for diagnostic screening. We refer to this system as the ontology-driven diagnostic modeling system (ODMS). We tested this system using samples of patient data collected in Salt Lake City emergency rooms and stored in Intermountain Healthcare's enterprise data warehouse. The system was used in the preliminary development steps of a tool to identify patients with pneumonia in the emergency department. This tool was compared with a manually created diagnostic tool derived from a curated dataset. The manually created tool is currently in clinical use. The automatically created tool had an area under the receiver operating characteristic curve of 0.920 (95% CI 0.916 to 0.924), compared with 0.944 (95% CI 0.942 to 0.947) for the manually created tool. Initial testing of the ODMS demonstrates promising accuracy for the highly automated results and illustrates the route to model improvement. The use of medical knowledge, embedded in ontologies, to direct the initial development of diagnostic computing systems appears feasible.

  1. Ontologies and Formation Spaces for Conceptual ReDesign of Systems

    Directory of Open Access Journals (Sweden)

    J. Bíla

    2005-01-01

    Full Text Available This paper discusses ontologies, methods for developing them and languages for representing them. A special ontology for computational support of the Conceptual ReDesign Process (CRDP) is introduced with a simple illustrative example of an application. The ontology, denoted Global context (GLB), combines features of general semantic networks with features of the UML language. The ontology is task-oriented and domain-oriented, and contains three basic strata: GLBExpl (stratum of Explanation), GLBFAct (stratum of Fields of Activities) and GLBEnv (stratum of Environment), with their sub-strata. The ontology has been developed to represent functions of systems and their components in CRDP. The main difference between this ontology and ontologies developed to identify functions (in which the semantic details must be as deep as possible) lies in the style of describing the functions. In the proposed ontology, Formation Spaces are used as lower semantic categories whose semantic depth is variable and depends on the actual solution approach of a specialised Conceptual Designer.

  2. Function and Diversification of MADS-Box Genes in Rice

    OpenAIRE

    Takahiro Yamaguchi; Hiro-Yuki Hirano

    2006-01-01

    MADS-box genes play critical roles in a number of developmental processes in flowering plants, such as specification of floral organ identity, control of flowering time, and regulation of fruit development. Because of their crucial functions in flower development, diversification of the MADS-box gene family has been suggested to be a major factor responsible for floral diversity during radiation of the flowering plants. Inflorescences and flowers in the grass species have unique structures th...

  3. Stably Expressed Genes Involved in Basic Cellular Functions.

    Directory of Open Access Journals (Sweden)

    Kejian Wang

    Full Text Available Stably Expressed Genes (SEGs), whose expression varies within a narrow range, may be involved in core cellular processes necessary for basic functions. To identify such genes, we re-analyzed existing RNA-Seq gene expression profiles across 11 organs at 4 developmental stages (from immature to old age) in both sexes of F344 rats (n = 4/group; 320 samples). Expression changes (calculated as the maximum expression / minimum expression for each gene) of >19,000 genes across organs, ages, and sexes ranged from 2.35 to >10⁹-fold, with a median of 165-fold. The expression of 278 SEGs was found to vary ≤4-fold, and these genes were significantly involved in protein catabolism (proteasome and ubiquitination), RNA transport, protein processing, and the spliceosome. Such stability of expression was further validated in human samples, where the expression variability of the homologous human SEGs was significantly lower than that of other genes in the human genome. It was also found that the homologous human SEGs were generally less subject to non-synonymous mutation than other genes, as would be expected of stably expressed genes. We also found that knockout of SEG homologs in mouse models was more likely to cause complete preweaning lethality than knockout of non-SEG homologs, corroborating the fundamental roles played by SEGs in biological development. Such stably expressed genes and pathways across life-stages suggest that tight control of these processes is important in basic cellular functions and that perturbation by endogenous (e.g., genetics) or exogenous agents (e.g., drugs, environmental factors) may cause serious adverse effects.
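The stability criterion in the abstract above (maximum expression divided by minimum expression per gene, with SEGs varying ≤4-fold) can be sketched as follows. This is an illustration only: the data layout (a dict mapping gene name to expression values across samples) and the function names are hypothetical, and a real RNA-Seq analysis would also involve normalization and filtering of low counts, which this sketch handles only by treating a zero minimum as an unbounded ratio.

```python
from typing import Dict, List


def fold_change(values: List[float]) -> float:
    """Maximum expression divided by minimum expression for one gene."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        # A zero (or negative) minimum makes the ratio undefined; treat it as unbounded.
        return float("inf")
    return hi / lo


def stably_expressed(expr: Dict[str, List[float]], max_fold: float = 4.0) -> List[str]:
    """Return, sorted, the genes whose max/min expression ratio is at most max_fold."""
    return sorted(g for g, vals in expr.items() if fold_change(vals) <= max_fold)


if __name__ == "__main__":
    # Hypothetical toy profiles across four samples.
    toy = {
        "GeneA": [10.0, 12.0, 15.0, 20.0],  # 2-fold range: stable
        "GeneB": [1.0, 50.0, 3.0, 8.0],     # 50-fold range: variable
        "GeneC": [0.0, 5.0, 6.0, 7.0],      # zero minimum: excluded
    }
    print(stably_expressed(toy))  # ['GeneA']
```

The 4-fold cut-off is exposed as a parameter so the same filter can be rerun at other thresholds.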

  4. Towards the identification of protein complexes and functional modules by integrating PPI network and gene expression data

    Directory of Open Access Journals (Sweden)

    Li Min

    2012-05-01

    Full Text Available Abstract Background Identification of protein complexes and functional modules from protein-protein interaction (PPI) networks is crucial to understanding the principles of cellular organization and predicting protein functions. In the past few years, many computational methods have been proposed. However, most of them considered the PPI networks as static graphs and overlooked the dynamics inherent within these networks. Moreover, few of them can distinguish between protein complexes and functional modules. Results In this paper, a new framework is proposed to distinguish between protein complexes and functional modules by integrating gene expression data into protein-protein interaction (PPI) data. A series of time-sequenced subnetworks (TSNs) is constructed according to the time that the interactions were activated. The algorithm TSN-PCD was then developed to identify protein complexes from these TSNs. As protein complexes are significantly related to functional modules, a new algorithm DFM-CIN is proposed to discover functional modules based on the identified complexes. The experimental results show that the combination of temporal gene expression data with PPI data contributes to identifying protein complexes more precisely. A quantitative comparison based on f-measure reveals that our algorithm TSN-PCD outperforms previous protein complex discovery algorithms. Furthermore, we evaluate the identified functional modules by using “Biological Process” annotated in GO (Gene Ontology). The validation shows that the identified functional modules are statistically significant in terms of “Biological Process”. More importantly, the relationship between protein complexes and functional modules is studied. Conclusions The proposed framework based on the integration of PPI data and gene expression data makes it possible to identify protein complexes and functional modules more effectively. Moreover, the proposed new framework and
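The abstract above does not say how an interaction is judged "activated" at a given time point. A common convention, assumed here and not confirmed by the paper, is that an interaction is active at time t when the expression of both interacting proteins at t reaches a threshold. Under that assumption, building the series of time-sequenced subnetworks (TSNs) can be sketched as follows. The function name, data layout and threshold value are hypothetical, and this is not the authors' TSN-PCD code.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def time_sequenced_subnetworks(
    edges: List[Edge],
    expression: Dict[str, List[float]],
    threshold: float,
) -> List[Set[Edge]]:
    """Return one edge set per time point (assumed activation rule: both ends >= threshold)."""
    if not expression:
        return []
    n_times = len(next(iter(expression.values())))
    subnets: List[Set[Edge]] = []
    for t in range(n_times):
        # Proteins considered "active" at time point t.
        active = {p for p, series in expression.items() if series[t] >= threshold}
        # Keep only interactions whose two endpoints are both active.
        subnets.append({(u, v) for u, v in edges if u in active and v in active})
    return subnets


if __name__ == "__main__":
    ppi = [("A", "B"), ("B", "C")]
    expr = {"A": [1.0, 5.0], "B": [5.0, 5.0], "C": [5.0, 1.0]}
    print(time_sequenced_subnetworks(ppi, expr, threshold=2.0))
```

A fixed global threshold keeps the sketch short. Per-protein thresholds derived from each expression profile are a plausible alternative under the same assumption.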

  5. PHYLOGENOMICS - GUIDED VALIDATION OF FUNCTION FOR CONSERVED UNKNOWN GENES

    Energy Technology Data Exchange (ETDEWEB)

    de Crecy-Lagard, V.; Hanson, A. D.

    2012-01-03

    Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown function, or wrongly or vaguely annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We accordingly set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction is integrative, coupling the extensive post-genomic resources available for plants with comparative genomics based on hundreds of microbial genomes, and functional genomic datasets from model microorganisms. The early phase is computer-assisted; later phases incorporate intellectual input from expert plant and microbial biochemists. The approach thus bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is much more powerful than purely computational approaches to identifying gene-function associations. Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) are conserved between plants and prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family, specifically gene clustering and phyletic spread, as well as availability of functional genomics data, and publications that could link candidate families to general metabolic areas, or to specific functions. In-depth comparative genomic analysis was then performed for about 500 top candidate families, which connected ~55 of them to general areas of metabolism and led to specific functional predictions for a subset of ~25 more. Twenty predicted functions were experimentally tested in at least one prokaryotic organism

  6. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high-throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well-organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high-throughput gene functional analysis. For a given gene list, it not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
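Single-linkage agglomeration of identifiers, as described above for the DAVID Gene Concept, amounts to taking the connected components of a graph whose edges are cross-references between identifiers: any chain of links places identifiers in the same cluster. The sketch below does this with a union-find structure. It is a generic illustration, not DAVID's implementation, and the identifier strings are invented and do not reflect DAVID's file format or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest over string identifiers."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving: point nodes at their grandparents while walking to the root.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage_clusters(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group identifiers connected by any chain of pairwise links."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for x in list(uf.parent):
        groups[uf.find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical cross-references between identifiers from different sources.
    links = [
        ("RefSeq:X1", "UniProt:Y1"),
        ("UniProt:Y1", "Ensembl:Z1"),
        ("RefSeq:X2", "UniProt:Y2"),
    ]
    print(single_linkage_clusters(links))
    # [['Ensembl:Z1', 'RefSeq:X1', 'UniProt:Y1'], ['RefSeq:X2', 'UniProt:Y2']]
```

Union-find keeps the grouping close to linear in the number of links, which matters when the input has tens of millions of identifiers.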

  7. The Drosophila melanogaster methuselah gene: a novel gene with ancient functions.

    Directory of Open Access Journals (Sweden)

    Ana Rita Araújo

    Full Text Available The Drosophila melanogaster G protein-coupled receptor gene, methuselah (mth), has been described as a novel gene that is less than 10 million years old. Nevertheless, it shows a highly specific expression pattern in embryos, larvae, and adults, and has been implicated in larval development, stress resistance, and in the setting of adult lifespan, among others. Although mth belongs to a gene subfamily with 16 members in D. melanogaster, there is no evidence for functional redundancy in this subfamily. Therefore, it is surprising that a novel gene influences so many traits. Here, we explore the alternative hypothesis that mth is an old gene. Under this hypothesis, in species distantly related to D. melanogaster, there should be a gene with features similar to those of mth. By performing detailed phylogenetic, synteny, protein structure, and gene expression analyses we show that the D. virilis GJ12490 gene is the ortholog of mth in species distantly related to D. melanogaster. We also show that, in D. americana (a species of the virilis group of Drosophila), a common amino acid polymorphism at the GJ12490 orthologous gene is significantly associated with developmental time, size, and lifespan differences. Our results imply that GJ12490 orthologous genes are candidates for developmental time and lifespan differences in Drosophila in general.

  8. THE SPACE OF EDUCATIONAL RESEARCH ACTIVITY OF STUDENTS BASED ON ASSOCIATION ONTOLOGICALLY INTERFACE AND GIS-TECHNOLOGIES

    Directory of Open Access Journals (Sweden)

    Maryna A. Popova

    2014-04-01

    Full Text Available This article discusses the use of ontologies and ontological computer interfaces as an effective means of integrating, aggregating and visualizing distributed information resources and systems, using semantic properties to create and use an information space for students' educational and research activities. An approach combining the features of ontologies with the geospatial analytical functions of GIS is described. A technique for applying an ontological interface by creating thematic map layers in a GIS environment based on a thematic ontology is presented.

  9. Use of the CIM Ontology

    Energy Technology Data Exchange (ETDEWEB)

    Neumann, Scott; Britton, Jay; Devos, Arnold N.; Widergren, Steven E.

    2006-02-08

    There are many uses for the Common Information Model (CIM), an ontology that is being standardized through Technical Committee 57 of the International Electrotechnical Commission (IEC TC57). The most common uses to date have included application modeling, information exchanges, information management and systems integration. As one should expect, there are many issues that become apparent when the CIM ontology is applied to any one use. Some of these issues are shortcomings within the current draft of the CIM, and others are a consequence of the different ways in which the CIM can be applied using different technologies. As the CIM ontology will and should evolve, there are several dangers that need to be recognized. One is overall consistency and impact upon applications when extending the CIM for a specific need. Another is that a tight coupling of the CIM to specific technologies could limit the value of the CIM in the longer term as an ontology, which becomes a larger issue over time as new technologies emerge. The integration of systems is one specific area of interest for application of the CIM ontology. This is an area dominated by the use of XML for the definition of messages. While this is certainly true when using Enterprise Application Integration (EAI) products, it is even more true with the movement towards the use of Web Services (WS), Service-Oriented Architectures (SOA) and Enterprise Service Buses (ESB) for integration. This general IT industry trend is consistent with trends seen within the IEC TC57 scope of power system management and associated information exchange. The challenge for TC57 is how to best leverage the CIM ontology using the various XML technologies and standards for integration. This paper will provide examples of how the CIM ontology is used and describe some specific issues that should be addressed within the CIM in order to increase its usefulness as an ontology. 
It will also describe some of the issues and challenges that will

  10. The function and evolution of Wnt genes in arthropods.

    Science.gov (United States)

    Murat, Sophie; Hopfen, Corinna; McGregor, Alistair P

    2010-11-01

    Wnt signalling is required for a wide range of developmental processes, from cleavage to patterning and cell migration. There are 13 subfamilies of Wnt ligand genes and this diverse repertoire appeared very early in metazoan evolution. In this review, we first summarise the known Wnt gene repertoire in various arthropods. Insects appear to have lost several Wnt subfamilies, either generally, such as Wnt3, or in lineage-specific patterns, for example, the loss of Wnt7 in Anopheles. In Drosophila and Acyrthosiphon, only seven and six Wnt subfamilies are represented, respectively; however, the finding of nine Wnt genes in Tribolium suggests that arthropods had a larger repertoire ancestrally. We then discuss what is currently known about the expression and developmental function of Wnt ligands in Drosophila and other insects in comparison to other arthropods, such as the spiders Achaearanea and Cupiennius. We conclude that studies of Wnt genes have given us much insight into the developmental roles of some of these ligands. However, given the frequent loss of Wnt genes in insects and the derived development of Drosophila, further studies of these important genes are required in a broader range of arthropods to fully understand their developmental function and evolution. Copyright © 2010 Elsevier Ltd. All rights reserved.

  11. Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

    Science.gov (United States)

    Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

    2017-08-01

    In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  12. Comparing categories among geographic ontologies

    Science.gov (United States)

    Kavouras, Marinos; Kokla, Margarita; Tomai, Eleni

    2005-03-01

    Numerous attempts have been made to generate semantic "mappings" between different ontologies, or create aligned/integrated ones. An essential step towards their success is the ability to compare the categories involved. This paper introduces a systematic methodology for comparing categories met in geographic ontologies. The methodology explores/extracts semantic information provided by categories' definitions. The first step towards this goal is the recognition of syntactic and lexical patterns in definitions, which help to identify (a) semantic properties such as purpose, location, cover, and (b) semantic relations such as hypernym, part of, has-parts, etc. At the second step, a similarity measure among categories is applied, in order to explore how (the) extracted properties and relations interrelate. This framework enables us to (a) better understand the impact of context in cross-ontology "mappings", (b) evaluate the "quality" of definitions as to whether they respect mere ontological aspects (such as unambiguous taxonomies), and (c) deal more effectively with the problem of semantic translation among geographic ontologies.

  13. The ontology of Gero's FBS model of designing

    DEFF Research Database (Denmark)

    Galle, Per

    2009-01-01

    Recent work by Vermaas and Dorst has led to constructive criticism and conceptual clarification of Gero’s FBS (Function-Behaviour-Structure) model of designing. In this paper Vermaas’ and Dorst’s version of the model is scrutinized, with an emphasis on its temporal aspect and ontological implications...

  14. Using riboswitches to regulate gene expression and define gene function in mycobacteria.

    Science.gov (United States)

    Van Vlack, Erik R; Seeliger, Jessica C

    2015-01-01

    Mycobacteria include both environmental species and many pathogenic species such as Mycobacterium tuberculosis, an intracellular pathogen that is the causative agent of tuberculosis in humans. Inducible gene expression is a powerful tool for examining gene function and essentiality, both in in vitro culture and in host cell infections. The theophylline-inducible artificial riboswitch has recently emerged as an alternative to protein repressor-based systems. The riboswitch is translationally regulated and is combined with a mycobacterial promoter that provides transcriptional control. Here we provide methods used by our laboratory to characterize the riboswitch response to theophylline in reporter strains, recombinant organisms containing riboswitch-regulated endogenous genes, and in host cell infections. These protocols should facilitate the application of both existing and novel artificial riboswitches to the exploration of gene function in mycobacteria. © 2015 Elsevier Inc. All rights reserved.

  15. Functional consequences of integrin gene mutations in mice

    DEFF Research Database (Denmark)

    Bouvard, D; Brakebusch, C; Gustafsson, E

    2001-01-01

    Integrins are cell-surface receptors responsible for cell attachment to extracellular matrices and to other cells. The application of mouse genetics has significantly increased our understanding of integrin function in vivo. In this review, we summarize the phenotypes of mice carrying mutant integrin genes and compare them with phenotypes of mice lacking the integrin ligands.

  16. Functional analysis helps importance of unclassified mismatch repair genes

    NARCIS (Netherlands)

    Ou, Jianghua; Niessen, Renee C.; Lützen, Anne; Sijmons, Rolf H.; Kleibeuker, Jan H.; De Wind, Niels; Rasmussen, Lene Juel; Hofstra, Robert M. W.

    2007-01-01

    Hereditary nonpolyposis colorectal cancer (HNPCC) or Lynch syndrome is caused by DNA variations in the DNA mismatch repair (MMR) genes MSH2, MLH1, MSH6, and PMS2. Many of the mutations identified result in premature termination of translation and thus in loss-of-function of the encoded mutated

  17. (Euphausia superba) transcriptome to identify function genes and ...

    Indian Academy of Sciences (India)

    MA

    database, SuperbaSE, was described by Hunt et al. (2017) and KrillDB was developed (Sales et al., 2017) to give users free access to annotation information. However, the availability of molecular data concerning function genes, microsatellites, and single nucleotide polymorphisms (SNPs) in E. superba is still ...

  18. Analysis of gene functions in Maize chlorotic mottle virus.

    Science.gov (United States)

    Scheets, Kay

    2016-08-15

    Gene functions of strains of Maize chlorotic mottle virus, which comprises the monotypic genus Machlomovirus, have not been previously identified. In this study mutagenesis of the seven genes encoded in maize chlorotic mottle virus (MCMV) showed that the genes with positional and sequence similarity to their homologs in viruses of related tombusvirid genera had similar functions. p50 and its readthrough protein p111 are the only proteins required for replication in maize protoplasts, and they function at a low level in trans. Two movement proteins, p7a and p7b, and coat protein, encoded on subgenomic RNA1, are required for cell-to-cell movement in maize, and p7a and p7b function in trans. A unique protein, p31, expressed as a readthrough extension of p7a, is required for efficient systemic infection. The 5' proximal MCMV gene encodes a unique 32 kDa protein that is not required for replication or movement. Transcripts lacking p32 expression accumulate to about 1/3 the level of wild type transcripts in protoplasts and produce delayed, mild infections in maize plants. Additional studies on p32, p31 and the unique amino-terminal region of p50 are needed to further characterize the life cycle of this unique tombusvirid. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Gene Discovery and Functional Analyses in the Model Plant Arabidopsis

    DEFF Research Database (Denmark)

    Feng, Cai-ping; Mundy, J.

    2006-01-01

    The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions, TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also...

  20. Gene-environment interaction and male reproductive function

    DEFF Research Database (Denmark)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L

    2010-01-01

    that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism...

  1. Bone marrow transplantations to study gene function in hematopoietic cells

    NARCIS (Netherlands)

    de Winther, Menno P. J.; Heeringa, Peter

    2011-01-01

    Immune cells are derived from hematopoietic stem cells in the bone marrow. Experimental replacement of bone marrow offers the unique possibility to replace immune cells, to study gene function in mouse models of disease. Over the past decades, this technique has been used extensively to study, for

  2. Complex Topographic Feature Ontology Patterns

    Science.gov (United States)

    Varanka, Dalia E.; Jerris, Thomas J.

    2015-01-01

    Semantic ontologies are examined as effective data models for the representation of complex topographic feature types. Complex feature types are viewed as integrated relations between basic features for a basic purpose. In the context of topographic science, such component assemblages are supported by resource systems and found on the local landscape. Ontologies are organized within six thematic modules of a domain ontology called Topography that includes within its sphere basic feature types, resource systems, and landscape types. Context is constructed not only as a spatial and temporal setting, but a setting also based on environmental processes. Types of spatial relations that exist between components include location, generative processes, and description. An example is offered in a complex feature type ‘mine.’ The identification and extraction of complex feature types are an area for future research.

  3. Terminological Ontologies for Risk and Vulnerability Analysis

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2014-01-01

    Risk and vulnerability analyses are an important preliminary stage in civil contingency planning. The Danish Emergency Management Agency has developed a generic model and a set of tools that may be used in the preparedness planning, i.e. for identifying and describing society’s critical functions ..., for formulating threat scenarios and for assessing consequences. Terminological ontologies, which are systems of domain-specific concepts comprising concept relations and characteristics, are useful, both when describing the central concepts of risk and vulnerability analysis (meta concepts), and for further...

  4. Protein-protein networks construction and their relevance measurement based on multi-epitope-ligand-kartographie and gene ontology data of T-cell surface proteins for polymyositis.

    Science.gov (United States)

    Li, Fang-Zhen; Gao, Feng

    2012-08-01

    Polymyositis is an inflammatory myopathy characterized by muscle invasion of T-cells penetrating the basal lamina and displacing the plasma membrane of normal muscle fibers. In order to understand the different adhesive mechanisms at the T-cell surface, Schubert randomly selected 19 proteins expressed at the T-cell surface and studied them using the MELK technique [4]; we picked 15 of these proteins for further study. Two types of functional similarity networks are constructed for these proteins. The first type is the MELK similarity network, which is constructed from their MELK data using McNemar's test [24]. The second type is the GO similarity network, which is constructed from their GO annotation data using the RSS method to measure functional similarity. Subset surprisology theory is then employed to measure the degree of similarity between the two networks. Our computational results show that these two types of networks are highly related. This conclusion adds new value to the MELK technique and greatly expands its applications.
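
    The abstract names McNemar's test [24] as the basis of the MELK similarity network but does not say how the MELK data are encoded or how test results become edges. A minimal sketch, assuming each protein is reduced to a paired 0/1 presence vector over the same cells (an assumption, as is the function name `mcnemar`), computes the continuity-corrected McNemar statistic and its p-value for one protein pair:

```python
import math


def mcnemar(a, b):
    """Continuity-corrected McNemar test for two paired binary vectors.

    a, b: equal-length sequences of 0/1 values, e.g. hypothetical
    presence/absence calls for two proteins measured in the same cells.
    Returns (statistic, p_value).
    """
    if len(a) != len(b):
        raise ValueError("a and b must be paired (same length)")
    # Only discordant pairs carry information in McNemar's test.
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # present in a only
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # present in b only
    if n10 + n01 == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    stat = max(abs(n10 - n01) - 1, 0) ** 2 / (n10 + n01)
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = mcnemar([1, 1, 0, 1, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0, 0, 0])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

    A large p-value means the two presence patterns do not differ significantly, which is one plausible way to read "similar" here; thresholding p-values into network edges would be a further, unstated design choice.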

  5. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Mundy, J.; Willenbrock, Hanni

    2007-01-01

    for deriving 'Functional Association(s) by Response Overlap' (FARO) between microarray gene expression studies. The transcriptional response is defined by the set of differentially expressed genes independent from the magnitude or direction of the change. This approach overcomes the limited comparability...... to confirm and further delineate the functions of Arabidopsis MAP kinase 4 in disease and stress responses. Furthermore, we find that a large, well-defined set of genes responds in opposing directions to different stress conditions and predict the effects of different stress combinations. This demonstrates...
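
    FARO defines a transcriptional response as the set of differentially expressed genes, ignoring magnitude and direction, and relates studies through the overlap of these sets. The abstract does not give FARO's scoring function, so this is only a sketch under stated assumptions: up- and down-regulated genes are pooled into one set per study, both studies share a common universe of measured genes, and the overlap is scored with a hypergeometric upper-tail p-value. The function `response_overlap` and the gene IDs are hypothetical.

```python
from math import comb


def response_overlap(resp_a, resp_b, n_genes):
    """Overlap between two transcriptional responses and its hypergeometric p-value.

    resp_a, resp_b: sets of differentially expressed gene IDs; up- and
    down-regulated genes are pooled, so direction and magnitude are ignored.
    n_genes: number of genes measured in both studies (the shared universe).
    Returns (overlap_size, p_value) with p_value = P(overlap >= observed).
    """
    k = len(resp_a & resp_b)
    big_k, n = len(resp_a), len(resp_b)
    total = comb(n_genes, n)
    # Sum the hypergeometric probabilities of every overlap at least as large as k.
    tail = sum(
        comb(big_k, i) * comb(n_genes - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return k, tail / total


# Hypothetical responses: union of up-regulated and down-regulated genes per study.
study_1 = {"g1", "g2"} | {"g3"}
study_2 = {"g1"} | {"g2", "g4"}
overlap, p_value = response_overlap(study_1, study_2, n_genes=10)
print(overlap, p_value)
```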

  6. Multimedia ontology representation and applications

    CERN Document Server

    Chaudhury, Santanu; Ghosh, Hiranmay

    2015-01-01

    The result of more than 15 years of collective research, Multimedia Ontology: Representation and Applications provides a theoretical foundation for understanding the nature of media data and the principles involved in its interpretation. The book presents a unified approach to recent advances in multimedia and explains how a multimedia ontology can fill the semantic gap between concepts and the media world. It relays real-life examples of implementations in different domains to illustrate how this gap can be filled. The book contains information that helps with building semantic, content-based

  7. Root justifications for ontology repair

    CSIR Research Space (South Africa)

    Moodley, K

    2011-08-01

    Full Text Available Root Justifications... the ontology, based on the notion of root justifications [8, 9]. In Section 5, we discuss the implementation of a Protégé 3 plugin which demonstrates our approach to ontology repair. In this section we also discuss some experimental results comparing...

  8. Platonic wholes and quantum ontology

    CERN Document Server

    Woszczek, Marek

    2015-01-01

    The subject of the book is a reconsideration of the internalistic model of composition of the Platonic type, more radical than traditional, post-Aristotelian externalistic compositionism, and its application in the field of the ontology of quantum theory. At the centre of quantum ontology is nonseparability. Quantum wholes are atemporal wholes governed by internalistic logic and they are primitive, global physical entities, requiring an extreme relativization of the fundamental notions of mechanics. This ensures that quantum theory is fully consistent with the relativistic causal structure, with

  9. Elucidating gene function and function evolution through comparison of co-expression networks in plants

    Directory of Open Access Journals (Sweden)

    Marek Mutwil

    2014-08-01

    Full Text Available The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We show that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that, in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution.

  10. Technique for designing a domain ontology

    OpenAIRE

    Palagin, A. V.; Petrenko, N. G.; Malakhov, K. S.

    2018-01-01

    The article describes a technique for designing a domain ontology, shows a flowchart of the design algorithm, and considers an example of constructing a fragment of an ontology of the Computer Science subject area.

  11. Ontologies, Knowledge Bases and Knowledge Management

    National Research Council Canada - National Science Library

    Chalupsky, Hans

    2002-01-01

    ...) an application called Strategy Development Assistant (SDA) that uses that ontology. The JFACC ontology served as a basis for knowledge sharing among several applications in the domain of air campaign planning...

  12. A Bayesian Network Approach to Ontology Mapping

    National Research Council Canada - National Science Library

    Pan, Rong; Ding, Zhongli; Yu, Yang; Peng, Yun

    2005-01-01

    .... In this approach, the source and target ontologies are first translated into Bayesian networks (BN); the concept mapping between the two ontologies is treated as evidential reasoning between the two translated BNs...

  13. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    Science.gov (United States)

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Genes with high penetrance for syndromic and non-syndromic autism typically function within the nucleus and regulate gene expression.

    Science.gov (United States)

    Casanova, Emily L; Sharp, Julia L; Chakraborty, Hrishikesh; Sumi, Nahid Sultana; Casanova, Manuel F

    2016-01-01

    Intellectual disability (ID), autism, and epilepsy share frequent yet variable comorbidities with one another. In order to better understand potential genetic divergence underlying this variable risk, we studied genes responsible for monogenic IDs, grouped according to their autism and epilepsy comorbidities. Utilizing 465 different forms of ID with known molecular origins, we accessed available genetic databases in conjunction with gene ontology (GO) to determine whether the genetics underlying ID diverge according to its comorbidities with autism and epilepsy and if genes highly penetrant for autism or epilepsy share distinctive features that set them apart from genes that confer comparatively variable or no apparent risk. The genetics of ID with autism are relatively enriched in terms associated with nervous system-specific processes and structural morphogenesis. In contrast, we find that ID with highly comorbid epilepsy (HCE) is modestly associated with lipid metabolic processes while ID without autism or epilepsy comorbidity (ID only) is enriched at the Golgi membrane. Highly comorbid autism (HCA) genes, on the other hand, are strongly enriched within the nucleus, are typically involved in regulation of gene expression, and, along with IDs with more variable autism, share strong ties with a core protein-protein interaction (PPI) network integral to basic patterning of the CNS. According to GO terminology, autism-related gene products are integral to neural development. While it is difficult to draw firm conclusions regarding IDs unassociated with autism, it is clear that the majority of HCA genes are tightly linked with general dysregulation of gene expression, suggesting that disturbances to the chronology of neural maturation and patterning may be key in conferring susceptibility to autism spectrum conditions.

  15. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Full Text Available Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  16. Aligning ontologies and integrating textual evidence for pathway analysis of microarray data

    Energy Technology Data Exchange (ETDEWEB)

    Gopalan, Banu; Posse, Christian; Sanfilippo, Antonio P.; Stenzel-Poore, Mary; Stevens, S.L.; Castano, Jose; Beagley, Nathaniel; Riensche, Roderick M.; Baddeley, Bob; Simon, R.P.; Pustejovsky, James

    2006-10-08

    Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single-gene studies to genome-level analysis, monitoring the expression levels of several thousand genes in parallel. The massive amounts of data obtained from microarrays need to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontologies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major cause of disability in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neuroprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative responses in the recovery process following ischemia, and the molecular mechanisms that lead to these processes, still need to be uncovered. Improved annotation of genomic and proteomic data, including annotation of the pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident from the way these databases analyze functional groupings of genes from the Gene Ontology.

  17. Assessment of community-submitted ontology annotations from a novel database-journal partnership.

    Science.gov (United States)

    Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

    2012-01-01

    As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.

  18. Loss of genes implicated in gastric function during platypus evolution.

    Science.gov (United States)

    Ordoñez, Gonzalo R; Hillier, Ladeana W; Warren, Wesley C; Grützner, Frank; López-Otín, Carlos; Puente, Xose S

    2008-01-01

    The duck-billed platypus (Ornithorhynchus anatinus) belongs to the mammalian subclass Prototheria, which diverged from the Theria line early in mammalian evolution. The platypus genome sequence provides a unique opportunity to illuminate some aspects of the biology and evolution of these animals. We show that several genes implicated in food digestion in the stomach have been deleted or inactivated in platypus. Comparison with other vertebrate genomes revealed that the main genes implicated in the formation and activity of gastric juice have been lost in platypus. These include the aspartyl proteases pepsinogen A and pepsinogens B/C, the hydrochloric acid secretion stimulatory hormone gastrin, and the alpha subunit of the gastric H+/K+-ATPase. Other genes implicated in gastric functions, such as the beta subunit of the H+/K+-ATPase and the aspartyl protease cathepsin E, have been inactivated because of the acquisition of loss-of-function mutations. All of these genes are highly conserved in vertebrates, reflecting a unique pattern of evolution in the platypus genome not previously seen in other mammalian genomes. The observed loss of genes involved in gastric functions might be responsible for the anatomical and physiological differences in gastrointestinal tract between monotremes and other vertebrates, including small size, lack of glands, and high pH of the monotreme stomach. This study contributes to a better understanding of the mechanisms that underlie the evolution of the platypus genome, might extend the less-is-more evolutionary model to monotremes, and provides novel insights into the importance of gene loss events during mammalian evolution.

  19. Ontology Learning - Suggesting Associations from Text

    OpenAIRE

    Kvarv, Gøran Sveia

    2007-01-01

    In many applications, large-scale ontologies have to be constructed and maintained. Manual construction of an ontology is a time-consuming and resource-demanding process, often involving domain experts. It would therefore be beneficial to support this process with tools that automate the construction of an ontology. This master's thesis has examined the use of association rules for suggesting associations between words in text. In ontology learning, concepts are often extracted from d...
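
    Association rules are usually scored by support and confidence. A minimal sketch, assuming each sentence is treated as one transaction (a set of lowercase words) and rules are limited to single-word pairs (both assumptions; the thesis's actual preprocessing and thresholds are not given here), suggests word-to-word associations. The function name and the default thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_association_rules(sentences, min_support=0.2, min_confidence=0.6):
    """Suggest word -> word associations from co-occurrence in sentences.

    support(X, Y)      = fraction of sentences containing both X and Y
    confidence(X -> Y) = count(X and Y) / count(X)
    Returns sorted tuples (X, Y, support, confidence).
    """
    transactions = [set(s.lower().split()) for s in sentences]
    n = len(transactions)
    item_counts = Counter(w for t in transactions for w in t)
    pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules, one in each direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


sentences = ["gene encodes protein", "protein binds gene", "cell contains protein"]
rules = word_association_rules(sentences, min_support=0.5, min_confidence=0.6)
for rule in rules:
    print(rule)
```

    In practice stop words would be removed first; otherwise frequent function words dominate the suggested associations.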

  20. Drosha regulates gene expression independently of RNA cleavage function

    DEFF Research Database (Denmark)

    Gromak, Natalia; Dienstbier, Martin; Macias, Sara

    2013-01-01

    Drosha is the main RNase III-like enzyme involved in the process of microRNA (miRNA) biogenesis in the nucleus. Using whole-genome ChIP-on-chip analysis, we demonstrate that, in addition to miRNA sequences, Drosha specifically binds promoter-proximal regions of many human genes in a transcription......-terminal protein-interaction domain, which associates with the RNA-binding protein CBP80 and RNA Polymerase II. Consequently, we uncover a previously unsuspected RNA cleavage-independent function of Drosha in the regulation of human gene expression....

  1. Transcriptome analysis during somatic embryogenesis of the tropical monocot Elaeis guineensis: evidence for conserved gene functions in early development.

    Science.gov (United States)

    Lin, Hsiang-Chun; Morcillo, Fabienne; Dussert, Stéphane; Tranchant-Dubreuil, Christine; Tregear, James W; Tranbarger, Timothy John

    2009-05-01

    With the aim of understanding the molecular mechanisms underlying somatic embryogenesis (SE) in oil palm, we examined transcriptome changes that occur when embryogenic suspension cells are initiated to develop somatic embryos. Two reciprocal suppression subtractive hybridization (SSH) libraries were constructed from oil palm embryogenic cell suspensions: one in which embryo development was blocked by the presence of the synthetic auxin analogue 2,4-dichlorophenoxyacetic acid (2,4-D) in the medium (proliferation library); and another in which cells were stimulated to form embryos by the removal of 2,4-D from the medium (initiation library). A total of 1867 Expressed Sequence Tags (ESTs) consisting of 1567 potential unigenes were assembled from the two libraries. Functional annotation indicated that 928 of the ESTs correspond to proteins that have either no similarity to sequences in public databases or are of unknown function. Gene Ontology (GO) terms assigned to the two EST populations give clues to the underlying molecular functions, biological processes and cellular components involved in the initiation of embryo development. Macroarrays were used for transcript profiling of the ESTs during SE. Hierarchical cluster analysis of differential transcript accumulation revealed 4 distinct profiles containing a total of 192 statistically significant developmentally regulated transcripts. Similarities and differences between the global results obtained with in vitro systems from dicots, monocots and gymnosperms will be discussed.

  2. Aspects of ontology visualization and integration

    NARCIS (Netherlands)

    Dmitrieva, Joelia Borisovna

    2011-01-01

    In this thesis we will describe and discuss methodologies for ontology visualization and integration. Two visualization methods will be elaborated. In one method the ontology is visualized with the node-link technique, and with the other method the ontology is visualized with the containment

  3. Development of ontological knowledge representation: learning ...

    African Journals Online (AJOL)


    This group of authors describes the use of ontologies for knowledge organization in a given domain. In the context of computer science, ontologies have been applied in the field of artificial intelligence to facilitate knowledge sharing and the reuse of acquired knowledge (15). Ontologies soon gained great popularity.

  4. Sample evaluation of ontology-matching systems

    NARCIS (Netherlands)

    Hage, W.R. van; Isaac, A.; Aleksovski, Z.

    2007-01-01

    Ontology matching exists to solve practical problems. Hence, methodologies to find and evaluate solutions for ontology matching should be centered on practical problems. In this paper we propose two statistically-founded evaluation techniques to assess ontology-matching performance that are based on

  5. Family expansion and gene rearrangements contributed to the functional specialization of PRDM genes in vertebrates

    Directory of Open Access Journals (Sweden)

    Alcalay Myriam

    2007-10-01

    Full Text Available Abstract Background Progressive diversification of paralogs after gene expansion is essential to increase their functional specialization. However, the mode and tempo of this divergence remain mostly unclear. Here we report the comparative analysis of PRDM genes, a family of putative transcriptional regulators involved in human tumorigenesis. Results Our analysis showed that the PRDM genes originated in metazoans, expanded in vertebrates and further duplicated in primates. We experimentally showed that fast-evolving paralogs are poorly expressed, and that the most recent duplicates, such as primate-specific PRDM7, acquire tissue specificity. PRDM7 underwent major structural rearrangements that decreased the number of encoded Zn-Fingers and modified gene splicing. Through internal duplication and activation of a non-canonical splice site (GC-AG), PRDM7 can acquire a novel intron. We also detected an alternative isoform that can retain the intron in the mature transcript and that is predominantly expressed in human melanocytes. Conclusion Our findings show that (a) molecular evolution of paralogs correlates with their expression pattern; (b) gene diversification is obtained through massive genomic rearrangements; and (c) splicing modification contributes to the functional specialization of novel genes.

  6. Functional analysis of prognostic gene expression network genes in metastatic breast cancer models.

    Directory of Open Access Journals (Sweden)

    Thomas R Geiger

    Full Text Available Identification of conserved co-expression networks is a useful tool for clustering groups of genes enriched for common molecular or cellular functions [1]. The relative importance of genes within networks can frequently be inferred from their degree of connectivity, with highly connected genes being significantly more likely to be associated with specific molecular functions [2]. Previously we utilized cross-species network analysis to identify two network modules that were significantly associated with distant metastasis-free survival in breast cancer. Here, we validate one of the highly connected genes as a metastasis-associated gene. Tpx2, the most highly connected gene within a proliferation network specifically prognostic for estrogen receptor positive (ER+) breast cancers, enhances metastatic disease, but in a tumor-autonomous, proliferation-independent manner. Histologic analysis suggests instead that variation of TPX2 levels within disseminated tumor cells may influence the transition from dormant to actively proliferating cells in the secondary site. These results support the co-expression network approach for identification of new metastasis-associated genes to provide new information regarding the etiology of breast cancer progression and metastatic disease.

  7. IGF-I Gene Therapy in Aging Rats Modulates Hippocampal Genes Relevant to Memory Function.

    Science.gov (United States)

    Pardo, Joaquín; Abba, Martin C; Lacunza, Ezequiel; Ogundele, Olalekan M; Paiva, Isabel; Morel, Gustavo R; Outeiro, Tiago F; Goya, Rodolfo G

    2018-03-14

    In rats, learning and memory performance decline during normal aging, which makes this rodent species a suitable model to evaluate therapeutic strategies. In aging rats, insulin-like growth factor-I (IGF-I) is known to significantly improve spatial memory accuracy compared with control counterparts. A constellation of gene expression changes underlies the hippocampal phenotype of aging, but no studies on the effects of IGF-I on the hippocampal transcriptome of old rodents have been documented. Here, we assessed the effects of IGF-I gene therapy on spatial memory performance in old female rats and compared them with changes in the hippocampal transcriptome. In the Barnes maze test, experimental rats showed a significantly higher exploratory frequency of the goal hole than controls. Hippocampal RNA-sequencing showed that 219 genes are differentially expressed in 28-month-old rats intracerebroventricularly injected with an adenovector expressing rat IGF-I as compared with placebo adenovector-injected counterparts. Of the differentially expressed genes, 81 were downregulated and 138 upregulated. From these genes, a list of functionally relevant genes concerning hippocampal IGF-I expression, synaptic plasticity and neuronal function was identified. Our results provide an initial glimpse at the molecular mechanisms underlying the neuroprotective actions of IGF-I in the aging brain.

  8. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify
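
    The abstract says phenotypes are compared using both the term hierarchy and annotation frequency, without naming the four metrics. One common metric that combines the two is Resnik similarity, the information content (IC) of the most informative shared ancestor. This is a minimal sketch under loud assumptions: a single toy ontology with hypothetical term names stands in for full Entity-Quality pairs, and Resnik similarity is our illustrative choice, not necessarily one of the study's metrics.

```python
import math

# Toy phenotype ontology as term -> parent terms (hypothetical, not a real ontology).
PARENTS = {
    "abnormal_phenotype": [],
    "abnormal_head": ["abnormal_phenotype"],
    "abnormal_fin": ["abnormal_phenotype"],
    "abnormal_eye": ["abnormal_head"],
    "small_eye": ["abnormal_eye"],
    "absent_eye": ["abnormal_eye"],
}


def ancestors(term):
    """The term itself plus every ancestor reachable through PARENTS."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(annotations):
    """IC(t) = log(total / count(t)); count(t) includes annotations to descendants."""
    counts = dict.fromkeys(PARENTS, 0)
    for term in annotations:
        for a in ancestors(term):
            counts[a] += 1
    total = len(annotations)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2) if t in ic)


# Hypothetical annotation corpus, e.g. mutant phenotypes pooled across organisms.
ic = information_content(["small_eye", "small_eye", "absent_eye", "abnormal_fin"])
print(resnik("small_eye", "absent_eye", ic))    # shares abnormal_eye
print(resnik("small_eye", "abnormal_fin", ic))  # only the root is shared
```

    Rarely annotated terms get high IC, so two phenotypes that share a specific, rarely used ancestor score as more similar than two that only share the root.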

  9. Statistical algorithms for ontology-based annotation of scientific literature.

    Science.gov (United States)

    Chakrabarti, Chayan; Jones, Thomas B; Luger, George F; Xu, Jiawei F; Turner, Matthew D; Laird, Angela R; Turner, Jessica A

    2014-01-01

    Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they render an analysis mainly based upon keyword, synonym and semantic matching. They do not leverage information embedded in an ontology's structure. We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge. We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, in terms of the quality measure of the F1-micro score. Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has explicitly stated restrictions, and implicit dependencies in the form of patterns in the expert curated annotations.
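
    The naïve Bayes baseline mentioned above can be sketched as a multinomial classifier that assigns one ontology class per document. This is a minimal sketch only: it does not model CogPO's class restrictions or the decision-tree component, and the labels, training texts and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train multinomial naive Bayes. docs: list of (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing, alpha=1)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical training abstracts labelled with a stimulus-modality class.
model = train_nb([
    ("subjects viewed faces", "visual"),
    ("subjects viewed checkerboard", "visual"),
    ("subjects heard tones", "auditory"),
    ("subjects heard words", "auditory"),
])
print(predict_nb(model, "participants viewed faces"))
print(predict_nb(model, "heard tones"))
```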

  10. Tourism Promotion System Using Ontology

    Directory of Open Access Journals (Sweden)

    Adi Kurniawan

    2013-03-01

    Full Text Available Tourism is an important sector in Indonesia. The World Tourism Organization (WTO) predicts that by 2019 the Asia-Pacific tourism industry will see promising growth, particularly in terms of revenue. Existing context-based tourism promotion systems only accommodate travelers who already have clear plans (planned travelers), while no system yet accommodates travelers who simply want to explore a city, stroll around, or spend their free time (spontaneous travelers). One solution is to use mobile device technology and ontologies. Mobile devices make it easy for travelers to obtain information anytime and anywhere, while ontologies make it easier to present more relevant information to travelers. The ontology in this study is a probabilistic ontology based on a Bayesian network approach. System testing was divided into two parts: validation of the system requirements using a Requirements Traceability Matrix (RTM) tool, and black-box testing of the prototype system. Overall, the system functions well and conforms to the system design.

  11. Ontology for the Intelligence Analyst

    Science.gov (United States)

    2012-12-01

    ...an artillery tractor that runs on wheels; tracked artillery tractor =def: an artillery tractor that runs on caterpillar track ... in response to identified situational needs of analysts, and architectural requirements are designed to ensure coherent evolution of the SE resource ... “Coordinated Evolution of Ontologies to Support Biomedical Data Integration”, Nature Biotechnology, 25 (11), November 2007, 1251-1255.

  12. Ontological problems of contemporary linguistics

    Directory of Open Access Journals (Sweden)

    А В Бондаренко

    2009-03-01

    Full Text Available The article examines problems of linguistic ontology such as the evolution of essential-existential views of language, the interrelation within the Being-Language-Man triad, the gnosiological principles of linguistics, the localization of the essence of language, and «expression» as a metalinguistic unit of language, as well as the architectonics of the language personality, among others.

  13. CLO: The cell line ontology

    NARCIS (Netherlands)

    Sarntivijai, Sirarat; Lin, Yu; Xiang, Zuoshuang; Meehan, Terrence F.; Diehl, Alexander D.; Vempati, Uma D.; Schuerer, Stephan C.; Pang, Chao; Malone, James; Parkinson, Helen; Liu, Yue; Takatsuki, Terue; Saijo, Kaoru; Masuya, Hiroshi; Nakamura, Yukio; Brush, Matthew H.; Haendel, Melissa A.; Zheng, Jie; Stoeckert, Christian J.; Peters, Bjoern; Mungall, Christopher J.; Carey, Thomas E.; States, David J.; Athey, Brian D.; He, Yongqun

    2014-01-01

    Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO

  14. Anytime classification by ontology approximation

    NARCIS (Netherlands)

    Schlobach, S.; Blaauw, E.; El Kebir, M.; Ten Teije, A.; Van Harmelen, F.; Bortoli, S.; Hobbelman, M.C.; Millian, K.; Ren, Y.; Stam, S.; Thomassen, P.; Van Het Schip, R.; Van Willigem, W.

    2007-01-01

    Reasoning with large or complex ontologies is one of the bottlenecks of the Semantic Web. In this paper we present an anytime algorithm for classification based on approximate subsumption. We give the formal definitions for approximate subsumption, and show its monotonicity and soundness; we show

  15. Quantum physics and relational ontology

    Energy Technology Data Exchange (ETDEWEB)

    Cordovil, Joao [Center of Philosophy of Sciences of University of Lisbon (Portugal)]

    2013-07-01

    The discovery of the quantum domain of reality posed a serious ontological challenge, one that is still very much present in recent developments in Quantum Physics. Physics was conceived from an atomistic conception of the world, reducing it, in all its diversity, to two types of entities: simple, individual, and immutable entities (atoms, in the metaphysical sense) and composite entities resulting solely from combinations of them, combinations that are linear, additive, and indifferent to structure or context. However, the discovery of wave-particle duality and developments in Quantum Field Theories and in Nonlinear Quantum Physics showed that quantum entities are, in the metaphysical sense, neither simple nor merely the result of linear (or additive) combinations. In other words, the ontological foundations of Physics proved inadequate to account for the nature of quantum entities. A fundamental challenge then arises: how should we think about the ontic nature of these entities? In my view, this challenge calls for a relational and dynamist ontology of physical entities. This is the central hypothesis of this communication, which has two main aims: 1) to characterize this relational and dynamist ontology positively; and 2) to show some elements of its metaphysical suitability to contemporary Quantum Physics.

  16. Constitutive rules, language, and ontology

    NARCIS (Netherlands)

    Hindriks, Frank

    It is a commonplace within philosophy that the ontology of institutions can be captured in terms of constitutive rules. What exactly such rules are, however, is not well understood. They are usually contrasted to regulative rules: constitutive rules (such as the rules of chess) make institutional

  17. miRNA-mediated functional changes through co-regulating function related genes.

    Directory of Open Access Journals (Sweden)

    Jie He

    Full Text Available BACKGROUND: MicroRNAs play important roles in various biological processes through fairly complex mechanisms. Analyses of genome-wide miRNA microarray data demonstrate that a single miRNA can regulate hundreds of genes, but the extent of regulation of most individual genes is surprisingly mild, so it is difficult to understand how a miRNA provokes detectable functional changes with such mild regulation. RESULTS: To explore the internal mechanism of miRNA-mediated regulation, we re-analyzed data collected from genome-wide miRNA microarrays using bioinformatic assays, and found that transfection of miR-181b and miR-34a into HeLa and HCT-116 tumor cells regulated large numbers of genes, among which genes related to cell growth and cell death showed high Enrichment scores, suggesting that these miRNAs may be important in cell growth and cell death. MiR-181b induced changes in the protein expression of most genes that were seemingly related to enhancing cell growth and decreasing cell death, while miR-34a mediated the opposite changes in gene expression. Cell growth assays further confirmed this finding. In a further study of miR-20b-mediated osteogenesis in hMSCs, miR-20b was found to enhance osteogenesis by activating the BMPs/Runx2 signaling pathway at several stages through co-repression of PPARγ, Bambi, and Crim1. CONCLUSIONS: With their multi-target characteristics, miR-181b, miR-34a, and miR-20b provoked detectable functional changes by co-regulating functionally related gene groups or several genes in the same signaling pathway; thus, the mild regulation of individual target genes by a miRNA could contribute to an additive effect. This might also be one of the modes of miRNA-mediated gene regulation.

  18. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need for molecular diagnosis of maturity-onset diabetes of the young (MODY). However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict the functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores of the diabetes mutations by different methods were highly correlated, but that the methods were complementary rather than interchangeable. The available in silico methods for the prediction of diabetes mutations varied in performance across different genes. Applying the gene-specific thresholds defined by this study may increase the performance of in silico prediction of disease-causing mutations.
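    The abstract does not say how its gene-specific thresholds were chosen. One common way to pick a per-gene cutoff for a prediction score, assumed here purely for illustration, is to maximize Youden's J statistic (sensitivity + specificity - 1) over variants of known pathogenicity in each gene. The scores below are invented; only the gene symbols GCK and HNF1A are real MODY genes, and the helper names are hypothetical.

```python
from collections import defaultdict


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J for the rule
    'score >= threshold -> predicted pathogenic'.

    labels: True for pathogenic, False for benign.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both pathogenic and benign variants")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cutoff
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff on ties
            best_t, best_j = t, j
    return best_t, best_j


def gene_specific_thresholds(variants):
    """variants: iterable of (gene, score, is_pathogenic) tuples (hypothetical format)."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, is_pathogenic in variants:
        by_gene[gene][0].append(score)
        by_gene[gene][1].append(is_pathogenic)
    return {gene: youden_threshold(s, y)[0] for gene, (s, y) in by_gene.items()}


# Invented scores for illustration only.
variants = [
    ("GCK", 0.2, False), ("GCK", 0.4, False), ("GCK", 0.7, True), ("GCK", 0.9, True),
    ("HNF1A", 0.1, False), ("HNF1A", 0.3, True), ("HNF1A", 0.35, True), ("HNF1A", 0.8, True),
]
print(gene_specific_thresholds(variants))
```

    With these toy numbers the two genes end up with different cutoffs (0.7 and 0.3), which is the kind of per-gene variation the abstract argues a single global threshold would miss. Other criteria (e.g. a fixed target sensitivity) would yield different thresholds.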

  19. Comparative analysis of the time-dependent functional and molecular changes in spinal cord degeneration induced by the G93A SOD1 gene mutation and by mechanical compression

    Directory of Open Access Journals (Sweden)

    Priestley John V

    2008-10-01

    Full Text Available Abstract Background Mutations of the superoxide dismutase 1 (SOD1) gene are linked to amyotrophic lateral sclerosis (ALS), an invariably fatal neurological condition involving cortico-spinal degeneration. Mechanical injury can also cause spinal cord degeneration and act as a risk factor for the development of ALS. Results We have performed a comparative ontological analysis of the gene expression profiles of thoracic cord samples from rats carrying the G93A SOD1 gene mutation and from wild-type littermates subjected to mechanical compression of the spinal cord. Common molecular responses and gene expression changes unique to each experimental paradigm were evaluated against the functional development of each animal model. Gene Ontology categories crucial to protein folding, extracellular matrix and axonal formation underwent early activation in both experimental paradigms, but decreased significantly in the spinal cord from animals recovering from injury after 7 days and from the G93A SOD1 mutant rats at end-stage disease. Functional improvement after compression coincided with a massive up-regulation of growth-promoting gene categories including factors involved in angiogenesis and transcription, overcoming the more transitory surge of pro-apoptotic components and cell-cycle genes. The cord from G93A SOD1 mutants showed persistent over-expression of apoptotic and stress molecules with fewer neurorestorative signals, while functional deterioration was ongoing. Conclusion This study illustrates how cytoskeletal protein metabolism is central to both trauma-induced and genetically induced spinal cord degeneration and elucidates the main molecular events accompanying functional recovery or decline in two different animal models of spinal cord degeneration.

  20. Huntington's Disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database

    Directory of Open Access Journals (Sweden)

    Kalathur Ravi Kiran

    2012-06-01

    Full Text Available Abstract Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatic and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways that have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). For follow-up studies, we provide a regularly updated compendium of molecular mechanisms that are

  1. Huntington's disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database.

    Science.gov (United States)

    Kalathur, Ravi Kiran Reddy; Hernández-Prieto, Miguel A; Futschik, Matthias E

    2012-06-28

    Huntington's disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatic and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways that have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). For follow-up studies, we provide a regularly updated compendium of molecular mechanisms associated with HD at http://hdtt.sysbiolab.eu. Additionally

  2. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.

    Science.gov (United States)

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-02-21

    Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether entities are significantly over- or under-represented in ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, only a few can be used for small-molecule enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted, and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within metabolomics or other systems biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology and is additionally available as a standalone library.
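    The abstract does not state which statistic BiNChE uses. A common choice for testing over- or under-representation of an ontology class, shown here only as an illustrative sketch, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The function name and counts are hypothetical; BiNChE's weighted and fragment modes are not modeled, and a real analysis would also correct for testing many classes at once.

```python
from math import comb


def hypergeom_pvalues(k, N, K, n):
    """One-sided hypergeometric p-values for a single ontology class.

    N: entities in the background (universe)
    K: background entities annotated to the class
    n: entities in the study set
    k: study-set entities annotated to the class
    Returns (p_over, p_under) = (P(X >= k), P(X <= k)).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)

    def ways(i):
        # Number of study sets with exactly i annotated entities.
        # comb() returns 0 when i > K or n - i > N - K, so impossible
        # outcomes contribute nothing.
        return comb(K, i) * comb(N - K, n - i)

    p_over = sum(ways(i) for i in range(k, n + 1)) / total
    p_under = sum(ways(i) for i in range(0, k + 1)) / total
    return p_over, p_under


# Hypothetical: 10 background molecules, 4 carry a given role,
# and all 3 molecules in the study set carry it.
print(hypergeom_pvalues(3, 10, 4, 3))
```

    Exact integer arithmetic via `math.comb` avoids overflow for large backgrounds; the single float division at the end is the only rounding step.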

  3. Ontology- and graph-based similarity assessment in biological networks.

    Science.gov (United States)

    Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2010-10-15

    A standard systems-based approach to biomarker and drug target discovery consists of placing putative biomarkers in the context of a network of biological interactions, followed by different 'guilt-by-association' analyses. The latter are typically based on network structural features. Here, an alternative analysis approach in which the networks are analyzed in a 'semantic similarity' space is reported. Such information is extracted from ontology-based functional annotations. We present SimTrek, a Cytoscape plugin for ontology-based similarity assessment in biological networks. Availability: http://rosalind.infj.ulst.ac.uk/SimTrek.html. Contact: francisco.azuaje@crp-sante.lu. Supplementary data are available at Bioinformatics online.
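    The abstract does not name the similarity measures SimTrek computes. As one example of an ontology-based semantic similarity, the sketch below implements Resnik's measure: the information content IC(t) = -log p(t) of the most informative common ancestor of two terms, where p(t) is the fraction of annotations made to t or any of its descendants. The toy ontology, annotation counts, and function names are hypothetical, and the code assumes a single root term.

```python
import math


def ancestors(term, parents):
    """Return `term` plus all of its ancestors; `parents` maps child -> list of parents."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(direct_counts, parents):
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant.

    Assumes a single root, which therefore accumulates every annotation.
    """
    cumulative = {}
    for term, count in direct_counts.items():
        for a in ancestors(term, parents):  # propagate counts upward
            cumulative[a] = cumulative.get(a, 0) + count
    total = max(cumulative.values())
    return {t: -math.log(c / total) for t, c in cumulative.items()}


def resnik_similarity(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2.

    Terms with no annotations anywhere below them are absent from `ic`;
    this sketch simply treats them as IC 0.
    """
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


# Hypothetical toy ontology: root -> {B, C}, B -> {D, E}, C -> {F}.
parents = {"B": ["root"], "C": ["root"], "D": ["B"], "E": ["B"], "F": ["C"]}
direct_counts = {"D": 2, "E": 2, "F": 4}
ic = information_content(direct_counts, parents)
print(resnik_similarity("D", "E", parents, ic))  # IC of B = log 2
print(resnik_similarity("D", "F", parents, ic))  # only the root is shared
```

    Terms that share only the root score 0, while terms sharing a more specific (rarer) ancestor score higher; this is the property that lets network nodes be compared in a semantic rather than purely structural space.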

  4. Gene therapy rescues cone function in congenital achromatopsia.

    Science.gov (United States)

    Komáromy, András M; Alexander, John J; Rowlan, Jessica S; Garcia, Monique M; Chiodo, Vince A; Kaya, Asli; Tanaka, Jacqueline C; Acland, Gregory M; Hauswirth, William W; Aguirre, Gustavo D

    2010-07-01

    The successful restoration of visual function with recombinant adeno-associated virus (rAAV)-mediated gene replacement therapy in animals and humans with an inherited disease of the retinal pigment epithelium has ushered in a new era of retinal therapeutics. For many retinal disorders, however, targeting of therapeutic vectors to mutant rods and/or cones will be required. In this study, the primary cone photoreceptor disorder achromatopsia served as the ideal translational model to develop gene therapy directed to cone photoreceptors. We demonstrate that rAAV-mediated gene replacement therapy with different forms of the human red cone opsin promoter led to the restoration of cone function and day vision in two canine models of CNGB3 achromatopsia, a neuronal channelopathy that is the most common form of achromatopsia in man. The robustness and stability of the observed treatment effect were mutation independent, but promoter and age dependent. Subretinal administration of rAAV5-hCNGB3 with a long version of the red cone opsin promoter in younger animals led to a stable therapeutic effect for at least 33 months. Our results hold promise for future clinical trials of cone-directed gene therapy in achromatopsia and other cone-specific disorders.

  5. Computing an Ontological Semantics for a Natural Language Fragment

    DEFF Research Database (Denmark)

    Szymczak, Bartlomiej Antoni

    The key objective of the research has been to establish theoretically sound connections between the following two areas:
    • computational processing of natural-language texts by means of logical methods;
    • theories and methods for the engineering of formal ontologies.
    We have ... tried to establish a domain-independent “ontological semantics” for relevant fragments of natural language. The purpose of this research is to develop methods and systems that take advantage of formal ontologies to extract the meaning contents of texts. This functionality ... is desirable, e.g., for future content-based search systems, in contrast to today’s keyword-based search systems (e.g., Google), which rely chiefly on recognizing stated keywords in the targeted text. Logical methods were introduced into semantic theories for natural language as early as the 1960s in what...

  6. Sponge Microbiota are a Reservoir of Functional Antibiotic Resistance Genes

    DEFF Research Database (Denmark)

    Versluis, Dennis; de Evgrafov, Mari Cristina Rodriguez; Sommer, Morten Otto Alexander

    2016-01-01

    Wide application of antibiotics has contributed to the evolution of multidrug-resistant human pathogens, resulting in poorer treatment outcomes for infections. In the marine environment, seawater samples have been investigated as a resistance reservoir; however, no studies have methodically ... examined sponges as a reservoir of antibiotic resistance. Sponges could be important in this respect because they often contain diverse microbial communities that have the capacity to produce bioactive metabolites. Here, we applied functional metagenomics to study the presence and diversity of functional ...). Fifteen of 37 inserts harbored resistance genes that shared resistance gene could be identified with high confidence, in which case we predicted resistance to be mainly mediated by antibiotic efflux. One marine-specific ampicillin-resistance...

  7. Gradient Learning Algorithms for Ontology Computing

    Science.gov (United States)

    Gao, Wei; Zhu, Linli

    2014-01-01

    The gradient learning model has attracted considerable attention because of its promising applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measurement and ontology mapping in the multidividing setting. The sample error in this setting is obtained using the hypothesis space and an ontology dividing operator. Finally, two experiments in the plant and humanoid robotics domains verify the efficiency of the new computation model for ontology similarity measurement and ontology mapping applications in the multidividing setting. PMID:25530752

  8. Gradient learning algorithms for ontology computing.

    Science.gov (United States)

    Gao, Wei; Zhu, Linli

    2014-01-01

    The gradient learning model has attracted considerable attention because of its promising applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measurement and ontology mapping in the multidividing setting. The sample error in this setting is obtained using the hypothesis space and an ontology dividing operator. Finally, two experiments in the plant and humanoid robotics domains verify the efficiency of the new computation model for ontology similarity measurement and ontology mapping applications in the multidividing setting.

  9. Gradient Learning Algorithms for Ontology Computing

    Directory of Open Access Journals (Sweden)

    Wei Gao

    2014-01-01

    Full Text Available The gradient learning model has attracted considerable attention because of its promising applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measurement and ontology mapping in the multidividing setting. The sample error in this setting is obtained using the hypothesis space and an ontology dividing operator. Finally, two experiments in the plant and humanoid robotics domains verify the efficiency of the new computation model for ontology similarity measurement and ontology mapping applications in the multidividing setting.

  10. The epistemology and ontology of human-computer interaction

    NARCIS (Netherlands)

    Brey, Philip A.E.

    2005-01-01

    This paper analyzes epistemological and ontological dimensions of Human-Computer Interaction (HCI) through an analysis of the functions of computer systems in relation to their users. It is argued that the primary relation between humans and computer systems has historically been epistemic:

  11. Gene expression profiling for human iPS-derived motor neurons from sporadic ALS patients reveals a strong association between mitochondrial functions and neurodegeneration

    Science.gov (United States)

    Alves, Chrystian J.; Dariolli, Rafael; Jorge, Frederico M.; Monteiro, Matheus R.; Maximino, Jessica R.; Martins, Roberto S.; Strauss, Bryan E.; Krieger, José E.; Callegaro, Dagoberto; Chadi, Gerson

    2015-01-01

    Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease that leads to widespread motor neuron death, general palsy, and respiratory failure. The most prevalent form, sporadic ALS, is not genetically inherited. Attempts to translate therapeutic strategies have failed because the described mechanisms of disease are based on animal models carrying specific gene mutations and thus do not address sporadic ALS. To study the human disease more directly, human induced pluripotent stem cell (hiPSC)-differentiated motor neurons were obtained from motor nerve fibroblasts of sporadic ALS and non-ALS subjects using the STEMCCA Cre-Excisable Constitutive Polycistronic Lentivirus system and subjected to microarray analyses using a whole human genome platform. DAVID analyses of differentially expressed genes identified molecular function and biological process-related genes through Gene Ontology. REVIGO highlighted related functions: mRNA and DNA binding, GTP binding, transcription (co)-repressor activity, lipoprotein receptor binding, synapse organization, intracellular transport, mitotic cell cycle, and cell death. KEGG showed pathways associated with Parkinson's disease and oxidative phosphorylation, highlighting iron homeostasis, neurotrophic functions, endosomal trafficking, and ERK signaling. The analysis of the most dysregulated genes and of those representative of the majority of categorized genes indicates a strong association between mitochondrial function and cellular processes possibly related to motor neuron degeneration. In conclusion, iPSC-derived motor neurons from motor nerve fibroblasts of sporadic ALS patients may recapitulate key mechanisms of neurodegeneration and may offer an opportunity for translational investigation of sporadic ALS. Large-scale gene profiling of differentiated motor neurons from sporadic ALS patients highlights mitochondrial participation in the establishment of autonomous mechanisms associated with sporadic ALS.

  12. Ontologies as integrative tools for plant science.

    Science.gov (United States)

    Walls, Ramona L; Athreya, Balaji; Cooper, Laurel; Elser, Justin; Gandolfo, Maria A; Jaiswal, Pankaj; Mungall, Christopher J; Preece, Justin; Rensing, Stefan; Smith, Barry; Stevenson, Dennis W

    2012-08-01

    Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web. This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae). Ontologies can advance plant science in four key areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.

  13. Functional screening of antibiotic resistance genes from human gut microbiota reveals a novel gene fusion.

    Science.gov (United States)

    Cheng, Gong; Hu, Yongfei; Yin, Yeshi; Yang, Xi; Xiang, Chunsheng; Wang, Baohong; Chen, Yanfei; Yang, Fengling; Lei, Fang; Wu, Na; Lu, Na; Li, Jing; Chen, Quanze; Li, Lanjuan; Zhu, Baoli

    2012-11-01

    The human gut microbiota has a high density of bacteria that are considered a reservoir for antibiotic resistance genes (ARGs). In this study, one fosmid metagenomic library generated from the gut microbiota of four healthy humans was used to screen for ARGs against seven antibiotics. Eight new ARGs were obtained: one against amoxicillin, six against d-cycloserine, and one against kanamycin. The new amoxicillin resistance gene encodes a protein with 53% identity to a class D β-lactamase from Riemerella anatipestifer RA-GD. The six new d-cycloserine resistance genes encode proteins with 73-81% identity to known d-alanine-d-alanine ligases. The new kanamycin resistance gene encodes a protein of 274 amino acids with an N-terminus (amino acids 1-189) that has 42% identity to the 6'-aminoglycoside acetyltransferase [AAC(6')] from Enterococcus hirae and a C-terminus (amino acids 190-274) with 35% identity to a hypothetical protein from Clostridiales sp. SSC/2. A functional study on the novel kanamycin resistance gene showed that only the N-terminus conferred kanamycin resistance. Our results showed that functional metagenomics is a useful tool for the identification of new ARGs. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  14. Ontology or Ontologies?

    OpenAIRE

    Vélez León, Paulo

    2015-01-01

    In recent decades there has been renewed interest in some of the classic topics of ontology from areas of knowledge outside philosophy; however, this ontological renaissance has «stimulated» a multiplicity and diversity of «ontological» theories and conceptions, which has resulted in a proliferation of «ontologies» and endless battles to determine what type of «entities» their respective «domains» study, domains which in turn are considered autonomous and...

  15. Ontology or Ontologies?

    OpenAIRE

    Vélez León, Paulo

    2015-01-01

    In recent decades there has been renewed interest in some of the classic topics of ontology from areas of knowledge outside philosophy; however, this ontological renaissance has «stimulated» a multiplicity and diversity of «ontological» theories and conceptions, which has resulted in a proliferation of «ontologies» and endless battles to determine what type of «entities» their respective «domains» study, domains which in turn are considered autono...

  16. Knowledge-based analysis of functional impacts of mutations in ...

    Indian Academy of Sciences (India)

    We developed a knowledge-based method to analyse the functional impacts of mutations in miRNA seed regions. We computed the Gene Ontology-based similarity score (GOSS) and the GOSS percentile score for all 517 SNPs in miRNA seeds. In addition to the annotation of SNPs for their functional effects, in the present ...
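    The abstract does not define the GOSS percentile score. Assuming, purely for illustration, that it is the percentile rank of an observed GOSS value within a background distribution of GOSS values (for example, scores from a hypothetical set of reference SNPs), it could be computed as below; the function name and background scores are invented.

```python
from bisect import bisect_right


def percentile_rank(observed, background):
    """Percentage (0-100) of background scores that are <= `observed`."""
    if not background:
        raise ValueError("background distribution is empty")
    ordered = sorted(background)
    # bisect_right returns how many background values are <= observed.
    return 100.0 * bisect_right(ordered, observed) / len(ordered)


background = [0.1, 0.2, 0.3, 0.4]  # hypothetical background GOSS values
print(percentile_rank(0.25, background))
```

    Other percentile conventions (strict "<", interpolation between neighbors) give slightly different values near ties, so the convention should match whatever the original method used.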

  17. Cloning and functional characterization of carotenoid cleavage dioxygenase 4 genes.

    Science.gov (United States)

    Huang, Fong-Chin; Molnár, Péter; Schwab, Wilfried

    2009-01-01

    Although a number of plant carotenoid cleavage dioxygenase (CCD) genes have been functionally characterized in different plant species, little is known about the biochemical role and enzymatic activities of members of subclass 4 (CCD4). To gain insight into their biological function, CCD4 genes were isolated from apple (Malus x domestica, MdCCD4), chrysanthemum (Chrysanthemum x morifolium, CmCCD4a), rose (Rosa x damascena, RdCCD4), and osmanthus (Osmanthus fragrans, OfCCD4), and were expressed, together with AtCCD4, in Escherichia coli. In vivo assays showed that CmCCD4a and MdCCD4 cleaved beta-carotene efficiently to yield beta-ionone, while OfCCD4, RdCCD4, and AtCCD4 were almost inactive towards this substrate. No cleavage products were found for any of the five CCD4 genes when they were co-expressed in E. coli strains that accumulated cis-zeta-carotene and lycopene. In vitro assays, however, demonstrated the breakdown of 8'-apo-beta-caroten-8'-al to beta-ionone by AtCCD4 and RdCCD4, whereas this apocarotenal was barely degraded by OfCCD4, CmCCD4a, and MdCCD4. Sequence analysis of genomic clones of CCD4 genes revealed that RdCCD4, like AtCCD4, contains no intron, while MdCCD4, OfCCD4, and CmCCD4a contain introns. These results indicate that plants produce at least two different forms of CCD4 proteins. Although CCD4 enzymes cleave their substrates at the same position (9,10 and 9',10'), they might have different biochemical functions, as they accept different (apo)carotenoid substrates, show various expression patterns, and differ in genomic organization.

  18. Ontological realism: A methodology for coordinated evolution of scientific ontologies.

    Science.gov (United States)

    Smith, Barry; Ceusters, Werner

    2010-11-15

    Since 2002 we have been testing and refining a methodology for ontology development that is now being used by multiple groups of researchers in different life science domains. Gary Merrill, in a recent paper in this journal, describes some of the reasons why this methodology has been found attractive by researchers in the biological and biomedical sciences. At the same time he assails the methodology on philosophical grounds, focusing specifically on our recommendation that ontologies developed for scientific purposes should be constructed in such a way that their terms are seen as referring to what we call universals or types in reality. As we show, Merrill's critique is of little relevance to the success of our realist project, since it not only reveals no actual errors in our work but also criticizes views on universals that we do not in fact hold. However, it nonetheless provides us with a valuable opportunity to clarify the realist methodology, and to show how some of its principles are being applied, especially within the framework of the OBO (Open Biomedical Ontologies) Foundry initiative.

  19. Transcriptome analysis by GeneTrail revealed regulation of functional categories in response to alterations of iron homeostasis in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Lenhof Hans-Peter

    2011-05-01

    Full Text Available Abstract Background High-throughput technologies have opened new avenues to study biological processes and pathways. The immense data sets generated nowadays must be made easier to interpret so that biologists can identify complex gene networks and functional pathways. To cope with this task, multiple computer-based programs have been developed. GeneTrail is a freely available online tool that screens comparative transcriptomic data for differentially regulated functional categories and biological pathways extracted from common databases such as KEGG, Gene Ontology (GO), TRANSPATH and TRANSFAC. Additionally, GeneTrail offers a feature that allows screening of individually defined biological categories that are relevant for the respective research topic. Results We have set up GeneTrail for use with Arabidopsis thaliana. To test the functionality of this tool for plant analysis, we generated transcriptome data of root and leaf responses to Fe deficiency and of the Arabidopsis metal homeostasis mutant nas4x-1. We performed Gene Set Enrichment Analysis (GSEA) with eight meaningful pairwise comparisons of transcriptome data sets. We were able to uncover several functional pathways, including metal homeostasis, that were affected in our experimental situations. Representation of the differentially regulated functional categories in Venn diagrams uncovered regulatory networks at the level of whole functional pathways. Over-Representation Analysis (ORA) of differentially regulated genes identified in pairwise comparisons revealed specific functional plant physiological categories as major targets upon Fe deficiency and in nas4x-1. Conclusion Here, we obtained supporting evidence that the nas4x-1 mutant was defective in metal homeostasis. It was confirmed that nas4x-1 showed Fe deficiency in roots and signs of both Fe deficiency and Fe sufficiency in leaves. Besides metal homeostasis, biotic stress, root carbohydrate, leaf
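    Over-representation analysis (ORA) of the kind mentioned above is commonly framed as a one-sided hypergeometric test: given a gene universe of `N` genes, `K` of which are annotated to a category, and `n` differentially regulated genes of which `k` fall in that category, how likely is it to see `k` or more category genes by chance? The following is a minimal, generic sketch under those assumptions; the function name `ora_pvalue` is hypothetical, and this is not GeneTrail's own implementation, which may use other statistics and multiple-testing corrections.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for category over-representation.

    N: genes in the universe (e.g. all genes on the array)
    K: universe genes annotated to the category
    n: differentially regulated genes (the test set)
    k: test-set genes annotated to the category
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of drawing i = k..upper category genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 12 of 400 regulated genes fall in a 150-gene category
# drawn from a 20,000-gene universe.
p = ora_pvalue(20000, 150, 400, 12)
```

    Exact integer arithmetic via `math.comb` avoids overflow for array-sized universes; in practice one such p-value is computed per category and then corrected for multiple testing.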

  20. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  1. Association of lung function genes with chronic obstructive pulmonary disease.

    Science.gov (United States)

    Kim, Woo Jin; Lim, Myoung Nam; Hong, Yoonki; Silverman, Edwin K; Lee, Ji-Hyun; Jung, Bock Hyun; Ra, Seung Won; Choi, Hye Sook; Jung, Young Ju; Park, Yong Bum; Park, Myung Jae; Lee, Sei Won; Lee, Jae Seung; Oh, Yeon-Mok; Lee, Sang Do

    2014-08-01

    Spirometric measurements of pulmonary function are important in diagnosing and determining the severity of chronic obstructive pulmonary disease (COPD). We performed this study to determine whether candidate genes identified in genome-wide association studies of spirometric measurements were associated with COPD and if they interacted with smoking intensity. The current analysis included 1,000 COPD subjects and 1,000 controls recruited from 24 hospital-based pulmonary clinics. Thirteen SNPs, chosen based on genome-wide association studies of spirometric measurements in the Korean population cohorts, were genotyped. Genetic association tests were performed, adjusting for age, sex, and smoking intensity, using models including a SNP-by-smoking interaction term. PID1 and FAM13A were significantly associated with COPD susceptibility. There were also significant interactions between SNPs in ACN9 and FAM13A and smoking pack-years, and an association of ACN9 with COPD in the lowest smoking tertile. The risk allele of FAM13A was associated with increased expression of FAM13A in the lung. We have validated associations of FAM13A and PID1 with COPD. ACN9 showed significant interaction with smoking and is a potential candidate gene for COPD. Significant associations of genetic variants of FAM13A with gene expression levels suggest that the associated loci may act as genetic regulatory elements for FAM13A gene expression.

  2. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    Directory of Open Access Journals (Sweden)

    Hakenberg Jörg

    2009-01-01

    Full Text Available Abstract Background Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Metadata are another useful source of information for disambiguation. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata, respectively. Results The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path from co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but it requires training data, which the other methods do not. To evaluate these approaches, we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. On average over all conditions, the approaches achieve an 80% success rate. The 'MetaData' approach performed best, with 96%, when trained on high-quality data. Its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves an average success rate of 80%. Conclusion Metadata are valuable for disambiguation but require high-quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater 90
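    The 'Closest Sense' idea can be illustrated with a small graph search. This is a minimal sketch, not the authors' implementation. It assumes the ontology is given as an undirected adjacency dict, measures path length in edges with breadth-first search, and scores each candidate sense by its shortest distance to any co-occurring term. The function names and the toy terms are hypothetical.

```python
from collections import deque


def shortest_path_len(graph, start, goal):
    """Breadth-first search over an adjacency dict; returns the number of
    edges on a shortest path, or None if `goal` is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def closest_sense(graph, senses, context_terms):
    """Return the sense nearest to any co-occurring term, or None if no
    sense is reachable. Ties keep the first sense listed."""
    best, best_dist = None, None
    for sense in senses:
        dists = []
        for term in context_terms:
            d = shortest_path_len(graph, term, sense)
            if d is not None:
                dists.append(d)
        if dists and (best_dist is None or min(dists) < best_dist):
            best, best_dist = sense, min(dists)
    return best


# Toy, undirected ontology fragment with placeholder term names.
graph = {
    "sense:cell_growth": ["cell_cycle"],
    "cell_cycle": ["sense:cell_growth", "mitosis"],
    "mitosis": ["cell_cycle"],
    "sense:tumor_growth": ["neoplasm"],
    "neoplasm": ["sense:tumor_growth"],
}
senses = ["sense:cell_growth", "sense:tumor_growth"]
print(closest_sense(graph, senses, ["mitosis"]))  # sense:cell_growth
```

    Taking the minimum distance, rather than a sum over all context terms, is one possible reading of "shortest path to one of these senses". It avoids favouring senses that simply reach fewer terms.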

  3. Tmc gene therapy restores auditory function in deaf mice.

    Science.gov (United States)

    Askew, Charles; Rochat, Cylia; Pan, Bifeng; Asai, Yukako; Ahmed, Hena; Child, Erin; Schneider, Bernard L; Aebischer, Patrick; Holt, Jeffrey R

    2015-07-08

    Genetic hearing loss accounts for up to 50% of prelingual deafness worldwide, yet there are no biologic treatments currently available. To investigate gene therapy as a potential biologic strategy for restoration of auditory function in patients with genetic hearing loss, we tested a gene augmentation approach in mouse models of genetic deafness. We focused on DFNB7/11 and DFNA36, which are autosomal recessive and dominant deafnesses, respectively, caused by mutations in transmembrane channel-like 1 (TMC1). Mice that carry targeted deletion of Tmc1 or a dominant Tmc1 point mutation, known as Beethoven, are good models for human DFNB7/11 and DFNA36. We screened several adeno-associated viral (AAV) serotypes and promoters and identified AAV2/1 and the chicken β-actin (Cba) promoter as an efficient combination for driving the expression of exogenous Tmc1 in inner hair cells in vivo. Exogenous Tmc1 or its closely related ortholog, Tmc2, were capable of restoring sensory transduction, auditory brainstem responses, and acoustic startle reflexes in otherwise deaf mice, suggesting that gene augmentation with Tmc1 or Tmc2 is well suited for further development as a strategy for restoration of auditory function in deaf patients who carry TMC1 mutations. Copyright © 2015, American Association for the Advancement of Science.

  4. Analysis and visualization of gene expression data using ...

    African Journals Online (AJOL)

    BicAT-plus incorporates a reasonable biological comparative methodology based on the enrichment of the output biclusters with gene ontology functional categories. No exact algorithm can be considered the optimum one. Instead, biclustering algorithms can be used as integrated techniques to highlight the most enriched ...

  5. ADO: a disease ontology representing the domain knowledge specific to Alzheimer's disease.

    Science.gov (United States)

    Malhotra, Ashutosh; Younesi, Erfan; Gündel, Michaela; Müller, Bernd; Heneka, Michael T; Hofmann-Apitius, Martin

    2014-03-01

    Biomedical ontologies offer the capability to structure and represent domain-specific knowledge semantically. Disease-specific ontologies can facilitate knowledge exchange across multiple disciplines, and ontology-driven mining approaches can generate great value for modeling disease mechanisms. However, in the case of neurodegenerative diseases such as Alzheimer's disease, there is a lack of formal representation of the relevant knowledge domain. The Alzheimer's disease ontology (ADO) was constructed in accordance with the ontology building life cycle. The Protégé OWL editor was used to build ADO in Web Ontology Language (OWL) format. ADO was developed to contain information relevant to four main biological views (preclinical, clinical, etiological, and molecular/cellular mechanisms) and was enriched by adding synonyms and references. Validation of the lexicalized ontology by means of named entity recognition-based methods showed satisfactory performance (F score = 72%). In addition to structural and functional evaluation, a clinical expert in the field performed a manual evaluation and curation of ADO. Through integration of ADO into an information retrieval environment, we show that the ontology supports semantic search in scientific text. The usefulness of ADO is demonstrated by dedicated use case scenarios. Developed as an open ontology, ADO is a first attempt to organize information related to Alzheimer's disease in a formalized, structured manner. We demonstrate that ADO is able to capture both established and scattered knowledge existing in scientific text. Copyright © 2014 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  6. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows

    Science.gov (United States)

    2013-01-01

    Background Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses. PMID:23286517

  7. Revealing ontological commitments by magic.

    Science.gov (United States)

    Griffiths, Thomas L

    2015-03-01

    Considering the appeal of different magical transformations exposes some systematic asymmetries. For example, it is more interesting to transform a vase into a rose than a rose into a vase. An experiment in which people judged how interesting they found different magic tricks showed that these asymmetries reflect the direction a transformation moves in an ontological hierarchy: transformations in the direction of animacy and intelligence are favored over the opposite. A second and third experiment demonstrated that judgments of the plausibility of machines that perform the same transformations do not show the same asymmetries, but judgments of the interestingness of such machines do. A formal argument relates this sense of interestingness to evidence for an alternative to our current physical theory, with magic tricks being a particularly pure source of such evidence. These results suggest that people's intuitions about magic tricks can reveal the ontological commitments that underlie human cognition. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. The Usability-Error Ontology

    DEFF Research Database (Denmark)

    Elkin, Peter L.; Beuscart-zephir, Marie-Catherine; Pelayo, Sylvia

    2013-01-01

    ... in patients coming to harm. Often, root cause analysis traces these adverse events back to Usability Errors in the Health Information Technology (HIT) or its interaction with users. Documenting HIT-related Usability Errors in a consistent, interoperable fashion can improve our ... ability to do systematic reviews and meta-analyses. In an effort to support improved and more interoperable data capture regarding Usability Errors, we have created the Usability Error Ontology (UEO) as a classification method for representing knowledge regarding Usability Errors. We expect the UEO ... will grow over time to support an increasing number of HIT system types. In this manuscript, we present this Ontology of Usability Error Types and specifically address Computerized Physician Order Entry (CPOE), Electronic Health Records (EHR) and Revenue Cycle HIT systems.

  9. Effects of traditional Japanese massage therapy on gene expression: preliminary study.

    Science.gov (United States)

    Donoyama, Nozomi; Ohkoshi, Norio

    2011-06-01

    Changes in gene expression after traditional Japanese massage therapy were investigated to clarify the mechanisms underlying its clinical effects. This was a pilot experimental study. The study was conducted in a laboratory at Tsukuba University of Technology. The subjects were 2 healthy female volunteers (58-year-old Participant A, 55-year-old Participant B). The intervention consisted of a 40-minute full-body massage using standard traditional Japanese massage techniques through the clothing and, as a control, a 40-minute rest in which participants lay on the massage table without being massaged. Before and after each intervention, blood was taken and analyzed by microarray: (1) the number of genes whose expression after the intervention was more than double that before was examined; (2) for those genes, gene ontology analysis identified statistically significant gene ontology terms. Of the 41,000 genes examined, the number of such genes was 1256 for Participant A and 1778 for Participant B after traditional Japanese massage, and 157 and 82, respectively, after the control. The significant gene ontology terms selected for both Participants A and B after massage were "immune response" and "immune system," whereas no gene ontology terms were selected for either participant in the control. This implies that traditional Japanese massage therapy may affect immune function. Further studies with more samples are necessary.
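    Step (1) above is a simple fold-change filter. A minimal sketch, assuming expression values for one participant are available as per-gene dictionaries for the pre- and post-intervention samples. The function name and gene identifiers are hypothetical, and real microarray pipelines normally normalize the data before such a comparison.

```python
def count_upregulated(before, after, fold=2.0):
    """Return (count, sorted gene IDs) for genes whose post-intervention
    expression is more than `fold` times the pre-intervention level.

    before, after: dicts mapping a gene identifier to an expression value.
    Genes with a non-positive baseline or missing from `after` are skipped.
    """
    hits = sorted(
        gene for gene, pre in before.items()
        if pre > 0 and gene in after and after[gene] / pre > fold
    )
    return len(hits), hits


# Hypothetical expression values for four placeholder genes.
before = {"geneA": 1.0, "geneB": 2.0, "geneC": 0.0, "geneD": 5.0}
after = {"geneA": 3.0, "geneB": 4.0, "geneC": 1.0, "geneD": 11.0}
print(count_upregulated(before, after))  # (2, ['geneA', 'geneD'])
```

    Note that "more than double" is a strict inequality, so `geneB` (exactly 2-fold) is not counted.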

  10. Adaptation of the MapMan ontology to biotic stress responses: application in solanaceous species

    Directory of Open Access Journals (Sweden)

    Stitt Mark

    2007-09-01

    Full Text Available Abstract Background The results of transcriptome microarray analysis are usually presented as a list of differentially expressed genes. As these lists can be long, it is hard to interpret the effect of the experimental treatment on the physiology of the analysed tissue, e.g. via selected metabolic or other pathways. For some organisms, gene ontologies and data visualization software have been implemented to overcome this problem, whereas for others, software adaptation is yet to be done. Results We present the classification of tentative potato contigs from the potato gene index (StGI), available from the Dana-Farber Cancer Institute (DFCI), into the MapMan ontology to enable the application of the MapMan family of tools to potato microarrays. Special attention has been focused on mapping genes that could not be annotated based on similarity to Arabidopsis genes alone, thus possibly representing genes unique to potato. Ninety-seven such genes were classified into functional BINs (i.e. functional classes) after manual annotation. A new pathway, focusing on biotic stress responses, has been added and can be used for all other organisms for which mappings have been done. The BIN representation on the potato 10 k cDNA microarray, in comparison with all putative potato gene sequences, has been tested. The functionality of the prepared potato mapping was validated with experimental data on plant response to viral infection. In total, 43,408 unigenes were mapped into 35 corresponding BINs. Conclusion The potato mappings can be used to visualize up-to-date, publicly available expressed sequence tags (ESTs) and other sequences from GenBank, in combination with metabolic pathways. Further expert work on potato annotations will be needed with the ongoing EST and genome sequencing of potato.
The current MapMan application for potato is directly applicable for analysis of data obtained on the potato 10 k cDNA microarray by TIGR (The Institute for Genomic Research) but can also be used

  11. Building an ontology for cyberterrorism

    CSIR Research Space (South Africa)

    Veerasamy, N

    2012-07-01

    Full Text Available ... As defined in this research, a cyberterrorism attack consists of a high-level motivation that is religious, social or political. The individual/group can furthermore be classified as having a specific driving force depending on the level of extremism... or revolutionary thinking. Thus, the ontology will take into consideration the motivating characteristics that play a significant role in contributing towards the definition of cyberterrorism. Overall, this paper promotes the understanding of the field...

  12. Functional imaging: monitoring heme oxygenase-1 gene expression in vivo

    Science.gov (United States)

    Zhang, Weisheng; Reilly-Contag, Pamela; Stevenson, David K.; Contag, Christopher H.

    1999-07-01

    The regulation of genetic elements can be monitored in living animals using photoproteins as reporters. Heme oxygenase (HO) is the key catabolic enzyme in the heme degradation pathway. Here, HO expression serves as a model for in vivo functional imaging of transcriptional regulation of a clinically relevant gene. HO enzymatic activity is inhibited by heme analogs, the metalloporphyrins, but many members of this family of compounds also activate transcription from the HO-1 promoter. The degree of transcriptional activation by twelve metalloporphyrins, differing in the central metal and the porphyrin ring substituents, was evaluated both in NIH 3T3 stable lines and in transgenic animals containing HO-1 promoter-luciferase gene fusions. In the correlative cell culture assays, the metalloporphyrins increased transcription from the full-length HO promoter fusion to varying degrees, but none increased transcription from a truncated HO-1 promoter. These results suggested that one or both of the two distal enhancer elements, located at -4 and -10 kb upstream of the transcriptional start, are required for HO-1 induction by heme and its analogs. The full-length HO-1-luc fusion was then evaluated as a transgene in mice. It was possible to monitor the effects of the metalloporphyrins SnMP and ZnPP in living animals over time. These spatiotemporal analyses of gene expression in vivo implied that alterations in the porphyrin ring substituents and the central metal may affect the extent of gene activation. These data further indicate that, using photoprotein reporters, subtle differences in gene expression can be monitored in living animals.

  13. Ontology Maintenance using Textual Analysis

    Directory of Open Access Journals (Sweden)

    Yassine Gargouri

    2003-10-01

    Full Text Available Ontologies are continually confronted with the problem of evolution. Because of the complexity of the changes to be made, a maintenance process, at least a semi-automatic one, is increasingly necessary to facilitate this task and to ensure its reliability. In this paper, we propose an ontology maintenance model for a domain; its originality lies in being language-independent and in relying on a sequence of text-processing steps to extract highly related terms from a corpus. First, we apply document classification with GRAMEXCO to generate classes of text segments that share a similar type of information, and we identify their shared lexicon, which is taken to be highly related to a single topic. This technique allows a first general and robust exploration of the corpus. Next, we apply the Latent Semantic Indexing method to this shared lexicon to extract the most strongly associated terms, which should then be examined by an expert who may confirm their relevance and thus update the current ontology. Finally, we show how the complementarity of these two techniques, grounded in cognitive foundations, constitutes a powerful refinement process.

  14. The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses

    Science.gov (United States)

    Cooper, Laurel; Walls, Ramona L.; Elser, Justin; Gandolfo, Maria A.; Stevenson, Dennis W.; Smith, Barry; Preece, Justin; Athreya, Balaji; Mungall, Christopher J.; Rensing, Stefan; Hiss, Manuel; Lang, Daniel; Reski, Ralf; Berardini, Tanya Z.; Li, Donghui; Huala, Eva; Schaeffer, Mary; Menda, Naama; Arnaud, Elizabeth; Shrestha, Rosemary; Yamazaki, Yukiko; Jaiswal, Pankaj

    2013-01-01

    The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs. PMID:23220694

  15. Induction of Protective Genes Leads to Islet Survival and Function

    Directory of Open Access Journals (Sweden)

    Hongjun Wang

    2011-01-01

    Full Text Available Islet transplantation is the most valid approach to the treatment of type 1 diabetes. However, the function of transplanted islets is often compromised, since a large number of β cells undergo apoptosis induced by stress and by the immune rejection response elicited by the recipient after transplantation. Conventional treatment for islet transplantation is to administer immunosuppressive drugs to the recipient to suppress the immune rejection response mounted against the transplanted islets. Induction of protective genes in the recipient (e.g., heme oxygenase-1 (HO-1), A20/tumor necrosis factor alpha-inducible protein 3 (tnfaip3), biliverdin reductase (BVR), Bcl2, and others) or administration of one or more of the products of HO-1 to the donor, the islets themselves, and/or the recipient offers an alternative or synergistic approach to improve islet graft survival and function. In this perspective, we summarize studies describing the protective effects of these genes on islet survival and function in rodent allogeneic and xenogeneic transplantation models and on the prevention of diabetes onset, with emphasis on HO-1, A20, and BVR. Such approaches are also appealing for islet autotransplantation in patients with chronic pancreatitis after total pancreatectomy, a procedure after which currently only 1/3 of transplanted patients are diabetes-free.

  16. Functional analysis of mating type genes and transcriptome analysis during fruiting body development of Botrytis cinerea

    NARCIS (Netherlands)

    Rodenburg, Sander Y.A.; Terhem, Razak B.; Veloso, Javier; Stassen, Joost H.M.; Kan, van Jan A.L.

    2018-01-01

    Botrytis cinerea is a plant-pathogenic fungus producing apothecia as sexual fruiting bodies. To study the function of mating type (MAT) genes, single-gene deletion mutants were generated in both genes of the MAT1-1 locus and both genes of the MAT1-2 locus. Deletion mutants in two MAT genes were

  17. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Full Text Available Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and the poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptations. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
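    The final reliability rule described above (a hit counts as reliable only when at least two relevant descriptors are present) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the INDIGO scripts. The `GeneHit` structure, the function names, and all identifiers below are hypothetical placeholders rather than real GO or PROSITE accessions. GO terms and PROSITE IDs are counted together toward the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class GeneHit:
    """A scanned gene with the descriptors found for it (hypothetical structure)."""
    gene_id: str
    go_terms: set = field(default_factory=set)      # profile filter matches
    prosite_ids: set = field(default_factory=set)   # pattern filter matches


def is_reliable(hit, relevant_go, relevant_prosite, min_descriptors=2):
    """True if the hit carries at least `min_descriptors` relevant descriptors,
    counting GO terms and PROSITE IDs together."""
    n_relevant = len(hit.go_terms & relevant_go) + len(hit.prosite_ids & relevant_prosite)
    return n_relevant >= min_descriptors


def reliable_hits(hits, relevant_go, relevant_prosite):
    """Keep only the hits that pass the two-descriptor rule."""
    return [h for h in hits if is_reliable(h, relevant_go, relevant_prosite)]


# Placeholder identifiers, not real GO or PROSITE accessions.
relevant_go = {"GO:example_a", "GO:example_b"}
relevant_prosite = {"PS_example_1"}
hits = [
    GeneHit("sag1_g1", {"GO:example_a", "GO:example_b"}),
    GeneHit("sag1_g2", {"GO:example_a"}),
    GeneHit("sag2_g7", {"GO:example_b"}, {"PS_example_1"}),
]
print([h.gene_id for h in reliable_hits(hits, relevant_go, relevant_prosite)])
# ['sag1_g1', 'sag2_g7']
```

    Counting only descriptors that belong to the target family's relevant sets keeps unrelated annotations on the same gene from inflating the score.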

  18. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and the poor homology of novel extremophiles' genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptations. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO) terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available
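
    The two-descriptor reliability rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' published scripts: each gene is assumed to already carry its InterPro-derived GO terms and its matched PROSITE IDs, and the names `Gene`, `ppm_classify` and the term/pattern IDs in the example are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    # Hypothetical record for one annotated coding sequence.
    gene_id: str
    go_terms: set = field(default_factory=set)     # InterPro-derived GO terms
    prosite_ids: set = field(default_factory=set)  # matched PROSITE consensus patterns


def ppm_classify(genes, target_go, target_prosite, min_descriptors=2):
    """Return IDs of genes carrying at least `min_descriptors` relevant descriptors."""
    reliable = []
    for gene in genes:
        profile_hits = gene.go_terms & target_go          # profile filter
        pattern_hits = gene.prosite_ids & target_prosite  # pattern filter
        if len(profile_hits) + len(pattern_hits) >= min_descriptors:
            reliable.append(gene.gene_id)
    return reliable


# Placeholder IDs, not real GO or PROSITE accessions.
target_go = {"GO_term_A", "GO_term_B"}
target_prosite = {"PS_pattern_X"}
genes = [
    Gene("g1", {"GO_term_A", "GO_term_B"}),
    Gene("g2", {"GO_term_A"}),
    Gene("g3", {"GO_term_A"}, {"PS_pattern_X"}),
    Gene("g4", set(), {"PS_pattern_Y"}),
]
print(ppm_classify(genes, target_go, target_prosite))  # ['g1', 'g3']
```

    Here g2 has only one relevant descriptor and g4's pattern is not in the target set, so only g1 (two GO terms) and g3 (one GO term plus one pattern) pass.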

  19. An ontology-driven tool for structured data acquisition using Web forms.

    Science.gov (United States)

    Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I; Tierney, Michael J; Musen, Mark A

    2017-08-01

    Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL), the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structure Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.
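
    To illustrate what querying "semantically-enriched" form data involves, the sketch below stores form answers as subject-predicate-object triples and evaluates a conjunction of triple patterns, which is the core of a SPARQL basic graph pattern. It is a plain-Python stand-in written for illustration, not the authors' system and not an RDF library; the resource and predicate names (`resp1`, `assesses`, `impairmentLevel`) are hypothetical.

```python
def is_var(term):
    # SPARQL-style variables start with "?"
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if impossible."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(triples, patterns):
    """Evaluate a conjunction of triple patterns by nested-loop joins."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical form responses recorded as triples.
form_data = [
    ("resp1", "assesses", "MotorFunction"),
    ("resp1", "impairmentLevel", "severe"),
    ("resp2", "assesses", "MotorFunction"),
    ("resp2", "impairmentLevel", "mild"),
]
severe = query(form_data, [("?r", "assesses", "MotorFunction"),
                           ("?r", "impairmentLevel", "severe")])
print(severe)  # [{'?r': 'resp1'}]
```

    In SPARQL the same question would be a `SELECT ?r WHERE { ... }` with two triple patterns sharing `?r`; a real deployment would rely on an RDF store and its query engine rather than this nested-loop join.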

  20. Linking MedDRA®-coded Clinical Phenotypes to Biological Mechanisms by The Ontology of Adverse Events: A pilot study on Tyrosine Kinase Inhibitors (TKIs)

    Science.gov (United States)

    Sarntivijai, Sirarat; Zhang, Shelley; Jagannathan, Desikan G.; Zaman, Shadia; Burkhart, Keith K.; Omenn, Gilbert S.; He, Yongqun; Athey, Brian D.; Abernethy, Darrell R.

    2016-01-01

    Introduction A translational bioinformatics challenge lies in connecting population and individual’s clinical phenotypes in various formats to biological mechanisms. The Medical Dictionary for Regulatory Activities (MedDRA®) is the default dictionary for Adverse Event (AE) reporting in the FDA Adverse Event Reporting System (FAERS). The Ontology of Adverse Events (OAE) represents AEs as pathological processes occurring after drug exposures. Objectives The aim is to establish a semantic framework to link biological mechanisms to phenotypes of AEs by combining OAE with MedDRA® in FAERS data analysis. We investigated the AEs associated with Tyrosine Kinase Inhibitors (TKIs) and monoclonal antibodies (mAbs) targeting tyrosine kinases. The selected 5 TKIs/mAbs (i.e., dasatinib, imatinib, lapatinib, cetuximab, and trastuzumab) are known to induce impaired ventricular function (non-QT) cardiotoxicity. Results Statistical analysis of FAERS data identified 1,053 distinct MedDRA® terms significantly associated with TKIs/mAbs, where 884 did not have corresponding OAE terms. We manually annotated these terms, added them to OAE by the standard OAE development strategy, and mapped them to MedDRA®. The data integration to provide insights into molecular mechanisms for drug-associated AEs is performed by including linkages in OAE for all related AE terms to MedDRA® and existing ontologies including Human Phenotype Ontology (HP), Uber Anatomy Ontology (UBERON), and Gene Ontology (GO). Sixteen AEs are shared by all 5 TKIs/mAbs, and each of 17 cardiotoxicity AEs was associated with at least one TKI/mAb. As an example, we analyzed ‘cardiac failure’ using the relations established in OAE with other ontologies, and demonstrated that one of the biological processes associated with cardiac failure maps to the genes associated with heart contraction. Conclusion By expanding existing OAE ontological design, our TKI use case demonstrates that the combination of OAE and Med
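
    The abstract does not say which statistic was used to find MedDRA® terms "significantly associated" with the TKIs/mAbs. One widely used disproportionality measure for spontaneous-report databases such as FAERS is the proportional reporting ratio (PRR); the sketch below computes it, with an approximate 95% confidence interval, from a 2x2 table of report counts. This is an assumed illustration rather than the authors' method, and the counts are made up.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def prr_ci95(a, b, c, d):
    """Approximate 95% confidence interval for the PRR, computed on the log scale."""
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    log_prr = math.log(prr(a, b, c, d))
    return math.exp(log_prr - 1.96 * se), math.exp(log_prr + 1.96 * se)


# Made-up counts: 20 of 1,000 reports for the drug mention the event,
# versus 100 of 99,000 reports for all other drugs.
value = prr(20, 980, 100, 98900)
low, high = prr_ci95(20, 980, 100, 98900)
print(round(value, 2), low > 1)  # 19.8 True
```

    A common screening convention treats a drug-event pair as a signal when the lower confidence bound exceeds 1, but thresholds and accompanying criteria vary between pharmacovigilance groups.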

  1. Linking MedDRA(®)-Coded Clinical Phenotypes to Biological Mechanisms by the Ontology of Adverse Events: A Pilot Study on Tyrosine Kinase Inhibitors.

    Science.gov (United States)

    Sarntivijai, Sirarat; Zhang, Shelley; Jagannathan, Desikan G; Zaman, Shadia; Burkhart, Keith K; Omenn, Gilbert S; He, Yongqun; Athey, Brian D; Abernethy, Darrell R

    2016-07-01

    A translational bioinformatics challenge exists in connecting population and individual clinical phenotypes in various formats to biological mechanisms. The Medical Dictionary for Regulatory Activities (MedDRA(®)) is the default dictionary for adverse event (AE) reporting in the US Food and Drug Administration Adverse Event Reporting System (FAERS). The ontology of adverse events (OAE) represents AEs as pathological processes occurring after drug exposures. The aim of this work was to establish a semantic framework to link biological mechanisms to phenotypes of AEs by combining OAE with MedDRA(®) in FAERS data analysis. We investigated the AEs associated with tyrosine kinase inhibitors (TKIs) and monoclonal antibodies (mAbs) targeting tyrosine kinases. The five selected TKIs/mAbs (i.e., dasatinib, imatinib, lapatinib, cetuximab, and trastuzumab) are known to induce impaired ventricular function (non-QT) cardiotoxicity. Statistical analysis of FAERS data identified 1053 distinct MedDRA(®) terms significantly associated with TKIs/mAbs, where 884 did not have corresponding OAE terms. We manually annotated these terms, added them to OAE by the standard OAE development strategy, and mapped them to MedDRA(®). The data integration to provide insights into molecular mechanisms of drug-associated AEs was performed by including linkages in OAE for all related AE terms to MedDRA(®) and the existing ontologies, including the human phenotype ontology (HP), Uber anatomy ontology (UBERON), and gene ontology (GO). Sixteen AEs were shared by all five TKIs/mAbs, and each of 17 cardiotoxicity AEs was associated with at least one TKI/mAb. As an example, we analyzed "cardiac failure" using the relations established in OAE with other ontologies and demonstrated that one of the biological processes associated with cardiac failure maps to the genes associated with heart contraction. By expanding the existing OAE ontological design, our TKI use case demonstrated that the combination

  2. A novel algorithm for fully automated mapping of geospatial ontologies

    Science.gov (United States)

    Chaabane, Sana; Jaziri, Wassim

    2018-01-01

    Geospatial information is collected from different sources, which makes spatial ontologies built for the same geographic domain heterogeneous; as a result, different and heterogeneous conceptualizations may coexist. Ontology integration helps create a common repository of the geospatial ontology and removes the heterogeneities between the existing ontologies. Ontology mapping is a process used in ontology integration that consists of finding correspondences between the source ontologies. This paper deals with the "mapping" process of geospatial ontologies, which consists of applying an automated algorithm to find the correspondences between concepts, referring to the definitions of matching relationships. The proposed algorithm, called "geographic ontologies mapping algorithm", defines three types of mapping: semantic, topological and spatial.
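
    The abstract names three mapping types but does not give their formulas. As a hedged illustration of the semantic case only, the sketch below matches concept labels between two ontologies by token-set (Jaccard) similarity. The similarity measure, the 0.5 threshold and the concept names are assumptions; topological and spatial mappings would need geometry that this sketch omits.

```python
def tokens(label):
    """Lower-cased word tokens of a concept label (underscores treated as spaces)."""
    return set(label.lower().replace("_", " ").split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def semantic_mapping(source, target, threshold=0.5):
    """Pair each source concept with its best-scoring target concept above threshold."""
    mappings = []
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            score = jaccard(tokens(s), tokens(t))
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            mappings.append((s, best, best_score))
    return mappings


# Hypothetical concept labels from two geospatial ontologies.
source = ["River", "Lake_Area", "Mountain"]
target = ["river", "lake", "hill"]
print(semantic_mapping(source, target))
# [('River', 'river', 1.0), ('Lake_Area', 'lake', 0.5)]
```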

  3. Worm Phenotype Ontology: Integrating phenotype data within and beyond the C. elegans community

    Directory of Open Access Journals (Sweden)

    Yook Karen

    2011-01-01

    Full Text Available Abstract Background Caenorhabditis elegans gene-based phenotype information dates back to the 1970s, beginning with Sydney Brenner and the characterization of behavioral and morphological mutant alleles via classical genetics in order to understand nervous system function. Since then, C. elegans has become an important genetic model system for the study of basic biological and biomedical principles, largely through the use of phenotype analysis. Because of the growth of C. elegans as a genetically tractable model organism and the development of large-scale analyses, there has been a significant increase in phenotype data that needs to be managed and made accessible to the research community. To do so, a standardized vocabulary is necessary to integrate phenotype data from diverse sources, permit integration with other data types and render the data in a computable form. Results We describe a hierarchically structured, controlled vocabulary of terms that can be used to standardize phenotype descriptions in C. elegans, namely the Worm Phenotype Ontology (WPO). The WPO currently comprises 1,880 phenotype terms, 74% of which have been used in the annotation of phenotypes associated with more than 18,000 C. elegans genes. The scope of the WPO is not exclusively limited to C. elegans biology; rather, it is designed to also incorporate phenotypes observed in related nematode species. We have enriched the value of the WPO by integrating it with other ontologies, thereby increasing the accessibility of worm phenotypes to non-nematode biologists. We are actively developing the WPO to continue to fulfill the evolving needs of the scientific community and hope to engage researchers in this crucial endeavor. Conclusions We provide a phenotype ontology (WPO) that will help to facilitate data retrieval and cross-species comparisons within the nematode community. In the larger scientific community, the WPO will permit data integration, and

  4. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013

    Science.gov (United States)

    Hastings, Janna; de Matos, Paula; Dekker, Adriano; Ennis, Marcus; Harsha, Bhavana; Kale, Namrata; Muthukrishnan, Venkatesh; Owen, Gareth; Turner, Steve; Williams, Mark; Steinbeck, Christoph

    2013-01-01

    ChEBI (http://www.ebi.ac.uk/chebi) is a database and ontology of chemical entities of biological interest. Over the past few years, ChEBI has continued to grow steadily in content, and has added several new features. In addition to incorporating all user-requested compounds, our annotation efforts have emphasized immunology, natural products and metabolites in many species. All database entries are now ‘is_a’ classified within the ontology, meaning that all of the chemicals are available to semantic reasoning tools that harness the classification hierarchy. We have completely aligned the ontology with the Open Biomedical Ontologies (OBO) Foundry-recommended upper level Basic Formal Ontology. Furthermore, we have aligned our chemical classification with the classification of chemical-involving processes in the Gene Ontology (GO), and as a result of this effort, the majority of chemical-involving processes in GO are now defined in terms of the ChEBI entities that participate in them. This effort necessitated incorporating many additional biologically relevant compounds. We have incorporated additional data types including reference citations, and the species and component for metabolites. Finally, our website and web services have had several enhancements, most notably the provision of a dynamic new interactive graph-based ontology visualization. PMID:23180789
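
    Because every ChEBI entry is now `is_a` classified, a reasoning tool can answer questions such as "which entries are alcohols?" by following `is_a` links transitively. The sketch below shows that closure with a breadth-first search over a tiny hand-written hierarchy; the terms and links are simplified for illustration, are not copied from ChEBI, and no ChEBI web service or API is used.

```python
from collections import deque


def ancestors(term, is_a):
    """Return all transitive is_a ancestors of `term` (excluding the term itself)."""
    seen = set()
    queue = deque(is_a.get(term, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, ()))
    return seen


def instances_of(cls, is_a, entries):
    """Entries classified, directly or indirectly, under `cls`."""
    return sorted(e for e in entries if cls in ancestors(e, is_a))


# Simplified, illustrative hierarchy (child -> list of is_a parents).
is_a = {
    "ethanol": ["primary alcohol"],
    "methanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
    "acetic acid": ["monocarboxylic acid"],
}
entries = ["ethanol", "methanol", "acetic acid"]
print(instances_of("alcohol", is_a, entries))  # ['ethanol', 'methanol']
```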

  5. Database Concepts in a Domain Ontology

    Directory of Open Access Journals (Sweden)

    Gorskis Henrihs

    2017-12-01

    Full Text Available There are multiple approaches for mapping from a domain ontology to a database in the task of ontology-based data access. For that purpose, external mapping documents are most commonly used. These documents describe how the data necessary for the description of ontology individuals and other values are to be obtained from the database. The present paper investigates the use of special database concepts. These concepts are not separated from the domain ontology; they are mixed with domain concepts to form a combined application ontology. By creating natural relationships between database concepts and domain concepts, mapping can be implemented more easily and with a specific purpose. The paper also investigates how the use of such database concepts in addition to domain concepts impacts ontology building and data retrieval.

  6. Gene-environment interaction and male reproductive function

    DEFF Research Database (Denmark)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L

    2010-01-01

    As genetic factors can hardly explain the changes taking place during short time spans, environmental and lifestyle-related factors have been suggested as the causes of time-related deterioration of male reproductive function. However, considering the strong heterogeneity of male fecundity between and within populations, genetic variants might be important determinants of the individual susceptibility to the adverse effects of environment or lifestyle. Although the possible mechanisms of such interplay in relation to the reproductive system are largely unknown, some recent studies have indicated that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism...

  7. Functional and evolutionary correlates of gene constellations in the Drosophila melanogaster genome that deviate from the stereotypical gene architecture.

    Science.gov (United States)

    Li, Shuwei; Shih, Ching-Hua; Kohn, Michael H

    2010-05-24

    The biological dimensions of genes are manifold. These include genomic properties (e.g., X/autosomal linkage, recombination) and functional properties (e.g., expression level, tissue specificity). Multiple properties, each generally of subtle influence individually, may affect the evolution of genes or merely be (auto-)correlates. Results of multidimensional analyses may reveal the relative importance of these properties on the evolution of genes, and therefore help evaluate whether these properties should be considered during analyses. While numerous properties are now considered during studies, most work still assumes the stereotypical solitary gene as commonly depicted in textbooks. Here, we investigate the Drosophila melanogaster genome to determine whether deviations from the stereotypical gene architecture correlate with other properties of genes. Deviations from the stereotypical gene architecture were classified as the following gene constellations: Overlapping genes were defined as those that overlap in the 5-prime, exonic, or intronic regions. Chromatin co-clustering genes were defined as genes that co-clustered within 20 kb of transcriptional territories. If this scheme is applied, the stereotypical gene emerges as a rare occurrence (7.5%); slightly varied schemes yielded between approximately 1% and 50%. Moreover, when following our scheme, paired-overlapping genes and chromatin co-clustering genes accounted for 50.1% and 42.4% of the genes analyzed, respectively. Gene constellation was a correlate of a number of functional and evolutionary properties of genes, but its statistical effect was approximately 1-2 orders of magnitude lower than the effects of recombination, chromosome linkage and protein function. Analysis of datasets on male reproductive proteins showed these were biased in their representation of gene constellations and evolutionary rate Ka/Ks estimates, but these biases did not overwhelm the biologically meaningful observation of high
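
    The constellation categories above can be approximated from gene coordinates alone. The sketch below is a simplification under stated assumptions: it uses whole-gene intervals (so it does not separate 5-prime, exonic and intronic overlaps), reads "within 20 kb" as an intergenic gap of at most 20,000 bp, lets overlap take precedence over co-clustering, and uses made-up coordinates. It is not the authors' pipeline.

```python
def classify_genes(genes, window=20_000):
    """Label each gene as 'overlapping', 'co-clustering' or 'solitary'.

    genes: list of (gene_id, chromosome, start, end), 1-based inclusive coordinates.
    """
    labels = {}
    for gid, chrom, start, end in genes:
        label = "solitary"
        for oid, ochrom, ostart, oend in genes:
            if oid == gid or ochrom != chrom:
                continue
            # Separation between the intervals; <= 0 means they share at least one base.
            gap = max(ostart - end, start - oend)
            if gap <= 0:
                label = "overlapping"
                break  # overlap takes precedence over co-clustering
            if gap <= window:
                label = "co-clustering"
        labels[gid] = label
    return labels


# Made-up coordinates for illustration.
genes = [
    ("a", "X", 1000, 5000),
    ("b", "X", 4000, 9000),
    ("c", "X", 20000, 22000),
    ("d", "2L", 100000, 101000),
]
labels = classify_genes(genes)
print(labels)
# {'a': 'overlapping', 'b': 'overlapping', 'c': 'co-clustering', 'd': 'solitary'}
```

    The pairwise loop is quadratic in the number of genes; for a whole genome, sorting genes by position within each chromosome and scanning neighbours would be the more practical design.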

  8. Functional and evolutionary correlates of gene constellations in the Drosophila melanogaster genome that deviate from the stereotypical gene architecture

    Directory of Open Access Journals (Sweden)

    Kohn Michael H

    2010-05-01

    Full Text Available Abstract Background The biological dimensions of genes are manifold. These include genomic properties (e.g., X/autosomal linkage, recombination) and functional properties (e.g., expression level, tissue specificity). Multiple properties, each generally of subtle influence individually, may affect the evolution of genes or merely be (auto-)correlates. Results of multidimensional analyses may reveal the relative importance of these properties on the evolution of genes, and therefore help evaluate whether these properties should be considered during analyses. While numerous properties are now considered during studies, most work still assumes the stereotypical solitary gene as commonly depicted in textbooks. Here, we investigate the Drosophila melanogaster genome to determine whether deviations from the stereotypical gene architecture correlate with other properties of genes. Results Deviations from the stereotypical gene architecture were classified as the following gene constellations: Overlapping genes were defined as those that overlap in the 5-prime, exonic, or intronic regions. Chromatin co-clustering genes were defined as genes that co-clustered within 20 kb of transcriptional territories. If this scheme is applied, the stereotypical gene emerges as a rare occurrence (7.5%); slightly varied schemes yielded between ~1% and 50%. Moreover, when following our scheme, paired-overlapping genes and chromatin co-clustering genes accounted for 50.1% and 42.4% of the genes analyzed, respectively. Gene constellation was a correlate of a number of functional and evolutionary properties of genes, but its statistical effect was ~1-2 orders of magnitude lower than the effects of recombination, chromosome linkage and protein function. Analysis of datasets on male reproductive proteins showed these were biased in their representation of gene constellations and evolutionary rate Ka/Ks estimates, but these biases did not overwhelm the biologically meaningful

  9. Functional bias in molecular evolution rate of Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Anandakrishnan Ramu

    2010-05-01

    Full Text Available Abstract Background Characteristics derived from mutation and other mechanisms that are advantageous for survival are often preserved during evolution by natural selection. Some genes are conserved in many organisms because they are responsible for fundamental biological functions; others are conserved for their unique functional characteristics. Therefore, one would expect the rate of molecular evolution for individual genes to depend on their biological function. Whether this expectation holds for genes duplicated by whole genome duplication is not known. Results We empirically demonstrate here, using duplicated genes generated from the Arabidopsis thaliana α-duplication event, that the rate of molecular evolution of genes duplicated in this event depends on biological function. Using functional clustering based on gene ontology annotation of gene pairs, we show that some duplicated genes, such as defense response genes, are under weaker purifying selection or under stronger diversifying selection than other duplicated genes, such as protein translation genes, as measured by the ratio of nonsynonymous to synonymous divergence (dN/dS). Conclusions These results provide empirical evidence indicating that the molecular evolution rate for genes duplicated in whole genome duplication, as measured by dN/dS, may depend on biological function, which we characterize using gene ontology annotation. Furthermore, the general approach used here provides a framework for comparative analysis of molecular evolution rate for genes based on their biological function.
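
    The comparison described above, grouping duplicated gene pairs by GO-based functional category and comparing dN/dS between categories, can be sketched as follows. The category labels and dN/dS values are invented for illustration, and summarising each category by its median is an assumption rather than the statistic reported in the paper. Pairs with dS = 0 are skipped because the ratio is undefined.

```python
from collections import defaultdict
from statistics import median


def dnds_by_category(pairs):
    """Median dN/dS per functional category.

    pairs: iterable of (category, dN, dS) for duplicated gene pairs.
    Pairs with dS == 0 are skipped because the ratio is undefined.
    """
    ratios = defaultdict(list)
    for category, dn, ds in pairs:
        if ds > 0:
            ratios[category].append(dn / ds)
    return {category: median(values) for category, values in ratios.items()}


# Invented example values, not data from the study.
pairs = [
    ("defense response", 0.30, 1.0),
    ("defense response", 0.50, 1.0),
    ("translation", 0.05, 1.0),
    ("translation", 0.10, 1.0),
    ("translation", 0.20, 0.0),  # skipped: dS == 0
]
summary = dnds_by_category(pairs)
print(summary["defense response"] > summary["translation"])  # True
```

    A higher dN/dS in one category is read, as in the abstract, as weaker purifying selection or stronger diversifying selection on that category's duplicates.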

  10. Overview of methodologies for building ontologies

    OpenAIRE

    Fernández-López, M.

    1999-01-01

    A few research groups are now proposing a series of steps and methodologies for developing ontologies. However, mainly due to the fact that Ontological Engineering is still a relatively immature discipline, each work group employs its own methodology. Our goal is to present the most representative methodologies used in ontology development and to perform an analysis of such methodologies against the same framework of reference. So, the goal of this paper is not to provide new insights about m...

  11. Dataset Curation through Renders and Ontology Matching

    Science.gov (United States)

    2015-09-01

    Dataset Curation through Renders and Ontology Matching. Yair Movshovitz-Attias, CMU-CS-15-119, September 2015, School of Computer Science ... mapped to an ontology of geographical entities, we are able to extract multiple relevant labels per image. For the viewpoint estimation problem, by

  12. Tools of knowledge representation: Thesauri versus ontologies

    Directory of Open Access Journals (Sweden)

    Antonio García Jiménez

    2004-01-01

    Full Text Available Ontologies are analysed as valid tools of knowledge representation by presenting the different aspects that make up this emerging reality. One of the most relevant goals of this paper is to connect ontologies with thesauri in order to determine their common features, their differences and the possibilities of converting one into the other. Finally, from the viewpoint of Library and Information Science, the future implications of the widespread adoption of ontologies are presented.

  13. Versioning System for Distributed Ontology Development

    Science.gov (United States)

    2016-03-15

    E. Jiménez-Ruiz, B. Cuenca Grau, Y. Zhou, and I. Horrocks, “Large-Scale Interactive Ontology Matching: Algorithms and Implementation,” in Proc. of ... Horrocks, and R. Berlanga, “Supporting Concurrent Ontology Development: Framework, Algorithms and Tool,” Data & Knowledge Engineering, vol. 70, Issue 1 ... Distributed Ontology Development, S.K. Damodaran, 15 March 2016. This material is based on work supported by the Assistant Secretary of Defense for

  14. Cohesion Metrics for Ontology Design and Application

    OpenAIRE

    Haining Yao; Anthony M. Orme; Letha Etzkorn

    2005-01-01

    Recently, domain-specific ontology development has been driven by research on the Semantic Web. Ontologies have been suggested for use in many application areas targeted by the Semantic Web, such as dynamic web service composition and general web service matching. Fundamental characteristics of these ontologies must be determined in order to effectively make use of them: for example, Sirin, Hendler and Parsia have suggested that determining fundamental characteristics...

  15. Semoogle - An Ontology Based Search Engine

    OpenAIRE

    Aghajani, Nooshin

    2012-01-01

    In this thesis, we present a prototype search engine to show how a semantic search application based on ontology techniques can save users time and improve the quality of relevant search results compared to a traditional search engine. This system is built as a query improvement module, which uses an ontology and sorts the search results based on four predefined categories. The first and most important part of the implementation of the search engine prototype is to apply ontology ...

  16. Applications and Uses of Dental Ontologies

    OpenAIRE

    Smart, Paul R.; Sadraie, Marjan

    2012-01-01

    The development of a number of large-scale semantically-rich ontologies for biomedicine attests to the interest of life science researchers and clinicians in Semantic Web technologies. To date, however, the dental profession has lagged behind other areas of biomedicine in developing a commonly accepted, standardized ontology to support the representation of dental knowledge and information. This paper attempts to identify some of the potential uses of dental ontologies as part of an effort to...

  17. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants

    KAUST Repository

    Hoehndorf, Robert

    2016-11-14

    Background The systematic analysis of a large amount of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. Results We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of the Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. Conclusions The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established
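
    The classification principle described above, where a trait is an entity-quality pair placed under a more general trait when its anatomical entity is a subclass or part of the other entity and its quality is a subclass of the other quality, can be illustrated with a small subsumption check. This is a hedged sketch, not the FLOPO build process (which uses phenotype description patterns and an automated reasoner); the entity and quality names and their links are illustrative and not copied from PO or PATO.

```python
def reachable(term, parents):
    """`term` plus every term reachable from it through the parent map."""
    result, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def trait_subsumed(specific, general, entity_parents, quality_parents):
    """True if trait (E, Q) falls under trait (E2, Q2).

    entity_parents mixes subclass and part-of edges between anatomical structures;
    quality_parents holds subclass edges between phenotypic qualities.
    """
    entity, quality = specific
    general_entity, general_quality = general
    return (general_entity in reachable(entity, entity_parents)
            and general_quality in reachable(quality, quality_parents))


# Illustrative links only.
entity_parents = {"leaf apex": ["leaf"], "leaf": ["shoot system"]}
quality_parents = {"acute": ["shape"]}
print(trait_subsumed(("leaf apex", "acute"), ("leaf", "shape"),
                     entity_parents, quality_parents))  # True
```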

  18. Gene-environment interaction and male reproductive function

    Science.gov (United States)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L.; Rylander, Lars; Giwercman, Aleksander

    2010-01-01

    As genetic factors can hardly explain the changes taking place during short time spans, environmental and lifestyle-related factors have been suggested as the causes of time-related deterioration of male reproductive function. However, considering the strong heterogeneity of male fecundity between and within populations, genetic variants might be important determinants of the individual susceptibility to the adverse effects of environment or lifestyle. Although the possible mechanisms of such interplay in relation to the reproductive system are largely unknown, some recent studies have indicated that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism, but the number of studies is still limited. Such interaction studies may improve our understanding of normal physiology and help us to identify the risk factors for male reproductive malfunction. We also briefly discuss other aspects of gene-environment interaction specifically associated with the issue of reproduction, namely environmental and lifestyle factors as the cause of sperm DNA damage. It remains to be investigated to what extent such genetic changes, by natural conception or through the use of assisted reproductive techniques, are transmitted to the next generation, thereby causing increased morbidity in the offspring. PMID:20348940

  19. Genes implicated in serotonergic and dopaminergic functioning predict BMI categories.

    Science.gov (United States)

    Fuemmeler, Bernard F; Agurs-Collins, Tanya D; McClernon, F Joseph; Kollins, Scott H; Kail, Melanie E; Bergen, Andrew W; Ashley-Koch, Allison E

    2008-02-01

    This study addressed the hypothesis that variation in genes associated with dopamine function (SLC6A3, DRD2, DRD4), serotonin function (SLC6A4), and regulation of monoamine levels (MAOA) may be predictive of BMI categories (obese and overweight + obese) in young adulthood and of changes in BMI as adolescents transition into young adulthood. Interactions with gender and race/ethnicity were also examined. Participants were a subsample of individuals from the National Longitudinal Study of Adolescent Health (Add Health), a nationally representative sample of adolescents followed from 1995 to 2002. The sample analyzed included a subset of 1,584 unrelated individuals with genotype data. Multiple logistic regressions were conducted to evaluate the associations between genotypes and obesity (BMI > 29.9) or overweight + obese combined (BMI ≥ 25), with normal weight (BMI = 18.5-24.9) as a referent. Linear regression models were used to examine change in BMI from adolescence to young adulthood. Significant associations were found between SLC6A4 5HTTLPR and categories of BMI, and, among men, between the MAOA promoter variable number tandem repeat (VNTR) and categories of BMI. Stratified analyses revealed that the association between these two genes and excess BMI was significant for men overall and for white and Hispanic men specifically. Linear regression models indicated a significant effect of SLC6A4 5HTTLPR on change in BMI from adolescence to young adulthood. Our findings lend further support to the involvement of genes implicated in dopamine and serotonin regulation in energy balance.
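
    The weight categories used in the analysis follow directly from the BMI cut-offs quoted above. The sketch below encodes them; the function names are illustrative, and values below 18.5, which the abstract does not assign to any analysed category, are labelled "underweight" here as an assumption.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Categories using the cut-offs quoted in the abstract."""
    if value > 29.9:
        return "obese"
    if value >= 25:
        return "overweight"
    if value >= 18.5:
        return "normal"  # referent group (18.5-24.9)
    return "underweight"  # assumption: not one of the study's analysed categories


def overweight_or_obese(value):
    """Combined overweight + obese outcome (BMI >= 25)."""
    return value >= 25


for weight in (70, 80, 95):
    value = bmi(weight, 1.75)
    print(round(value, 1), bmi_category(value), overweight_or_obese(value))
```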

  20. Towards Process-Ontology: A Critical Study of Substance-Ontological Premises

    DEFF Research Database (Denmark)

    Seibt, Johanna

    The thesis proposes a therapeutic revision of fundamental assumptions in contemporary ontological thought. I show that none of the prevalent theories of objects, by virtue of certain implicit substance-ontological assumptions, provides a viable account of the numerical, qualitative, and trans-tempora......-ontological presuppositions, I finally explore the result of rejecting all of them and sketch a scheme based on dynamic masses which promises to yield a coherent explanation of the ontological features of those complex processes that we commonly call objects....