WorldWideScience

Sample records for gene ontology terms

  1. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. Copyright © 2014 Elsevier B.V. All
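
    The fixed-gene-count window described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the placeholder term labels, and the use of a hypergeometric tail probability as the significance score are all assumptions, since the abstract does not state which statistic was used.

```python
from math import comb


def window_counts(gene_terms, term, window):
    """Count genes carrying `term` in every run of `window` consecutive genes.

    `gene_terms` is a list of sets of GO term labels, one set per gene,
    in chromosomal order. The window is defined by gene count, not by
    base-pair span.
    """
    flags = [1 if term in terms else 0 for terms in gene_terms]
    if window > len(flags):
        return []
    current = sum(flags[:window])
    counts = [current]
    for i in range(window, len(flags)):
        # Slide by one gene: add the entering gene, drop the leaving one.
        current += flags[i] - flags[i - window]
        counts.append(current)
    return counts


def hypergeom_tail(k, n_window, n_annotated, n_total):
    """P(X >= k) when n_window genes are drawn without replacement from
    n_total genes, n_annotated of which carry the term (assumed statistic)."""
    upper = min(n_window, n_annotated)
    hits = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_window - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_window)


# Toy genome of 10 genes with made-up term labels (hypothetical data).
toy_genome = [
    {"GO:A"}, {"GO:B"}, {"GO:A"}, {"GO:A"}, {"GO:A"},
    {"GO:C"}, set(), {"GO:B"}, {"GO:C"}, {"GO:B"},
]
counts = window_counts(toy_genome, "GO:A", window=5)
best = max(counts)
p_value = hypergeom_tail(best, 5, 4, len(toy_genome))
```

    In this toy genome the first window holds all four genes labelled GO:A, which is the kind of local concentration the abstract calls a GO term cluster.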

  2. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Background: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar. Conclusion: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  3. Gene Ontology

    Directory of Open Access Journals (Sweden)

    Gaston K. Mazandu

    2012-01-01

    The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.

  4. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    OpenAIRE

    Tsatsoulis Costas; Amthauer Heather A

    2010-01-01

    Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces ce...

  5. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

    Science.gov (United States)

    Lewin, Alex; Grieve, Ian C

    2006-10-03

    Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
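
    The comparison between per-term and per-group testing described above can be sketched with a one-sided Fisher's exact test computed from the hypergeometric distribution. This is a sketch under stated assumptions: the gene and term names are invented toy data, and forming a group as the union of its terms' gene sets is an illustrative choice, not necessarily how POSOC groups terms.

```python
from math import comb


def fisher_enrichment_p(de_with_term, de_total, term_total, gene_total):
    """One-sided Fisher's exact p-value for over-representation.

    Probability of seeing at least `de_with_term` annotated genes among
    `de_total` differentially expressed (DE) genes, when `term_total` of
    the `gene_total` genes on the array carry the annotation.
    """
    upper = min(de_total, term_total)
    tail = sum(
        comb(term_total, k) * comb(gene_total - term_total, de_total - k)
        for k in range(de_with_term, upper + 1)
    )
    return tail / comb(gene_total, de_total)


def enrichment_for_genes(annotated, de_genes, universe):
    """Apply the test to a set of annotated genes (one term or a group)."""
    annotated = annotated & universe
    de_genes = de_genes & universe
    return fisher_enrichment_p(
        len(annotated & de_genes), len(de_genes), len(annotated), len(universe)
    )


# Hypothetical toy data: 20 genes, 4 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
de_genes = {"g0", "g1", "g2", "g3"}
term_to_genes = {
    "term_x": {"g0", "g1", "g10", "g11", "g12"},
    "term_y": {"g2", "g3", "g13"},
}

p_x = enrichment_for_genes(term_to_genes["term_x"], de_genes, universe)
p_y = enrichment_for_genes(term_to_genes["term_y"], de_genes, universe)
# Assumed grouping rule: a group is the union of its terms' gene sets.
group_genes = term_to_genes["term_x"] | term_to_genes["term_y"]
p_group = enrichment_for_genes(group_genes, de_genes, universe)
```

    With these toy numbers neither term alone falls below 0.05, while the group does, which mirrors the abstract's claim that grouped tests can find signal that individual tests miss.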

  6. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Grieve Ian C

    2006-10-01

    Background: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. Results: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Conclusion: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.

  7. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Background: There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results: We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions: We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.
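
    The idea of using neighbouring genes as classification input can be illustrated with a small feature-extraction sketch. It does not reproduce the MultiBoostAB/J48 classifier named in the abstract; a naive majority vote is swapped in purely as a stand-in baseline. The function names, the toy term labels, and the reading of "neighbourhood size" as `k` genes on each side are assumptions.

```python
from collections import Counter


def neighbour_features(slim_terms, index, k=1):
    """Return the GO Slim labels of the k nearest genes on each side.

    `slim_terms` lists one label per gene in chromosomal order; genes
    near a chromosome end simply have fewer neighbours.
    """
    left = slim_terms[max(0, index - k):index]
    right = slim_terms[index + 1:index + 1 + k]
    return left + right


def majority_vote(neighbour_terms):
    """Stand-in classifier: predict the most frequent neighbouring label."""
    if not neighbour_terms:
        return None
    return Counter(neighbour_terms).most_common(1)[0][0]


# Hypothetical chromosome with made-up GO Slim labels.
chromosome = [
    "transport", "transport", "transport",
    "metabolism", "metabolism", "metabolism",
]
features_gene1 = neighbour_features(chromosome, 1, k=1)
prediction_gene1 = majority_vote(features_gene1)
prediction_gene5 = majority_vote(neighbour_features(chromosome, 5, k=2))
```

    A real experiment would feed such neighbour features, together with gene location, into a trained classifier and vary `k` to compare neighbourhood sizes.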

  8. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning.

    Science.gov (United States)

    Amthauer, Heather A; Tsatsoulis, Costas

    2010-05-28

    There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  9. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
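
    The mutual information feature mentioned above can be sketched from a 2x2 table of sequence counts. This is a minimal illustration assuming the standard definition of mutual information between two binary variables (sequence matches the motif, sequence carries the GO term); the abstract does not say whether the authors used this form or a pointwise variant, and the function name and counts are hypothetical.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between motif match and GO annotation.

    n11: sequences matching the motif and annotated with the term
    n10: sequences matching the motif but not annotated
    n01: sequences annotated but not matching the motif
    n00: sequences with neither
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("contingency table is empty")
    motif_yes, motif_no = n11 + n10, n01 + n00
    term_yes, term_no = n11 + n01, n10 + n00
    cells = [
        (n11, motif_yes, term_yes),
        (n10, motif_yes, term_no),
        (n01, motif_no, term_yes),
        (n00, motif_no, term_no),
    ]
    mi = 0.0
    for joint, row, col in cells:
        if joint:  # empty cells contribute nothing to the sum
            mi += (joint / total) * log2(joint * total / (row * col))
    return mi


# Hypothetical counts: perfect association versus independence.
mi_perfect = mutual_information(5, 0, 0, 5)
mi_independent = mutual_information(25, 25, 25, 25)
```

    A value like this could then serve as one input feature, alongside frequency-based features, to a classifier such as the logistic regression the abstract describes.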

  10. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  11. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
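
    The vector representation described in this abstract can be sketched as follows. The details are assumptions made for illustration: information content is taken as the negative log of a term's annotation frequency, unannotated terms get a zero entry, and the Jaccard index is applied in its weighted (min over max) form. The term names and counts are invented, and the abstract does not confirm any of these choices.

```python
from math import log, sqrt


def information_content(term_counts, total_annotations):
    """IC(t) = -ln(frequency of t); rarer terms are more informative."""
    return {t: -log(c / total_annotations) for t, c in term_counts.items()}


def ic_vector(annotated_terms, ic, term_order):
    """Vector with IC(t) where the gene product has term t, else 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in term_order]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_jaccard(u, v):
    """Assumed vector form of the Jaccard index: sum(min) / sum(max)."""
    top = sum(min(a, b) for a, b in zip(u, v))
    bottom = sum(max(a, b) for a, b in zip(u, v))
    return top / bottom if bottom else 0.0


# Hypothetical annotation counts for four made-up terms.
term_order = ["t1", "t2", "t3", "t4"]
ic = information_content({"t1": 50, "t2": 10, "t3": 10, "t4": 1}, 100)
gene_a = ic_vector({"t1", "t2"}, ic, term_order)
gene_b = ic_vector({"t1", "t2"}, ic, term_order)
gene_c = ic_vector({"t3", "t4"}, ic, term_order)
```

    Gene products with identical annotations score 1 on all three measures, while products with no shared terms score 0 on cosine and weighted Jaccard, so unrelated pairs are not credited with spurious similarity.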

  12. Correlating information contents of gene ontology terms to infer semantic similarity of gene products.

    Science.gov (United States)

    Gan, Mingxin

    2014-01-01

    Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.

  13. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  14. Functional discrimination of gene expression patterns in terms of the gene ontology.

    Science.gov (United States)

    Badea, Liviu

    2003-01-01

    The ever-growing amount of experimental data in molecular biology and genetics requires its automated analysis, by employing sophisticated knowledge discovery tools. We use an Inductive Logic Programming (ILP) learner to induce functional discrimination rules between genes studied using microarrays and found to be differentially expressed in three recently discovered subtypes of adenocarcinoma of the lung. The discrimination rules involve functional annotations from the Proteome HumanPSD database in terms of the Gene Ontology, whose hierarchical structure is essential for this task. While most of the lower levels of gene expression data (pre)processing have been automated, our work can be seen as a step toward automating the higher level functional analysis of the data. We view our application not just as a prototypical example of applying more sophisticated machine learning techniques to the functional analysis of genes, but also as an incentive for developing increasingly more sophisticated functional annotations and ontologies, that can be automatically processed by such learning algorithms.

  15. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Science.gov (United States)

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators of protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy to combine both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is achieved by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction.

  16. OTO: Ontology Term Organizer.

    Science.gov (United States)

    Huang, Fengqiong; Macklin, James A; Cui, Hong; Cole, Heather A; Endara, Lorena

    2015-02-15

    The need to create controlled vocabularies such as ontologies for knowledge organization and access has been widely recognized in various domains. Despite the indispensable need of thorough domain knowledge in ontology construction, most software tools for ontology construction are designed for knowledge engineers and not for domain experts to use. The differences in the opinions of different domain experts and in the terminology usages in source literature are rarely addressed by existing software. OTO software was developed based on the Agile principles. Through iterations of software release and user feedback, new features are added and existing features modified to make the tool more intuitive and efficient to use for small and large data sets. The software is open source and built in Java. Ontology Term Organizer (OTO; http://biosemantics.arizona.edu/OTO/ ) is a user-friendly, web-based, consensus-promoting, open source application for organizing domain terms by dragging and dropping terms to appropriate locations. The application is designed for users with specific domain knowledge such as biology but not in-depth ontology construction skills. Specifically OTO can be used to establish is_a, part_of, synonym, and order relationships among terms in any domain that reflects the terminology usage in source literature and based on multiple experts' opinions. The organized terms may be fed into formal ontologies to boost their coverage. All datasets organized on OTO are publicly available. OTO has been used to organize the terms extracted from thirty volumes of Flora of North America and Flora of China combined, in addition to some smaller datasets of different taxon groups. User feedback indicates that the tool is efficient and user friendly. Being open source software, the application can be modified to fit varied term organization needs for different domains.

  17. Gene Ontology Consortium: going forward.

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Gene Ontology Consortium: going forward

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. PMID:25428369

  19. A measure of semantic similarity between gene ontology terms based on semantic pathway covering

    Institute of Scientific and Technical Information of China (English)

    LI Rong; CAO Shunliang; LI Yuanyuan; TAN Hao; ZHU Yangyong; ZHONG Yang; LI Yixue

    2006-01-01

    Semantic similarity between Gene Ontology (GO) terms is critical in resolving semantic heterogeneity when integrating heterogeneous biological databases. Traditionally, distance based and information content based measures are the two major methods. In this paper, a new method based on semantic pathway covering is proposed and an algorithm, the COMBINE algorithm, is presented, which considers information contents of two given nodes and those of all nodes included in the two nodes' pathways. Experiments show that the COMBINE algorithm obtains the highest correlation index compared with those distance based and information content based algorithms.

  20. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Background: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  1. An Ontology of Gene

    OpenAIRE

    Masuya, Hiroshi; Mizoguchi, Riichiro

    2012-01-01

    The concept of a gene was established in the era of classical genetics and is now essential for life science for elucidating the molecular basis of the coding of genetic information necessary to realize the body of an organism and its biological functions. However, an ontology fully representing multiple aspects of a gene is still not available. In this study, we dissected the biological and ontological definitions of bearers of genetic information, including genes and alleles. Based on this ...

  2. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression
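
    The overrepresentation step described in this record can be illustrated with a one-sided hypergeometric test, a common choice for term enrichment. This is a minimal sketch under that assumption; the abstract does not state which statistic ZEOGS uses, and the function names and toy gene-to-term data below are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, annotated, drawn):
    """Return P(X >= k) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` carry the term."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enriched_terms(gene_set, term_to_genes, background):
    """Map each anatomical term to its enrichment p-value in gene_set."""
    background = set(background)
    gene_set = set(gene_set) & background
    p_values = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(gene_set & annotated)
        p_values[term] = hypergeom_tail(
            overlap, len(background), len(annotated), len(gene_set)
        )
    return p_values


# Hypothetical toy data: 10 background genes and two anatomical terms.
background = {f"g{i}" for i in range(10)}
term_to_genes = {
    "liver": {"g0", "g1", "g2"},
    "heart": {"g5", "g6", "g7", "g8"},
}
p = enriched_terms({"g0", "g1", "g2", "g9"}, term_to_genes, background)
```

    In this toy input, all three "liver" genes fall in the four-gene query set, so "liver" receives a small p-value while "heart", with no overlap, receives 1.0. A real analysis would also correct for testing many terms at once.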

  3. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes.

    Science.gov (United States)

    Xing, Zhihao; Chu, Chen; Chen, Lei; Kong, Xiangyin

    2016-11-01

    Oncogenes are genes that have the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the perspective of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide new insight for the investigation of oncogenes.
This article is part of a Special Issue entitled "System Genetics" Guest Editor

  4. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

    Science.gov (United States)

    Zhang, Shu-Bo; Lai, Jian-Huang

    2016-07-15

    Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of Gene Ontology (GO) provides us with a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to biological entities under consideration and the structure of the GO graph. However, previous works in this field mainly focused on the upper part of the graph, and seldom considered the lower part. In this study, we aim to explore information from the lower part of the GO graph for better semantic similarity. We proposed a framework to quantify the similarity measure beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measurements on the public platform CESSM, protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies the probability that two terms co-occur with their common descendant in an efficient way; and (3) our framework can effectively capture the similarity measure beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. How the gene ontology evolves.

    Science.gov (United States)

    Leonelli, Sabina; Diehl, Alexander D; Christie, Karen R; Harris, Midori A; Lomax, Jane

    2011-08-05

    Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource. Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms. This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.

  6. Extending the Interpretation of Gene Profiling Microarray Experiments to Pathway Analysis Through the Use of Gene Ontology Terms

    Science.gov (United States)

    Chatziioannou, Aristotelis; Moulos, Panagiotis

    Microarray technology allows the survey of gene expression at a global level by measuring mRNA abundance. However, the great complexity of a microarray experiment calls for computationally powerful tools suited to probing the biological problem under study. Here we propose a suite for flexible, data-driven interpretation of microarray experiments that is adaptable to a wide range of possible needs of the biological end-user. The suite is implemented in MATLAB and makes use of two modules that perform all steps of a typical microarray data analysis, from data standardization and normalization to statistical selection and pathway analysis using Gene Ontology term annotations for the species genomes interrogated. Owing to its modular structure, it is scalable, enabling the incorporation of, or its seamless assembly with, other existing tools.

  7. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  8. The Ontology of the Gene Ontology

    Science.gov (United States)

    Smith, Barry; Williams, Jennifer; Steffen, Schulze-Kremer

    2003-01-01

    The rapidly increasing wealth of genomic data has driven the development of tools to assist in the task of representing and processing information about genes, their products and their functions. One of the most important of these tools is the Gene Ontology (GO), which is being developed in tandem with work on a variety of bioinformatics databases. An examination of the structure of GO, however, reveals a number of problems, which we believe can be resolved by taking account of certain organizing principles drawn from philosophical ontology. We shall explore the results of applying such principles to GO with a view to improving GO’s consistency and coherence and thus its future applicability in the automated processing of biological data. PMID:14728245

  9. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
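
    To make the idea of precomputed, annotation-based information content concrete, the sketch below computes IC(t) = -log p(t) from a toy annotation set and then applies Resnik's measure (the IC of the most informative common ancestor). It is an illustration of one well-known IC-based measure, not DaGO-Fun's implementation; the ontology, protein identifiers and function names are hypothetical.

```python
from math import log


def ancestors(term, parents):
    """Return `term` plus all of its ancestors, given a child -> parents map."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations, parents):
    """Precompute IC(t) = -log(p(t)) = log(total / count(t))."""
    counts = {}
    for terms in annotations.values():
        # A protein annotated to a term is implicitly annotated to its ancestors.
        implied = set()
        for term in terms:
            implied |= ancestors(term, parents)
        for term in implied:
            counts[term] = counts.get(term, 0) + 1
    total = len(annotations)
    return {term: log(total / n) for term, n in counts.items()}


def resnik(a, b, ic, parents):
    """IC of the most informative common ancestor of terms a and b."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


# Hypothetical toy ontology and annotations.
parents = {
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}
annotations = {
    "P1": ["dna_binding"],
    "P2": ["rna_binding"],
    "P3": ["catalysis"],
    "P4": ["catalysis"],
}
ic = information_content(annotations, parents)
```

    Because the IC table is built once from the annotation corpus, each later similarity query reduces to a set intersection and a lookup, which is the kind of saving that precomputation is meant to provide.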

  10. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome.

    Science.gov (United States)

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2016-01-04

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide, and approximately 7-10% of women of reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, no efforts have been made to collate and link these data. Our group, for the first time, has compiled PCOS-related information available through scientific literature, cross-linked it with molecular, biochemical and clinical databases, and presented it as a user-friendly, web-based online knowledgebase for the benefit of the scientific and clinical community. Manually curated information on associated genes, single nucleotide polymorphisms, diseases, gene ontology terms and pathways along with supporting reference literature has been collated and included in PCOSKB (http://pcoskb.bicnirrh.res.in).

  11. Quality control for terms and definitions in ontologies and taxonomies

    Directory of Open Access Journals (Sweden)

    Rüegg Alexander

    2006-04-01

    Background: Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. Results: We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. Conclusion: Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators by drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.

  12. Sample ontology, GOstat and ontology term enrichment - FANTOM5 | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    FANTOM5 data detail. Data name: Sample ontology, GOstat and ontology term enrichment. DOI: 10.18908/lsdba.nbdc01389-006.V002. Version: V2 10.18908/lsdba.... ...

  13. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments

    Science.gov (United States)

    2013-01-01

    Background: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community. PMID:24093723

  14. Improvements to cardiovascular gene ontology.

    Science.gov (United States)

    Lovering, Ruth C; Dimmer, Emily C; Talmud, Philippa J

    2009-07-01

    Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. Although one might initially wonder what relevance a 'controlled vocabulary' might have for cardiovascular science, such a resource is proving highly useful for researchers investigating complex cardiovascular disease phenotypes as well as those interpreting results from high-throughput methodologies. GO enables the current functional knowledge of individual genes to be used to annotate genomic or proteomic datasets. In this way, the GO data provides a very effective way of linking biological knowledge with the analysis of the large datasets of post-genomics research. Consequently, users of high-throughput methodologies such as expression arrays or proteomics will be the main beneficiaries of such annotation sets. However, as GO annotations increase in quality and quantity, groups using small-scale approaches will gradually begin to benefit too. For example, genome wide association scans for coronary heart disease are identifying novel genes, with previously unknown connections to cardiovascular processes, and the comprehensive annotation of these novel genes might provide clues to their cardiovascular link. At least 4000 genes, to date, have been implicated in cardiovascular processes and an initiative is underway to focus on annotating these genes for the benefit of the cardiovascular community. In this article we review the current uses of Gene Ontology annotation to highlight why Gene Ontology should be of interest to all those involved in cardiovascular research.

  15. SEMANTIC TERM BASED INFORMATION RETRIEVAL USING ONTOLOGY

    Directory of Open Access Journals (Sweden)

    J. Mannar Mannan

    2014-01-01

    Information searching and retrieval is a challenging task in traditional keyword-based textual information retrieval systems. In the growing information age, with huge amounts of data added every day, the search problem has also grown. Keyword-based retrieval systems return a bulk of junk documents irrelevant to the query. To address these limitations, this paper proposes using query terms along with semantic terms for information retrieval, drawing on multiple reference ontologies. A user query sometimes reflects multiple domains of interest, which requires collecting semantically related ontologies. If no related ontology exists, the WordNet ontology is used to retrieve semantic terms related to the query term. In this approach, classes in the ontology are derived as semantically related text keywords, and these keywords are used to rank the documents.

  16. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

    Directory of Open Access Journals (Sweden)

    Xiaomei Wu

    BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply consider terms at the same level in the ontology to be equally specific nodes, revealing weaknesses that could be complemented using information content (IC). RESULTS AND CONCLUSIONS: Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. The HRSS algorithm was implemented in the C programming language and is freely available from http://cmb.bnu.edu.cn/hrss.
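
    The two pairwise strategies compared in this record, the maximum and the best-match average (BMA), combine term-to-term similarities into one gene-to-gene score. The sketch below shows both on a small similarity matrix. It assumes the BMA variant that averages the mean row-wise best match and the mean column-wise best match; the matrix values are made up for illustration and are not HRSS scores.

```python
def max_strategy(sim):
    """Gene-level similarity as the single largest term-pair similarity."""
    return max(max(row) for row in sim)


def best_match_average(sim):
    """Average the best match of each term of gene A (rows) and of each
    term of gene B (columns), then average the two means."""
    row_best = [max(row) for row in sim]
    col_best = [max(col) for col in zip(*sim)]
    row_mean = sum(row_best) / len(row_best)
    col_mean = sum(col_best) / len(col_best)
    return (row_mean + col_mean) / 2


# Hypothetical similarities: gene A has 2 terms (rows), gene B has 3 (columns).
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.1],
]
score_max = max_strategy(sim)
score_bma = best_match_average(sim)
```

    Here the maximum strategy reports 0.9 because one term pair matches strongly, while BMA reports a lower value because the remaining terms match poorly. That difference is consistent with the record's finding that the two strategies suit different tasks.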

  17. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.

    Science.gov (United States)

    Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G

    2016-05-01

    Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by > 2.5% in F1 score for molecular function hierarchy. Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF CONTACT: ahmad.pgh@dal.ca or beiko@cs.dal.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
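
    The cosine step mentioned in this record can be sketched with plain word-count vectors built from term definitions. This is a simplified stand-in: simDEF builds optimized definition vectors based on the Gloss Vector measure, whereas the code below uses raw token counts, and the definitions and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def definition_vector(definition):
    """Turn a textual definition into a bag-of-words count vector."""
    return Counter(definition.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[word] * v[word] for word in u.keys() & v.keys())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical definitions, shortened for illustration.
dna = definition_vector("binding to dna")
rna = definition_vector("binding to rna")
transport = definition_vector("movement across membrane")

sim_close = cosine(dna, rna)
sim_far = cosine(dna, transport)
```

    The two binding definitions share two of their three words and score about 0.67, while the unrelated definition scores 0.0.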

  18. Representing Kidney Development Using the Gene Ontology

    Science.gov (United States)

    Alam-Faruque, Yasmin; Hill, David P.; Dimmer, Emily C.; Harris, Midori A.; Foulger, Rebecca E.; Tweedie, Susan; Attrill, Helen; Howe, Douglas G.; Thomas, Stephen Randall; Davidson, Duncan; Woolf, Adrian S.; Blake, Judith A.; Mungall, Christopher J.; O’Donovan, Claire; Apweiler, Rolf; Huntley, Rachael P.

    2014-01-01

    Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be utilized in the interpretation of data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development and thereby help towards alleviating renal disease. PMID:24941002

  19. The Gene Ontology (GO) project in 2006

    National Research Council Canada - National Science Library

    2006-01-01

    The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net...

  20. The Gene Ontology project in 2008

    National Research Council Canada - National Science Library

    The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org...

  1. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL.

    Science.gov (United States)

    Jupp, Simon; Stevens, Robert; Hoehndorf, Robert

    2012-04-24

    Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. To enable this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed.
This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of

  2. The Neural/Immune Gene Ontology: clipping the Gene Ontology for neurological and immunological systems

    Directory of Open Access Journals (Sweden)

    Rubin Eitan

    2010-09-01

    Background: The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher level terms remain. This practice is designed to improve the summarizing of experimental results by grouping high level terms and the statistical power of GO term enrichment analysis. Here, we propose a new approach to editing the gene ontology, clipping, which is the editing of GO according to biological relevance. Creation of a GO subset by clipping is achieved by removing terms (from all hierarchical levels) if they are not functionally relevant to a given domain of interest. Terms that are located in levels higher than relevant terms are kept; thus, biologically irrelevant terms are only removed if they are not parental to terms that are relevant. Results: Using this approach, we have created the Neural-Immune Gene Ontology (NIGO) subset of GO directed for neurological and immunological systems. We tested the performance of NIGO in extracting knowledge from microarray experiments by conducting functional analysis and comparing the results to those obtained using the full GO and a generic GO slim. NIGO not only improved the statistical scores given to relevant terms, but was also able to retrieve functionally relevant terms that did not pass statistical cutoffs when using the full GO or the slim subset. Conclusions: Our results validate the pipeline used to generate NIGO, suggesting it is indeed enriched with terms that are specific to the neural/immune domains. The results suggest that NIGO can enhance the analysis of microarray experiments involving neural and/or immune related systems. They also directly demonstrate the potential such a domain-specific GO has in generating meaningful hypotheses.
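
    The clipping rule in this record (keep every relevant term and every term that is parental to a relevant term, drop the rest) amounts to an upward graph traversal. The sketch below shows that rule on a toy hierarchy; it is not the NIGO pipeline, and the term names and the `clip_ontology` function are hypothetical.

```python
def clip_ontology(parents, relevant):
    """Return the sub-ontology containing the relevant terms and all of
    their ancestors, as a child -> parents map."""
    keep = set()
    stack = list(relevant)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        # Walk upward: ancestors of a relevant term are always kept.
        stack.extend(parents.get(term, ()))
    return {term: list(parents.get(term, ())) for term in keep}


# Hypothetical toy hierarchy (child -> parents).
parents = {
    "biological_process": [],
    "cell_development": ["biological_process"],
    "neuron_development": ["cell_development"],
    "organ_development": ["biological_process"],
    "leaf_development": ["organ_development"],
}
clipped = clip_ontology(parents, relevant={"neuron_development"})
```

    In this toy case, "cell_development" and "biological_process" survive because they are ancestors of the one relevant term, while the plant-specific branch is removed at every level.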

  3. Cross-Ontology multi-level association rule mining in the Gene Ontology.

    Directory of Open Access Journals (Sweden)

    Prashanti Manda

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method on publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
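
    As a rough illustration of multi-level, cross-ontology rules, the sketch below generalizes each gene's annotations by adding ancestor terms and then scores a rule between a molecular function term and a cellular component term by support and confidence. It is a simplified stand-in and not the COLL procedure, which generalizes level by level; the term names, gene identifiers and functions are hypothetical.

```python
def generalize(transaction, parents):
    """Add every ancestor of each annotated term to the transaction."""
    out = set()
    stack = list(transaction)
    while stack:
        term = stack.pop()
        if term not in out:
            out.add(term)
            stack.extend(parents.get(term, ()))
    return out


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    with_a = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_a if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


# Hypothetical child -> parents map spanning two sub-ontologies (MF and CC).
parents = {
    "MF:dna_binding": ["MF:binding"],
    "MF:rna_binding": ["MF:binding"],
    "CC:nucleolus": ["CC:nucleus"],
}
# Hypothetical direct annotations, one transaction per gene.
genes = {
    "g1": {"MF:dna_binding", "CC:nucleus"},
    "g2": {"MF:rna_binding", "CC:nucleolus"},
    "g3": {"MF:dna_binding", "CC:cytoplasm"},
    "g4": {"CC:nucleus"},
}
transactions = [generalize(t, parents) for t in genes.values()]
support, confidence = rule_stats(transactions, "MF:binding", "CC:nucleus")
```

    The rule "MF:binding -> CC:nucleus" only becomes visible after generalization, because no gene in the toy data is annotated directly to "MF:binding" and one gene reaches "CC:nucleus" only through "CC:nucleolus".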

  4. Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns.

    Science.gov (United States)

    Xiang, Zuoshuang; Zheng, Jie; Lin, Yu; He, Yongqun

    2015-01-01

    It is time-consuming to build an ontology with many terms and axioms. Thus it is desired to automate the process of ontology development. Ontology Design Patterns (ODPs) provide a reusable solution to solve a recurrent modeling problem in the context of ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers proposed a Quick Term Templates (QTTs) process targeted at generating new ontology classes following the same pattern, using term templates in a spreadsheet format. Inspired by the ODPs and QTTs, the Ontorat web application is developed to automatically generate new ontology terms, annotations of terms, and logical axioms based on a specific ODP(s). The inputs of an Ontorat execution include axiom expression settings, an input data file, ID generation settings, and a target ontology (optional). The axiom expression settings can be saved as a predesigned Ontorat setting format text file for reuse. The input data file is generated based on a template file created by a specific ODP (text or Excel format). Ontorat is an efficient tool for ontology expansion. Different use cases are described. For example, Ontorat was applied to automatically generate over 1,000 Japan RIKEN cell line cell terms with both logical axioms and rich annotation axioms in the Cell Line Ontology (CLO). Approximately 800 licensed animal vaccines were represented and annotated in the Vaccine Ontology (VO) by Ontorat. The OBI team used Ontorat to add assay and device terms required by ENCODE project. Ontorat was also used to add missing annotations to all existing Biobank specific terms in the Biobank Ontology. A collection of ODPs and templates with examples are provided on the Ontorat website and can be reused to facilitate ontology development. With ever increasing ontology development and applications, Ontorat provides a timely platform for generating and annotating a large number of ontology terms by following

  5. SEMANTIC TERM BASED INFORMATION RETRIEVAL USING ONTOLOGY

    OpenAIRE

    2014-01-01

    Information searching and retrieval is a challenging task in traditional keyword-based textual information retrieval systems. In the growing information age, with huge amounts of data added every day, the search problem has also grown. Keyword-based retrieval systems return a bulk of junk documents irrelevant to the query. To address these limitations, this paper proposes using query terms along with semantic terms for information retrieval, with reference to multiple ontologies. User query sometimes reflects multiple ...

  6. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    Science.gov (United States)

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using the Resource Description Framework (RDF) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Practical Applications of the Gene Ontology Resource

    Science.gov (United States)

    Huntley, Rachael P.; Dimmer, Emily C.; Apweiler, Rolf

    The Gene Ontology (GO) is a controlled vocabulary that represents knowledge about the functional attributes of gene products in a structured manner and can be used in both computational and human analyses. This vocabulary has been used by diverse curation groups to associate functional information to individual gene products in the form of annotations. GO has proven an invaluable resource for evaluating and interpreting the biological significance of large data sets, enabling researchers to create hypotheses to direct their future research. This chapter provides an overview of the Gene Ontology, how it can be used, and tips on getting the most out of GO analyses.

  8. OntologyWidget – a reusable, embeddable widget for easily locating ontology terms

    Directory of Open Access Journals (Sweden)

    Skene JH Pate

    2007-09-01

    Full Text Available Abstract. Background: Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form. Results: We have produced a tool, OntologyWidget, which allows users to rapidly search for and browse ontology terms. OntologyWidget can easily be embedded in other web-based applications. OntologyWidget is written using AJAX (Asynchronous JavaScript and XML) and has two related elements. The first is a dynamic auto-complete ontology search feature. As a user enters characters into the search box, the appropriate ontology is queried remotely for terms that match the typed-in text, and the query results populate a drop-down list with all potential matches. Upon selection of a term from the list, the user can locate this term within a generic and dynamic ontology browser, which comprises the second element of the tool. The ontology browser shows the paths from a selected term to the root as well as parent/child tree hierarchies. We have implemented web services at the Stanford Microarray Database (SMD), which provide the OntologyWidget with access to over 40 ontologies from the Open Biological Ontology (OBO) website. Each ontology is updated weekly. Adopters of the OntologyWidget can either use SMD's web services, or elect to rely on their own. Deploying the OntologyWidget can be accomplished in three simple steps: (1) install Apache Tomcat on one's web server, (2) download and install the OntologyWidget servlet stub that provides access to the SMD ontology web services, and (3) create an html (HyperText Markup Language) file that refers to the OntologyWidget using a simple, well-defined format. Conclusion: We have developed Ontology

  9. Quality assurance of the gene ontology using abstraction networks.

    Science.gov (United States)

    Ochs, Christopher; Perl, Yehoshua; Halper, Michael; Geller, James; Lomax, Jane

    2016-06-01

    The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics, based on the identification of groups of anomalous terms which exhibit certain taxonomy-defined characteristics, are introduced. Such groups are expected to have higher error rates when compared to other terms. Thus, by focusing QA efforts on anomalous terms one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort will be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. From the results of this QA effort, it is observed that different kinds of inconsistencies in the modeling of GO can be exposed with the use of the proposed heuristics. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.

  10. Automatic, context-specific generation of Gene Ontology slims

    Directory of Open Access Journals (Sweden)

    Sehgal Muhammad

    2010-10-01

    Full Text Available Abstract. Background: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual. Results: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches, and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power. Conclusions: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.

  11. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
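
    The abstract does not define MOSupport or MOConfidence, so the sketch below only shows the classical support and confidence measures that association rule mining generally builds on. It should not be read as the authors' multi-ontology variants; the function names and the toy annotation sets (with made-up term labels) are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Classical confidence of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical annotation sets, one per gene product (labels are illustrative).
annotations = [
    {"BP:transport", "MF:transporter_activity"},
    {"BP:transport", "MF:transporter_activity", "CC:membrane"},
    {"BP:transport", "CC:membrane"},
    {"BP:signaling", "MF:receptor_activity"},
]
rule_support = support(annotations, {"BP:transport", "MF:transporter_activity"})
rule_confidence = confidence(annotations, {"BP:transport"}, {"MF:transporter_activity"})
```

    Here the cross-ontology rule "BP:transport implies MF:transporter_activity" holds in 2 of 4 gene products (support 0.5) and in 2 of the 3 gene products annotated to the antecedent (confidence 2/3).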

  12. Gene-based and semantic structure of the Gene Ontology as a complex network

    Science.gov (United States)

    Coronnello, Claudia; Tumminello, Michele; Miccichè, Salvatore

    2016-09-01

    The last decade has seen the advent and consolidation of ontology based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The Gene Ontology (GO) is constantly evolving over time. The information accumulated time-by-time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. Here we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium. Moreover, the GO is a natural example of bipartite network of terms and genes. Here we are interested in studying the properties of the projected network of terms, i.e. a gene-based weighted network of GO terms, in which a link between any two terms is set if at least one gene is annotated in both terms. One aim of the present paper is to compare the structural properties of the semantic and the gene-based network. The relative importance of terms is very similar in the two networks, but the community structure changes. We show that in some cases GO terms that appear to be distinct from a semantic point of view are instead connected, and appear in the same community when considering their gene content. The identification of such gene-based communities of terms might therefore be the basis of a simple protocol aiming at improving the semantic structure of GO. Information about terms that share large gene content might also be important from a biomedical point of view, as it might reveal how genes over-expressed in a certain term also affect other biological processes, molecular functions and cellular components not directly linked according to GO semantics.
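
    The gene-based projection described here (two terms are linked if at least one gene is annotated to both) can be sketched as follows. Using the number of shared genes as the edge weight is an assumption; the abstract says the network is weighted but does not state the weighting scheme. `project_terms` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_terms(gene_to_terms):
    """Project a bipartite gene-term annotation map onto terms.

    Returns a dict mapping each (term_a, term_b) pair, in sorted order,
    to the number of genes annotated to both terms.
    """
    weights = Counter()
    for terms in gene_to_terms.values():
        for a, b in combinations(sorted(terms), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical toy annotations (not real genes or GO terms).
toy = {
    "geneA": {"term1", "term2"},
    "geneB": {"term1", "term2", "term3"},
    "geneC": {"term3"},
}
edges = project_terms(toy)
```

    Pairs of terms that share no gene simply do not appear in the result, which matches the "at least one gene" linking rule.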

  13. A robust data-driven approach for gene ontology annotation

    OpenAIRE

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For th...

  14. Measuring the evolution of ontology complexity: the gene ontology case study.

    Science.gov (United States)

    Dameron, Olivier; Bettembourg, Charles; Le Meur, Nolwenn

    2013-01-01

    Ontologies support automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of an ontology evolution on its structural complexity. As a case study we used the sixty monthly releases between January 2008 and December 2012 of the Gene Ontology and its three independent branches, i.e. biological processes (BP), cellular components (CC) and molecular functions (MF). For each case, we measured complexity by computing metrics related to the size, the nodes connectivity and the hierarchical structure. The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, superior to that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF, with a marked increase for the three branches in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late 2012 increase of connectivity resulted in an increase of the average depth and average height and a decrease of the proportion of leaves, indicating that a major enrichment effort of the intermediate-level hierarchy occurred. The variation of the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values as well as of evolution for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased, CC was refined with the addition of leaves providing a finer level of annotations but decreasing slightly its complexity, and MF complexity remained stable.
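
    Two of the hierarchy-related metrics named in this abstract, the proportion of leaves and the average depth, can be sketched on a small directed acyclic graph. The abstract does not say how depth is defined when a class has several paths to the root; the sketch assumes the shortest path, and the function names and toy graph are hypothetical.

```python
from collections import deque


def leaf_proportion(children):
    """Fraction of nodes that have no children."""
    leaves = [node for node, kids in children.items() if not kids]
    return len(leaves) / len(children)


def average_depth(children, root):
    """Mean shortest-path distance from the root over all reachable nodes."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for kid in children[node]:
            if kid not in depth:  # first visit in BFS gives the shortest depth
                depth[kid] = depth[node] + 1
                queue.append(kid)
    return sum(depth.values()) / len(depth)


# Hypothetical toy hierarchy; "c" has two parents, as GO classes often do.
toy_children = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}
leaves_share = leaf_proportion(toy_children)
mean_depth = average_depth(toy_children, "root")
```

    Computing such metrics for each release of an ontology would give the kind of time series the study compares across the BP, CC and MF branches.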

  15. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome

    OpenAIRE

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2015-01-01

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide and ≈7–10% of women in reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, efforts have not been taken to collate ...

  16. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl. PMID:23895341

  17. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

    Science.gov (United States)

    Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane

    2013-07-29

    The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

  18. [Key effect genes responding to nerve injury identified by gene ontology and computer pattern recognition].

    Science.gov (United States)

    Pan, Qian; Peng, Jin; Zhou, Xue; Yang, Hao; Zhang, Wei

    2012-07-01

    In order to screen out important genes from the large gene microarray datasets obtained after nerve injury, we combined the gene ontology (GO) method with computer pattern recognition technology to find key genes responding to nerve injury, and then verified one of these screened-out genes. Data mining and gene ontology analysis of gene chip data GSE26350 was carried out with MATLAB software. Cd44 was selected from the screened-out key gene molecular spectrum by comparing the genes' different GO terms and positions on the score map of principal components. Function interference was employed to disrupt the normal binding of Cd44 and one of its ligands, chondroitin sulfate C (CSC), to observe neurite extension. Gene ontology analysis showed that the first genes on the score map (marked by red *) were mainly distributed in molecular function GO terms such as molecular transducer activity, receptor activity, and protein binding. Cd44 is one of six effector protein genes, and attracted our attention because of its functional diversity. After adding different reagents to the medium to interfere with the normal binding of CSC and Cd44, varying degrees of relief of CSC's inhibition of neurite extension were observed. CSC can inhibit neurite extension by binding Cd44 on the neuron membrane. This verifies that important genes in given physiological processes can be identified by gene ontology analysis of gene chip data.

  19. Gene function prediction based on the Gene Ontology hierarchical structure.

    Science.gov (United States)

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
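
    The reported F-value is consistent with the usual balanced F-measure, the harmonic mean of precision and recall. A short check, assuming that this standard definition is the one used:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f_value = f_measure(52.7, 48.9)
```

    With these inputs the result rounds to 50.7, matching the figure quoted in the abstract.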

  20. Integrating Gene Ontology and Blast to predict gene functions

    Institute of Scientific and Technical Information of China (English)

    WANG Cheng-gang; MO Zhi-hong

    2007-01-01

    A GoBlast system was built to predict gene function by integrating Blast search and Gene Ontology (GO) annotations. The operating system was Debian Linux 3.1, with Apache as the web server and a MySQL database as the data storage system. FASTA files with GO annotations were taken as the sequence source for blast alignment, and were formatted by the wu-formatdb program. The GoBlast system includes three Bioperl modules in Perl: a data input module, a data process module and a data output module. A GoBlast query starts with an amino acid or nucleotide sequence. It ends with an output in an html page, presenting high scoring gene products which are highly homologous to the queried sequence and listing associated GO terms beside the respective gene products. A simple click on a GO term leads to the detailed explanation of the specific gene function. This facilitates gene function prediction by Blast. GoBlast can be a very useful tool for functional genome research and is available for free at http://bioq.org/goblast.

  1. GOseek: a gene ontology search engine using enhanced keywords.

    Science.gov (United States)

    Taha, Kamal

    2013-01-01

    We propose in this paper a biological search engine called GOseek, which overcomes the limitation of current gene similarity tools. Given a set of genes, GOseek returns the most significant genes that are semantically related to the given genes. These returned genes are usually annotated to one of the Lowest Common Ancestors (LCA) of the Gene Ontology (GO) terms annotating the given genes. Most genes have several annotation GO terms. Therefore, there may be more than one LCA for the GO terms annotating the given genes. The LCA annotating the genes that are most semantically related to the given gene is the one that receives the most aggregate semantic contribution from the GO terms annotating the given genes. To identify this LCA, GOseek quantifies the contribution of the GO terms annotating the given genes to the semantics of their LCAs. That is, it encodes the semantic contribution into a numeric format. GOseek uses microarray experiment data to rank result genes based on their significance. We evaluated GOseek experimentally and compared it with a comparable gene prediction tool. Results showed marked improvement over the tool.
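
    The remark that a set of GO terms may have more than one Lowest Common Ancestor follows from GO being a directed acyclic graph rather than a tree. The sketch below finds all LCAs in a toy graph; it does not implement GOseek's semantic-contribution scoring, and the function names and term labels are hypothetical.

```python
def ancestors(parents, term):
    """All ancestors of `term`, including the term itself."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def lowest_common_ancestors(parents, terms):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = set.intersection(*(ancestors(parents, t) for t in terms))
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


# Hypothetical toy graph: t1 and t2 each have two parents, x and y.
toy_parents = {
    "root": set(),
    "x": {"root"},
    "y": {"root"},
    "t1": {"x", "y"},
    "t2": {"x", "y"},
}
lcas = lowest_common_ancestors(toy_parents, ["t1", "t2"])
```

    In this example both `x` and `y` are LCAs of `t1` and `t2`, which is the situation where a method such as the one described needs a further criterion to choose among them.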

  2. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Many valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations are reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations proved the effectiveness of our pipeline and prioritization of predicted annotations, by selecting as most likely manifold predicted annotations that were later confirmed.

  3. Correlating Expression Data with Gene Function Using Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    LIU,Qi; DENG,Yong; WANG,Chuan; SHI,Tie-Liu; LI,Yi-Xue

    2006-01-01

    Clustering is perhaps one of the most widely used tools for microarray data analysis. Proposed roles for genes of unknown function are inferred from clusters of genes similarly expressed across many biological conditions. However, whether function annotation by similarity metrics is reliable, and to what extent the similarity in gene expression patterns is useful for annotation of gene functions, has not been evaluated. This paper presents a comprehensive study of the correlation between the similarity of expression data and of gene functions using Gene Ontology. It has been found that although the similarity in expression patterns and the similarity in gene functions are significantly dependent on each other, this association is rather weak. In addition, among the three categories of Gene Ontology, the similarity of expression data is more useful for cellular component annotation than for biological process and molecular function. The results presented are of interest for gene function prediction research.

  4. Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

    Science.gov (United States)

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  5. OAHG: an integrated resource for annotating human genes with multi-level ontologies

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-01-01

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which could be helpful for further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e–16). PMID:27703231

  6. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided and the corresponding entries in heterogeneous biological databases in semantic terms can be expediently searched.
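
    The two-table idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table and column names (`db2go`, `go_similarity`, `entry_id`, and so on), the toy GO identifiers, and the similarity scores are hypothetical and are not taken from BioDW. The sketch only shows how a join between an annotation table and a precomputed similarity table could return entries from several sources that are semantically close to a query term.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE db2go (entry_id TEXT, source_db TEXT, go_id TEXT);
        CREATE TABLE go_similarity (go_a TEXT, go_b TEXT, score REAL);
        """
    )
    # Toy annotations: which entry in which source carries which GO term.
    conn.executemany(
        "INSERT INTO db2go VALUES (?, ?, ?)",
        [
            ("P1", "SourceA", "GO:toy1"),
            ("G7", "SourceB", "GO:toy2"),
            ("X9", "SourceC", "GO:toy3"),
        ],
    )
    # Toy precomputed similarity scores between pairs of GO terms.
    conn.executemany(
        "INSERT INTO go_similarity VALUES (?, ?, ?)",
        [
            ("GO:toy1", "GO:toy1", 1.0),
            ("GO:toy1", "GO:toy2", 0.8),
            ("GO:toy1", "GO:toy3", 0.2),
        ],
    )
    return conn


def semantic_search(conn, query_go, min_score):
    """Return entries annotated with GO terms similar to query_go."""
    rows = conn.execute(
        """
        SELECT d.entry_id, d.source_db, d.go_id, s.score
        FROM go_similarity AS s
        JOIN db2go AS d ON d.go_id = s.go_b
        WHERE s.go_a = ? AND s.score >= ?
        ORDER BY s.score DESC, d.entry_id
        """,
        (query_go, min_score),
    )
    return rows.fetchall()


conn = build_demo_db()
hits = semantic_search(conn, "GO:toy1", 0.5)
```

    With a threshold of 0.5, the query returns the exact match from one source and the closely related entry from another, while the weakly related entry is filtered out.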

  7. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-09-05

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 ( www.comparativego.com ). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  8. Semantic particularity measure for functional characterization of gene sets using gene ontology.

    Science.gov (United States)

    Bettembourg, Charles; Diot, Christian; Dameron, Olivier

    2014-01-01

    Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.

  9. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    Science.gov (United States)

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  10. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Full Text Available Abstract Background Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which cannot estimate the individual discriminative abilities of the three aspects of gene ontology. Results In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time & space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-infinite linear programming. The five kernels are then linearly merged into one single kernel for

  11. Combining Hierarchical and Associative Gene Ontology Relations with Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Riensche, Roderick M.; Beagley, Nathaniel; Baddeley, Bob L.; Tratz, Stephen C.; Gregory, Michelle L.

    2007-03-01

    Gene and gene product similarity is a fundamental diagnostic measure in analyzing biological data and constructing predictive models for functional genomics. With the rising influence of the Gene Ontology, two complementary approaches have emerged where the similarity between two genes or gene products is obtained by comparing Gene Ontology (GO) annotations associated with the genes or gene products. One approach captures GO-based similarity in terms of hierarchical relations within each gene subontology. The other approach identifies GO-based similarity in terms of associative relations across the three gene subontologies. We propose a novel methodology where the two approaches can be merged with ensuing benefits in coverage and accuracy, and demonstrate that further improvements can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  12. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network.

    Science.gov (United States)

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C; Mohanty, Bidyut K; Gao, Nan; Tang, Jijun; Lawson, Andrew B; Hannun, Yusuf A; Zheng, W Jim

    2014-10-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. A new gene ontology-based measure for the functional similarity of gene products

    Institute of Scientific and Technical Information of China (English)

    QI Guo-long; QIAN Shi-yu; FANG Ji-qian

    2013-01-01

    Background Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure used with the gene co-expression and the protein-protein interactions. Results The results showed a considerable association between the semantic similarity and the expression correlation and between the semantic similarity and the protein-protein interactions, and our measure performed the best overall. Conclusion These results revealed the potential value of our newly proposed semantic similarity measure in studying the functional relevance of gene products.
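
    For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of Resnik's measure only, not of the authors' new measure (which the abstract does not specify). Resnik similarity is the information content, -log p(t), of the most informative ancestor shared by two terms. The toy graph, the term names, and the annotation counts are all hypothetical.

```python
import math

# Toy is-a graph: child -> set of parents (hypothetical terms, not real GO IDs).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical annotation counts per term, already propagated to ancestors.
COUNTS = {"root": 100, "A": 40, "B": 60, "A1": 10, "A2": 30, "B1": 60}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), with p(t) estimated from annotation frequency."""
    return -math.log(COUNTS[term] / COUNTS["root"])


def resnik(t1, t2):
    """Information content of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)
```

    In this toy graph, `resnik("A1", "A2")` is the information content of their shared parent `A`, whereas `resnik("A1", "B1")` is zero because the two terms share only the root.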

  14. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community.
Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  15. A task-based approach for Gene Ontology evaluation.

    Science.gov (United States)

    Clarke, Erik L; Loguercio, Salvatore; Good, Benjamin M; Su, Andrew I

    2013-04-15

    The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test have been improving in their ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.

  16. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. 
The analysis of these
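
    The abstract above relies on a modified Fisher's exact test, but the modification is not described in the text shown here. As a hedged illustration, the sketch below implements only the standard one-sided test for over-representation, which is the upper tail of the hypergeometric distribution. The function name and the parameter names (`k`, `n`, `K`, `N`) are assumptions chosen for this example.

```python
from math import comb


def fisher_over_representation(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of gene pairs in the background
    K: background gene pairs associated with the interaction term
    n: gene pairs in the sub-network of interest
    k: sub-network gene pairs associated with the interaction term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper associated pairs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

    A small p-value suggests the term occurs in the sub-network more often than expected by chance. An under-representation test would sum the lower tail instead.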

  17. Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Tratz, Stephen C.; Gregory, Michelle L.

    2006-06-08

    With the rising influence of the Gene Ontology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associated with them. So far, these approaches have solely relied on the knowledge encoded in the Gene Ontology and the gene annotations associated with the Gene Ontology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  18. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  19. Integrating Information in Biological Ontologies and Molecular Networks to Infer Novel Terms

    Science.gov (United States)

    Li, Le; Yip, Kevin Y.

    2016-01-01

    Currently most terms and term-term relationships in Gene Ontology (GO) are defined manually, which creates cost, consistency and completeness issues. Recent studies have demonstrated the feasibility of inferring GO automatically from biological networks, which represents an important complementary approach to GO construction. These methods (NeXO and CliXO) are unsupervised, which means 1) they cannot use the information contained in existing GO, 2) the way they integrate biological networks may not optimize the accuracy, and 3) they are not customized to infer the three different sub-ontologies of GO. Here we present a semi-supervised method called Unicorn that extends these previous methods to tackle the three problems. Unicorn uses a sub-tree of an existing GO sub-ontology as training part to learn parameters in integrating multiple networks. Cross-validation results show that Unicorn reliably inferred the left-out parts of each specific GO sub-ontology. In addition, by training Unicorn with an old version of GO together with biological networks, it successfully re-discovered some terms and term-term relationships present only in a new version of GO. Unicorn also successfully inferred some novel terms that were not contained in GO but have biological meanings well-supported by the literature. Availability: Source code of Unicorn is available at http://yiplab.cse.cuhk.edu.hk/unicorn/. PMID:27976738

  20. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load them into the database automatically. We developed a PHP web application based on Model-View-Control architecture, used a specific data structure as well as current and novel algorithms to estimate GO graphs parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as a novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  1. The Representation of Heart Development in the Gene Ontology

    Science.gov (United States)

    Khodiyar, Varsha K.; Hill, David P.; Howe, Doug; Berardini, Tanya Z.; Tweedie, Susan; Talmud, Philippa J.; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C.

    2012-01-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. PMID:21419760

  2. The representation of heart development in the gene ontology.

    Science.gov (United States)

    Khodiyar, Varsha K; Hill, David P; Howe, Doug; Berardini, Tanya Z; Tweedie, Susan; Talmud, Philippa J; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C

    2011-06-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

  3. Codon bias and gene ontology in holometabolous and hemimetabolous insects.

    Science.gov (United States)

    Carlini, David B; Makowski, Matthew

    2015-12-01

    The relationship between preferred codon use (PCU), developmental mode, and gene ontology (GO) was investigated in a sample of nine insect species with sequenced genomes. These species were selected to represent two distinct modes of insect development, holometabolism and hemimetabolism, with an aim toward determining whether the differences in developmental timing concomitant with developmental mode would be mirrored by differences in PCU in their developmental genes. We hypothesized that the developmental genes of holometabolous insects should be under greater selective pressure for efficient translation, manifest as increased PCU, than those of hemimetabolous insects because holometabolism requires abundant protein expression over shorter time intervals than hemimetabolism, where proteins are required more uniformly in time. Preferred codon sets were defined for each species, from which the frequency of PCU for each gene was obtained. Although there were substantial differences in the genomic base composition of holometabolous and hemimetabolous insects, both groups exhibited a general preference for GC-ending codons, with the former group having higher PCU averaged across all genes. For each species, the biological process GO term for each gene was assigned that of its Drosophila homolog(s), and PCU was calculated for each GO term category. The top two GO term categories for PCU enrichment in the holometabolous insects were anatomical structure development and cell differentiation. The increased PCU in the developmental genes of holometabolous insects may reflect a general strategy to maximize the protein production of genes expressed in bursts over short time periods, e.g., heat shock proteins. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 686-698, 2015. © 2015 Wiley Periodicals, Inc.

  4. Ontology Mapping of Indian Medicinal Plants with Standardized Medical Terms

    Directory of Open Access Journals (Sweden)

    S. Waheeta Hopper

    2012-01-01

    Full Text Available Problem statement: The World Wide Web (WWW) contains a large volume of information related to medicinal plants. However, health care recommendation with Indian medicinal plants is complicated because valuable information about medicinal plant resources is scattered, unstructured, and in text form. Search engines are not very efficient and require excessive manual processing, so it is difficult for ordinary users to find the medicinal uses of herbal plants on the web. Another problem is that domain experts are not able to map the medicinal uses of herbal plants to the existing standardized medical terms. Mapping to an existing ontology introduces the problem of finding the similarity between terms and relationships. Finding a way to perform the mapping automatically is another major challenge. Approach: To address these issues we developed a Knowledge framework for the Indian Medicinal Plants (KIMP). The knowledge framework includes ontology creation and a user interface for querying the system. Jena is used to build semantic web applications with the ontology represented in the Resource Description Framework (RDF) and Web Ontology Language (OWL). SPARQL Protocol and RDF Query Language (SPARQL) is used to retrieve various query patterns. Automated mapping is achieved by considering lexical and edge-based relatedness. Results: The user interface is demonstrated for five thousand concepts, and gives the related information from Wikipedia web pages in three languages. Mapping recommendation by lexical similarity gives 27% with the Jaccard algorithm and 60% with the Jaro Winkler algorithm. Edge-based relatedness using the WuPalmer algorithm gives 93% mapping recommendation. These results are analyzed and compared with our algorithm, which is based on WuPalmer and gives more specific mapping results than WuPalmer, with 71%.
Conclusion: Thus it is possible to find the specific resultant web page based on the user requirement in three different
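
    Two of the similarity measures named in this abstract can be sketched briefly. The code below is an illustration under stated assumptions, not the KIMP implementation: the Jaccard function compares whitespace-separated word sets (the abstract does not say how strings were tokenized), and the Wu-Palmer function uses a small hypothetical taxonomy in which the root has depth 1.

```python
def jaccard(a, b):
    """Lexical similarity: shared words divided by all distinct words."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical taxonomy: node -> parent (None marks the root).
PARENT = {
    "plant": None,
    "herb": "plant",
    "tree": "plant",
    "tulsi": "herb",
    "mint": "herb",
    "neem": "tree",
}


def path_to_root(node):
    """Return the list of nodes from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Edge-based similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first node on a's path that is also an ancestor of b is the
    # deepest (least) common subsumer.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

    In this toy taxonomy two herbs score higher than a herb and a tree, because their common subsumer lies deeper in the hierarchy.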

  5. A relation based measure of semantic similarity for Gene Ontology annotations

    Directory of Open Access Journals (Sweden)

    Gaudin Benoit

    2008-11-01

    Full Text Available Abstract Background Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and group gene products into functional groups. There are various ways to measure semantic similarity, either using the topological structure of the ontology, the instances (gene products) associated with terms or a mixture of both. We focus on an instance level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and the semantics of relations between terms, to provide constraints on our instance level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. Results We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance based methods of semantic similarity of terms, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provide a set of principles that any improvement on our method should seek to satisfy. Conclusion We derive a measure of semantic

  6. Visualization of mappings between the gene ontology and cluster trees

    Science.gov (United States)

    Jusufi, Ilir; Kerren, Andreas; Aleksakhin, Vladyslav; Schreiber, Falk

    2012-01-01

    Ontologies and hierarchical clustering are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms, giving insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, often a combined view is desired: visualizing a large data set in the context of an ontology under consideration of a clustering of the data. This paper proposes a new visualization method for this task.

  7. Detecting Inconsistencies in the Gene Ontology Using Ontology Databases with Not-gadgets

    Science.gov (United States)

    Lependu, Paea; Dou, Dejing; Howe, Doug

    We present ontology databases with not-gadgets, a method for detecting inconsistencies in an ontology with large numbers of annotated instances by using triggers and exclusion dependencies in a unique way. What makes this work relevant is the use of the database itself, rather than an external reasoner, to detect logical inconsistencies given large numbers of annotated instances. What distinguishes this work is the use of event-driven triggers together with the introduction of explicit negations. We applied this approach toward the serotonin example, an open problem in biomedical informatics which aims to use annotations to help identify inconsistencies in the Gene Ontology. We discovered 75 inconsistencies that have important implications in biology, which include: (1) methods for refining transfer rules used for inferring electronic annotations, and (2) highlighting possible biological differences across species worth investigating.
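
    The general idea of letting the database itself flag a contradiction between a positive assertion and an explicit negation can be sketched with a trigger. This is only an illustration under stated assumptions and not the authors' not-gadget construction: the table names, the trigger, and the gene and term labels are hypothetical, and SQLite is used purely for convenience.

```python
import sqlite3

SCHEMA = """
CREATE TABLE annotated     (gene TEXT, term TEXT);
CREATE TABLE not_annotated (gene TEXT, term TEXT);

CREATE TRIGGER reject_contradiction
BEFORE INSERT ON annotated
WHEN EXISTS (SELECT 1 FROM not_annotated
             WHERE gene = NEW.gene AND term = NEW.term)
BEGIN
    SELECT RAISE(ABORT, 'inconsistent annotation');
END;
"""


def make_db():
    """Create an in-memory database with the hypothetical schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_annotate(conn, gene, term):
    """Insert a positive annotation; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO annotated VALUES (?, ?)", (gene, term))
        return True
    except sqlite3.DatabaseError:
        # RAISE(ABORT, ...) inside the trigger surfaces as a database error.
        return False


conn = make_db()
conn.execute("INSERT INTO not_annotated VALUES (?, ?)", ("geneA", "termX"))
print(try_annotate(conn, "geneA", "termY"))  # no stored negation for this pair
print(try_annotate(conn, "geneA", "termX"))  # contradicts the stored negation
```

    Because the check runs inside the database at insert time, no external reasoner is involved, which is the property the abstract emphasizes.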

  8. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting the enrichment scores of a gene set (the gene itself and its direct neighbors in STRING) for gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.
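
    An enrichment score of a gene set for a GO term or KEGG pathway is often derived from a hypergeometric test. The abstract does not give its exact formula, so the sketch below assumes one common definition, the negative base-10 logarithm of the upper-tail hypergeometric p-value. The function names and the numbers in the example are illustrative only.

```python
from math import comb, inf, log10


def hypergeom_pvalue(N, M, n, m):
    """P(X >= m) when n genes are drawn from N, of which M carry the term."""
    upper = min(M, n)
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, upper + 1))
    return tail / comb(N, n)


def enrichment_score(N, M, n, m):
    """Assumed definition: -log10 of the upper-tail hypergeometric p-value."""
    p = hypergeom_pvalue(N, M, n, m)
    return inf if p == 0 else -log10(p)


# Toy numbers: 20 genes overall, 5 annotated with the term;
# the gene set has 4 members, 2 of which carry the term.
print(hypergeom_pvalue(20, 5, 4, 2))
print(enrichment_score(20, 5, 4, 2))
```

    One such score per GO term or pathway would give one coordinate of the feature vector that the abstract describes for each gene.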

  9. A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

    Science.gov (United States)

    Huntley, Rachael P; Harris, Midori A; Alam-Faruque, Yasmin; Blake, Judith A; Carbon, Seth; Dietze, Heiko; Dimmer, Emily C; Foulger, Rebecca E; Hill, David P; Khodiyar, Varsha K; Lock, Antonia; Lomax, Jane; Lovering, Ruth C; Mutowo-Meullenet, Prudence; Sawford, Tony; Van Auken, Kimberly; Wood, Valerie; Mungall, Christopher J

    2014-05-21

    The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

  10. Ontological Enrichment of the Genes-to-Systems Breast Cancer Database

    Science.gov (United States)

    Viti, Federica; Mosca, Ettore; Merelli, Ivan; Calabria, Andrea; Alfieri, Roberta; Milanesi, Luciano

    Breast cancer research needs the development of specific and suitable tools to appropriately manage biomolecular knowledge. The presented work deals with the integrative storage of breast cancer related biological data, in order to promote a systems biology approach to this network disease. To increase data standardization and resource integration, annotations maintained in the Genes-to-Systems Breast Cancer (G2SBC) database are associated with ontological terms, which provide a hierarchical structure to organize data, enabling more effective queries, statistical analysis and semantic web searching. The exploited ontologies, which cover all levels of the molecular environment, from genes to systems, are among the best known and most widely used bioinformatics resources. In the G2SBC database, ontology terms both provide a semantic layer to improve data storage, accessibility and analysis and represent a user-friendly instrument to identify relations among biological components.

  11. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization, which help to explore the potential pathways between genetic determinants and complex diseases, are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The abundant availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the predicted candidate genes was corroborated through a literature survey. In addition, the presence of drugs for these candidates was detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which supports their potential as drug targets for cervical cancer.
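
    Graph theoretical centrality measures of the kind mentioned in this abstract can be sketched on a toy interaction network. The abstract does not say which measures were used, so degree and closeness centrality are shown here as two standard examples; the protein names and edges are placeholders.

```python
from collections import deque

# Toy undirected interaction network; protein names are placeholders.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node interacts with directly."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph, source):
    """(number of reachable nodes) / (sum of shortest-path lengths), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


graph = build_graph(EDGES)
degree = degree_centrality(graph)
for protein in sorted(graph, key=degree.get, reverse=True):
    print(protein, degree[protein], closeness_centrality(graph, protein))
```

    In a prioritization setting, highly central proteins that also share ontology terms with known disease genes would be the natural candidates to inspect first.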

  12. Muscle Research and Gene Ontology: New standards for improved data integration

    Directory of Open Access Journals (Sweden)

    Nori Alessandra

    2009-01-01

    Full Text Available Abstract Background The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community. Results The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature. Conclusion The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with, these new terms. Please visit the Muscle Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology.

  13. Guidelines for the functional annotation of microRNAs using the Gene Ontology.

    Science.gov (United States)

    Huntley, Rachael P; Sitnikov, Dmitry; Orlic-Milacic, Marija; Balakrishnan, Rama; D'Eustachio, Peter; Gillespie, Marc E; Howe, Doug; Kalea, Anastasia Z; Maegdefessel, Lars; Osumi-Sutherland, David; Petri, Victoria; Smith, Jennifer R; Van Auken, Kimberly; Wood, Valerie; Zampetaki, Anna; Mayr, Manuel; Lovering, Ruth C

    2016-05-01

    MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). © 2016 Huntley et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  14. Evaluating the significance of protein functional similarity based on gene ontology.

    Science.gov (United States)

    Konopka, Bogumil M; Golda, Tomasz; Kotulska, Malgorzata

    2014-11-01

    Gene ontology is among the most successful ontologies in the biomedical domain. It is used to describe, unambiguously, protein molecular functions, cellular localizations, and processes in which proteins participate. The hierarchical structure of gene ontology allows quantifying protein functional similarity by application of algorithms that calculate semantic similarities. The scores, however, are meaningless without a given context. Here, we propose how to evaluate the significance of protein function semantic similarity scores by comparing them to reference distributions calculated for randomly chosen proteins. In the study, thresholds for significant functional semantic similarity, in four representative annotation corpuses, were estimated. We also show that the score significance is influenced by the number and specificity of gene ontology terms that are annotated to compared proteins. While proteins with a greater number of terms tend to yield higher similarity scores, proteins with more specific terms produce lower scores. The estimated significance thresholds were validated using protein sequence-function and structure-function relationships. Taking into account the term number and term specificity improves the distinction between significant and insignificant semantic similarity comparisons.
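
    Comparing an observed similarity score with a reference distribution from randomly chosen proteins can be sketched as an empirical p-value. The abstract does not specify how its thresholds were estimated, so this is one simple assumed procedure; the add-one correction, the function names and the evenly spaced stand-in reference scores are all illustrative choices.

```python
def empirical_pvalue(score, reference_scores):
    """Fraction of reference scores at least as high as the observed score."""
    n_ge = sum(1 for r in reference_scores if r >= score)
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (n_ge + 1) / (len(reference_scores) + 1)


def significance_threshold(reference_scores, alpha=0.05):
    """Smallest reference score whose empirical p-value is <= alpha, or None."""
    for candidate in sorted(reference_scores):
        if empirical_pvalue(candidate, reference_scores) <= alpha:
            return candidate
    return None


# Stand-in for similarity scores of randomly paired proteins.
reference = [i / 100 for i in range(100)]
print(empirical_pvalue(0.5, reference))
print(significance_threshold(reference, alpha=0.05))
```

    Building separate reference distributions for proteins with few or many terms, or with general or specific terms, would be one way to account for the dependence on term number and specificity that the abstract reports.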

  15. The effects of shared information on semantic calculations in the gene ontology.

    Science.gov (United States)

    Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

    2017-01-01

    The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
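
    The way a shared information (SI) value is substituted into the Resnik, Lin and Jiang-Conrath measures can be sketched on a toy ontology. This is not GGTK code: the term names and probabilities are invented, and SI is taken here as the information content of the most informative common ancestor, which is only one of the SI approaches the abstract alludes to.

```python
from math import log

# Toy ontology: child -> parents. Names and probabilities are invented.
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a", "b"]}
# Hypothetical annotation probabilities p(t); parents are at least as frequent.
PROB = {"root": 1.0, "a": 0.5, "b": 0.4, "c": 0.1, "d": 0.2}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: -ln p(term)."""
    return -log(PROB[term])


def shared_information(t1, t2):
    """IC of the most informative common ancestor (one possible SI choice)."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))


def resnik(t1, t2):
    return shared_information(t1, t2)


def lin(t1, t2):
    denom = ic(t1) + ic(t2)
    return 2 * shared_information(t1, t2) / denom if denom else 1.0


def jiang_conrath_distance(t1, t2):
    return ic(t1) + ic(t2) - 2 * shared_information(t1, t2)


print(resnik("c", "d"), lin("c", "d"), jiang_conrath_distance("c", "d"))
```

    Swapping in a different `shared_information` function while leaving the three measures untouched is what yields a family of related term similarity measures.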

  16. The Vision and Challenges of the Gene Ontology.

    Science.gov (United States)

    Lewis, Suzanna E

    2017-01-01

    The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done for the Gene Ontology (GO) Consortium to realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.

  17. Fast Gene Ontology based clustering for microarray experiments

    OpenAIRE

    Ovaska Kristian; Laakso Marko; Hautaniemi Sampsa

    2008-01-01

    Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fa...

  18. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  19. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks.

  20. Identification of disease-causing genes using microarray data mining and Gene Ontology.

    Science.gov (United States)

    Mohammadi, Azadeh; Saraee, Mohammad H; Salehi, Mansoor

    2011-01-26

    One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene
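
    The filtering stage based on the Fisher method can be sketched as a per-gene score. The abstract does not give its formula, so this assumes the common two-class Fisher criterion (squared difference of class means over the sum of class variances) with population variances; SVMRFE and the redundancy reduction stage are not shown, and the gene names and expression values are invented.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """(difference of class means)^2 / (sum of class variances) for one gene."""
    gap = (mean(values_a) - mean(values_b)) ** 2
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_genes(expression, labels):
    """Rank genes by Fisher score; expression maps gene -> per-sample values."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)


labels = [0, 0, 1, 1]  # hypothetical class labels, e.g. normal vs tumor
expression = {
    "gene1": [1.0, 3.0, 7.0, 9.0],  # well separated between classes
    "gene2": [4.0, 6.0, 5.0, 7.0],  # largely overlapping
}
print(rank_genes(expression, labels))
```

    A filter of this kind scores each gene independently, which is why a separate step is needed to deal with redundant genes, as the abstract notes.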

  1. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Directory of Open Access Journals (Sweden)

    Saraee Mohammad H

    2011-01-01

    Full Text Available Abstract Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions The proposed method addresses the weakness of conventional

  2. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Science.gov (United States)

    2011-01-01

    Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions The proposed method addresses the weakness of conventional methods by adding a redundancy

  3. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  4. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of the GO classes are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of

  5. Expansion of the Gene Ontology knowledgebase and resources

    Science.gov (United States)

    2017-01-01

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. PMID:27899567

  6. Expansion of the Gene Ontology knowledgebase and resources.

    Science.gov (United States)

    2017-01-04

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Representing Ontogeny Through Ontology: A Developmental Biologist’s Guide to The Gene Ontology

    Science.gov (United States)

    Hill, David P.; Berardini, Tanya Z.; Howe, Douglas G.; Van Auken, Kimberly M.

    2010-01-01

    Developmental biology, like many other areas of biology, has undergone a dramatic shift in the perspective from which developmental processes are viewed. Instead of focusing on the actions of a handful of genes or functional RNAs, we now consider the interactions of large functional gene networks and study how these complex systems orchestrate the unfolding of an organism, from gametes to adult. Developmental biologists are beginning to realize that understanding ontogeny on this scale requires the utilization of computational methods to capture, store and represent the knowledge we have about the underlying processes. Here we review the use of the Gene Ontology (GO) to study developmental biology. We describe the organization and structure of the GO and illustrate some of the ways we use it to capture the current understanding of many common developmental processes. We also discuss ways in which gene product annotations using the GO have been used to ask and answer developmental questions in a variety of model developmental systems. We provide suggestions as to how the GO might be used in more powerful ways to address questions about development. Our goal is to provide developmental biologists with enough background about the GO that they can begin to think about how they might use the ontology efficiently and in the most powerful ways possible. PMID:19921742

  8. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the premise that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular functions, and would play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for protein localization and function, respectively, were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in
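
    The evaluation described above (precision, recall and their harmonic mean at a top-k threshold) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy ranked list and the set of known annotations are all assumptions made for the example.

```python
def precision_recall_f1(ranked_terms, true_terms, k):
    """Score the top-k predicted terms against a set of known annotations."""
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ranked predictions and known annotations (toy data).
ranked = ["GO:0005634", "GO:0005737", "GO:0003677", "GO:0016020"]
known = {"GO:0005634", "GO:0003677", "GO:0005829"}
p, r, f = precision_recall_f1(ranked, known, k=2)
```

    Sweeping k over a range of thresholds and recording the three values at each step would give the kind of precision-recall and F-score curves the abstract mentions.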

  9. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways and these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes that were predicted to be essential by our prediction model are provided in this study. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
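
    For readers unfamiliar with the Matthews correlation coefficient reported above, a minimal sketch of its standard definition from a binary confusion matrix follows. The counts used here are invented for illustration and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for an essential vs. non-essential gene classifier.
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
```

    The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), which is why a value of 0.951 is described as nearly perfect.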

  10. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Science.gov (United States)

    Overall, Rupert W; Paszkowski-Rogacz, Maciej; Kempermann, Gerd

    2012-01-01

    Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. We have created a resource with three components 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already successful 'top-down' approach of the Gene Ontology.

  11. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Directory of Open Access Journals (Sweden)

    Rupert W Overall

    BACKGROUND: Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. METHODOLOGY/PRINCIPAL FINDINGS: We have created a resource with three components: 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. CONCLUSIONS/SIGNIFICANCE: The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already

  12. Cellular functions of genetically imprinted genes in human and mouse as annotated in the gene ontology.

    Science.gov (United States)

    Hamed, Mohamed; Ismael, Siba; Paulsen, Martina; Helms, Volkhard

    2012-01-01

    By analyzing the cellular functions of genetically imprinted genes as annotated in the Gene Ontology for human and mouse, we found that imprinted genes are often involved in developmental, transport and regulatory processes. In the human, paternally expressed genes are enriched in GO terms related to the development of organs and of anatomical structures. In the mouse, maternally expressed genes regulate cation transport as well as G-protein signaling processes. Furthermore, we investigated if imprinted genes are regulated by common transcription factors. We identified 25 TF families that showed an enrichment of binding sites in the set of imprinted genes in human and 40 TF families in mouse. In general, maternally and paternally expressed genes are not regulated by different transcription factors. The genes Nnat, Klf14, Blcap, Gnas and Ube3a contribute most to the enrichment of TF families. In the mouse, genes that are maternally expressed in placenta are enriched for AP1 binding sites. In the human, we found that these genes possessed binding sites for both, AP1 and SP1.

  13. Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL

    Directory of Open Access Journals (Sweden)

    Aranguren Mikel

    2007-02-01

    The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.

  14. Brief isoflurane anaesthesia affects differential gene expression, gene ontology and gene networks in rat brain.

    Science.gov (United States)

    Lowes, Damon A; Galley, Helen F; Moura, Alessandro P S; Webster, Nigel R

    2017-01-15

    Much is still unknown about the mechanisms of effects of even brief anaesthesia on the brain and previous studies have simply compared differential expression profiles with and without anaesthesia. We hypothesised that network analysis, in addition to the traditional differential gene expression and ontology analysis, would enable identification of the effects of anaesthesia on interactions between genes. Rats (n=10 per group) were randomised to anaesthesia with isoflurane in oxygen or oxygen only for 15min, and 6h later brains were removed. Differential gene expression and gene ontology analysis of microarray data was performed. Standard clustering techniques and principal component analysis with Bayesian rules were used along with social network analysis methods, to quantitatively model and describe the gene networks. Anaesthesia had marked effects on genes in the brain with differential regulation of 416 probe sets by at least 2 fold. Gene ontology analysis showed 23 genes were functionally related to the anaesthesia and of these, 12 were involved with neurotransmitter release, transport and secretion. Gene network analysis revealed much greater connectivity in genes from brains from anaesthetised rats compared to controls. Other importance measures were also altered after anaesthesia; median [range] closeness centrality (shortest path) was lower in anaesthetized animals (0.07 [0-0.30]) than controls (0.39 [0.30-0.53], pgenes after anaesthesia and suggests future targets for investigation. Copyright © 2016 Elsevier B.V. All rights reserved.
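
    The closeness centrality (shortest path) measure mentioned above can be sketched with a breadth-first search on an unweighted graph. This is one common normalisation (reachable nodes divided by the sum of shortest-path distances); the abstract does not state which variant the authors used, and the toy graph and function name below are assumptions.

```python
from collections import deque


def closeness_centrality(graph, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths."""
    distances = {source: 0}
    queue = deque([source])
    # Breadth-first search gives shortest path lengths in an unweighted graph.
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    reachable = len(distances) - 1
    return reachable / total if total else 0.0


# Hypothetical three-gene chain A - B - C (undirected adjacency lists).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
central = closeness_centrality(toy, "B")
peripheral = closeness_centrality(toy, "A")
```

    In this toy chain the middle node scores higher than the end nodes, which is the sense in which closeness summarises how near a gene sits to the rest of its network.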

  15. Unifying themes in microbial associations with animal and plant hosts described using the gene ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Collmer, Candace W; Gwinn-Giglio, Michelle; Lindeberg, Magdalen; Meng, Shaowu; Chibucos, Marcus C; Tseng, Tsai-Tien; Lomax, Jane; Biehl, Bryan; Ireland, Amelia; Bird, David; Dean, Ralph A; Glasner, Jeremy D; Perna, Nicole; Setubal, Joao C; Collmer, Alan; Tyler, Brett M

    2010-12-01

    Microbes form intimate relationships with hosts (symbioses) that range from mutualism to parasitism. Common microbial mechanisms involved in a successful host association include adhesion, entry of the microbe or its effector proteins into the host cell, mitigation of host defenses, and nutrient acquisition. Genes associated with these microbial mechanisms are known for a broad range of symbioses, revealing both divergent and convergent strategies. Effective comparisons among these symbioses, however, are hampered by inconsistent descriptive terms in the literature for functionally similar genes. Bioinformatic approaches that use homology-based tools are limited to identifying functionally similar genes based on similarities in their sequences. An effective solution to these limitations is provided by the Gene Ontology (GO), which provides a standardized language to describe gene products from all organisms. The GO comprises three ontologies that enable one to describe the molecular function(s) of gene products, the biological processes to which they contribute, and their cellular locations. Beginning in 2004, the Plant-Associated Microbe Gene Ontology (PAMGO) interest group collaborated with the GO consortium to extend the GO to accommodate terms for describing gene products associated with microbe-host interactions. Currently, over 900 terms that describe biological processes common to diverse plant- and animal-associated microbes are incorporated into the GO database. Here we review some unifying themes common to diverse host-microbe associations and illustrate how the new GO terms facilitate a standardized description of the gene products involved. We also highlight areas where new terms need to be developed, an ongoing process that should involve the whole community.

  16. Analysis of tumor suppressor genes based on gene ontology and the KEGG pathway.

    Science.gov (United States)

    Yang, Jing; Chen, Lei; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby, promoting the discovery of effective cancer treatments.
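
    Enrichment analysis of the kind referred to above is often based on a hypergeometric tail probability. The sketch below shows that generic test only; it is an assumption that something similar underlies the enrichment features in this study, and the function name and numbers are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(at least `overlap` annotated genes in a random sample of the population)."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities from the observed overlap upwards.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 3 carry the GO term, and all 3 genes
# in a 3-gene set of interest carry it.
p_value = hypergeom_enrichment_p(population=10, annotated=3, sample=3, overlap=3)
```

    A small p-value indicates that the term appears in the gene set more often than expected by chance; in practice the values would also need correction for multiple testing across terms.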

  17. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.

    Science.gov (United States)

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; Wang, Yadong; Rhee, Seung Y; Chen, Jin

    2015-02-14

    Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM .

  18. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    Science.gov (United States)

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and increase the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.

  19. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and increase the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.

  20. Bi-directional semantic similarity for gene ontology to optimize biological and clinical analyses.

    Science.gov (United States)

    Bien, Sang Jay; Park, Chan Hee; Shim, Hae Jin; Yang, Woongcheol; Kim, Jihun; Kim, Ju Han

    2012-01-01

    Semantic similarity analysis facilitates automated semantic explanations of biological and clinical data annotated by biomedical ontologies. Gene ontology (GO) has become one of the most important biomedical ontologies with a set of controlled vocabularies, providing rich semantic annotations for genes and molecular phenotypes for diseases. Current methods for measuring GO semantic similarities are limited to considering only the ancestor terms while neglecting the descendants. One can find many GO term pairs whose ancestors are identical but whose descendants are very different and vice versa. Moreover, the lower parts of GO trees are full of terms with more specific semantics. This study proposed a method of measuring semantic similarities between GO terms using the entire GO tree structure, including both the upper (ancestral) and the lower (descendant) parts. Comprehensive comparison studies were performed with well-known information content-based and graph structure-based semantic similarity measures with protein sequence similarities, gene expression-profile correlations, protein-protein interactions, and biological pathway analyses. The proposed bidirectional measure of semantic similarity outperformed other graph-based and information content-based methods.
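
    The observation above, that two terms can share identical ancestors while having very different descendants, can be illustrated on a toy hierarchy. The sketch below is not the bidirectional measure proposed in the paper; it simply compares ancestor sets and descendant sets with a Jaccard index, and the term names and helper functions are hypothetical.

```python
def reachable(start, edges):
    """All nodes reachable from `start` by following `edges` (start excluded)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def jaccard(x, y):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Toy hierarchy given as child -> parents; invert it to get parent -> children.
parents = {"root": [], "a": ["root"], "b": ["root"],
           "a1": ["a"], "a2": ["a"], "b1": ["b"]}
children = {}
for child, term_parents in parents.items():
    children.setdefault(child, [])
    for parent in term_parents:
        children.setdefault(parent, []).append(child)

ancestor_similarity = jaccard(reachable("a", parents), reachable("b", parents))
descendant_similarity = jaccard(reachable("a", children), reachable("b", children))
```

    Here terms a and b look identical when only ancestors are compared, yet share no descendants, which is the situation an ancestor-only measure cannot distinguish.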

  1. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

    Science.gov (United States)

    Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin M

    2016-09-09

    Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms. We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88 %) based on random sampling of annotations. In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.

  2. Ontology or formal ontology

    Science.gov (United States)

    Žáček, Martin

    2017-07-01

    Ontology or formal ontology? Which term is correct? The aim of this article is to introduce the correct terms and explain their basis. An ontology describes a particular area of interest (domain) in a formal way: it defines the classes of objects in that area and the relationships that may exist between them. The value of an ontology lies mainly in facilitating communication between people, improving collaboration between software systems, and improving systems engineering. In all these areas, an ontology offers a unified view while maintaining consistency and unambiguity.

  3. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

    Science.gov (United States)

    Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

    2016-03-11

    Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

  4. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool, Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. High consistency of term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  5. Genetic resources for advanced biofuel production described with the Gene Ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Purwantini, Endang; Lomax, Jane; Setubal, João C; Mukhopadhyay, Biswarup; Tyler, Brett M

    2014-01-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial ENergy processes Gene Ontology (MENGO) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy-related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  6. Genetic Resources for Advanced Biofuel Production Described with the Gene Ontology

    Directory of Open Access Journals (Sweden)

    Trudy Torto-Alalibo

    2014-10-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial Energy Gene Ontology (MENGO: http://www.mengo.biochem.vt.edu) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  7. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...... can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence derived protein features such as predicted post translational modifications (PTMs), protein sorting signals and physical/chemical properties...

  8. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Databases workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  9. BOAT: automatic alignment of biomedical ontologies using term informativeness and candidate selection.

    Science.gov (United States)

    Chua, Watson Wei Khong; Kim, Jung-Jae

    2012-04-01

    The biomedical sciences is one of the few domains where ontologies are widely being developed to facilitate information retrieval and knowledge sharing, but there still remains the problem that applications using different ontologies cannot share knowledge without explicit references between overlapping concepts. Ontology alignment is the task of identifying such equivalence relations between concepts across ontologies. Its application to the biomedical domain should address two open issues: (1) determining the equivalence of concept-pairs which have overlapping terms in their names, and (2) the high run-time required to align large ontologies which are typical in the biomedical domain. To address them, we present a novel approach, named the Biomedical Ontologies Alignment Technique (BOAT), which is state-of-the-art in terms of F-measure, precision and speed. A key feature of BOAT is that it considers the informativeness of each component word in the concept labels, which has significant impact on biomedical ontologies, resulting in a 12.2% increase in F-measure. Another important feature of BOAT is that it selects for comparison only concept pairs that show high likelihoods of equivalence, based on the similarity of their annotations. BOAT's F-measure of 0.88 for the alignment of the mouse and human anatomy ontologies is on par with that of another state-of-the-art matcher, AgreementMaker, while taking a shorter time.

  10. Integration of the Gene Ontology into an object-oriented architecture

    Directory of Open Access Journals (Sweden)

    Zheng W Jim

    2005-05-01

    Background: To standardize gene product descriptions, a formal vocabulary defined as the Gene Ontology (GO) has been developed. GO terms have been categorized into biological processes, molecular functions, and cellular components. However, there is no single representation that integrates all the terms into one cohesive model. Furthermore, GO definitions have little information explaining the underlying architecture that forms these terms, such as the dynamic and static events occurring in a process. In contrast, object-oriented models have been developed to show dynamic and static events. A portion of the TGF-beta signaling pathway, which is involved in numerous cellular events including cancer, differentiation and development, was used to demonstrate the feasibility of integrating the Gene Ontology into an object-oriented model. Results: Using object-oriented models we have captured the static and dynamic events that occur during a representative GO process, "transforming growth factor-beta (TGF-beta) receptor complex assembly" (GO:0007181). Conclusion: We demonstrate that the utility of GO terms can be enhanced by object-oriented technology, and that the GO terms can be integrated into an object-oriented model by serving as a basis for the generation of object functions and attributes.

  11. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
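
    The over-representation step described in the abstract above can be illustrated with a small sketch. It assumes a one-sided hypergeometric test, which is a common choice for ORA; the abstract does not state which test goSTAG itself applies, and the function names, the input dictionaries and the absence of multiple-testing correction are all simplifications made for this example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    # comb() returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(gene_set, annotations, background):
    """Return {term: p-value} for terms hit by at least one gene in gene_set.

    annotations is a hypothetical {term: iterable of genes} mapping;
    background is the universe of genes that could have been selected.
    """
    background = set(background)
    gene_set = set(gene_set) & background
    N, n = len(background), len(gene_set)
    results = {}
    for term, genes in annotations.items():
        genes = set(genes) & background
        k = len(genes & gene_set)
        if k == 0:
            continue  # term not represented in the gene set
        results[term] = hypergeom_sf(k, N, len(genes), n)
    return results
```

    In a real analysis the resulting p-values would normally be corrected for multiple testing before any clustering of terms, which this sketch leaves out.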

  12. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining.

    Science.gov (United States)

    Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2012-12-20

    Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since

  13. Onto-CC: a web server for identifying Gene Ontology conceptual clusters

    Science.gov (United States)

    Romero-Zaliz, R.; del Val, C.; Cobb, J. P.; Zwir, I.

    2008-01-01

    The Gene Ontology (GO) vocabulary has been extensively explored to analyze the functions of coexpressed genes. However, despite its extended use in Biology and Medical Sciences, there are still high levels of uncertainty about which ontology (i.e. Biological Process, Cellular Component or Molecular Function) should be used, and at which level of specificity. Moreover, the GO database can contain incomplete information resulting from human annotations, or information highly influenced by the available knowledge about a specific branch in an ontology. In spite of these drawbacks, there is a trend to ignore these problems and even use GO terms to conduct searches of gene expression profiles (i.e. expression + GO) instead of more cautious approaches that just consider them as an independent source of validation (i.e. expression versus GO). Consequently, the uncertainty is propagated and the analysis of the required gene grouping hypotheses is biased. We proposed a web tool, Onto-CC, as an automatic method specially suited for independent explanation/validation of gene grouping hypotheses (e.g. coexpressed genes) based on GO clusters (i.e. expression versus GO). The Onto-CC approach reduces the uncertainty of the queries by identifying optimal conceptual clusters that combine terms from different ontologies simultaneously, as well as terms defined at different levels of specificity in the GO hierarchy. To do so, we implemented the EMO-CC methodology to find clusters in structural databases [GO Directed acyclic Graph (DAG) tree], inspired by Conceptual Clustering algorithms. This approach allows the management of optimal cluster sets as potential parallel hypotheses, guided by multiobjective/multimodal optimization techniques. Therefore, we can generate alternative and, still, optimal explanations of queries that can provide new insights for a given problem. Onto-CC has been successfully used to test different medical and biological hypotheses including the explanation and prediction of

  14. Systematically characterizing and prioritizing chemosensitivity related gene based on Gene Ontology and protein interaction network

    Directory of Open Access Journals (Sweden)

    Chen Xin

    2012-10-01

    Background: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs’ GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable

  15. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    Science.gov (United States)

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  16. Identifying redundant and missing relations in the gene ontology.

    Science.gov (United States)

    Mougin, Fleur

    2015-01-01

    Significant efforts have been undertaken for providing the Gene Ontology (GO) in a computable format as well as for enriching it with logical definitions. Automated approaches can thus be applied to GO for assisting its maintenance and for checking its internal coherence. However, inconsistencies may still remain within GO. In this frame, the objective of this work was to audit GO relationships. First, reasoning over relationships was exploited for detecting redundant relations existing between GO concepts. Missing necessary and sufficient conditions were then identified based on the compositional structure of the preferred names of GO concepts. More than one thousand redundant relations and 500 missing necessary and sufficient conditions were found. The proposed approach was thus successful for detecting inconsistencies within GO relations. The application of lexical approaches as well as the exploitation of synonyms and textual definitions could be useful for identifying additional necessary and sufficient conditions. Multiple necessary and sufficient conditions for a given GO concept may be indicative of inconsistencies.
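
    One simple notion of a redundant relation can be sketched as follows: a direct is-a edge from a term to a parent is redundant when the same parent is already reachable through another direct parent. This is only an illustration restricted to transitive is-a links on a toy acyclic graph; the reasoning used in the work above covers GO relationships more generally, and the function name and input format are assumptions made for the example.

```python
def redundant_edges(parents):
    """Return the set of (child, parent) is-a edges implied by longer paths.

    parents is a hypothetical {term: set of direct is-a parents} mapping
    and is assumed to describe a directed acyclic graph.
    """
    cache = {}

    def ancestors(term):
        # All terms reachable by following one or more is-a edges upward.
        if term not in cache:
            found = set()
            for p in parents.get(term, ()):
                found.add(p)
                found |= ancestors(p)
            cache[term] = found
        return cache[term]

    redundant = set()
    for child, direct in parents.items():
        for p in direct:
            # The edge child -> p adds nothing if another direct parent q
            # already has p among its ancestors.
            if any(p in ancestors(q) for q in direct if q != p):
                redundant.add((child, p))
    return redundant
```

    Removing every edge reported by this function yields the transitive reduction of the is-a graph, which leaves the set of ancestors of each term unchanged.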

  17. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network.

    Science.gov (United States)

    Karadeniz, İlknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan

    2015-01-01

    Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty four of these interactions were identified from sentence-level processing. Twenty two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types at a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur at experimental conditions which can be ontologically represented. Our

  18. Genes2GO: A web application for querying gene sets for specific GO terms.

    Science.gov (United States)

    Chawla, Konika; Kuiper, Martin

    2016-01-01

    Gene ontology annotations have become an essential resource for biological interpretations of experimental findings. The process of gathering basic annotation information in tables that link gene sets with specific gene ontology terms can be cumbersome, in particular if it requires above average computer skills or bioinformatics expertise. We have therefore developed Genes2GO, an intuitive R-based web application. Genes2GO uses the biomaRt package of Bioconductor in order to retrieve custom sets of gene ontology annotations for any list of genes from organisms covered by the Ensembl database. Genes2GO produces a binary matrix file, indicating for each gene the presence or absence of specific annotations for a gene. It should be noted that other GO tools do not offer this user-friendly access to annotations. Genes2GO is freely available and listed under http://www.semantic-systems-biology.org/tools/externaltools/.
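
    The binary presence/absence matrix mentioned above can be pictured with a short sketch. Genes2GO itself is R-based and retrieves its annotations through biomaRt; the Python below only illustrates the shape of the output, and the function names, the input dictionary and the tab-separated layout are assumptions made for this example rather than the tool's actual file format.

```python
def binary_matrix(gene_to_terms, terms):
    """Build a header and rows marking which genes carry which GO terms.

    gene_to_terms is a hypothetical {gene: set of GO identifiers} mapping,
    standing in for annotations fetched from an annotation source.
    """
    header = ["gene"] + list(terms)
    rows = []
    for gene in sorted(gene_to_terms):
        annotated = gene_to_terms[gene]
        # 1 marks the presence of an annotation, 0 its absence.
        rows.append([gene] + [1 if t in annotated else 0 for t in terms])
    return header, rows


def to_tsv(header, rows):
    """Serialise the matrix as tab-separated text, one gene per line."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)
```

    A matrix of this form can be loaded directly into a spreadsheet or a statistics package to filter genes by the terms of interest.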

  19. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

    Science.gov (United States)

    Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

    2017-03-20

    The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .

  20. Identification of oral cancer related candidate genes by integrating protein-protein interactions, gene ontology, pathway analysis and immunohistochemistry.

    Science.gov (United States)

    Kumar, Ravindra; Samal, Sabindra K; Routray, Samapika; Dash, Rupesh; Dixit, Anshuman

    2017-05-30

    In the recent years, bioinformatics methods have been reported with a high degree of success for candidate gene identification. In this milieu, we have used an integrated bioinformatics approach assimilating information from gene ontologies (GO), protein-protein interaction (PPI) and network analysis to predict candidate genes related to oral squamous cell carcinoma (OSCC). A total of 40973 PPIs were considered for 4704 cancer-related genes to construct human cancer gene network (HCGN). The importance of each node was measured in HCGN by ten different centrality measures. We have shown that the top ranking genes are related to a significantly higher number of diseases as compared to other genes in HCGN. A total of 39 candidate oral cancer target genes were predicted by combining top ranked genes and the genes corresponding to significantly enriched oral cancer related GO terms. Initial verification using literature and available experimental data indicated that 29 genes were related with OSCC. A detailed pathway analysis led us to propose a role for the selected candidate genes in the invasion and metastasis in OSCC. We further validated our predictions using immunohistochemistry (IHC) and found that the gene FLNA was upregulated while the genes ARRB1 and HTT were downregulated in the OSCC tissue samples.
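
    As an illustration of ranking nodes by network importance, the sketch below computes degree centrality on an undirected interaction network and returns the top-ranked genes. Degree centrality is used here only as a familiar example; the abstract above does not list which ten centrality measures were applied, and the function names and edge-list input are assumptions made for this example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nb) / (n - 1) for node, nb in neighbors.items()}


def top_ranked(edges, k):
    """Return the k most central nodes, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]
```

    Combining several centrality measures, as the study describes, would amount to computing one such score per measure and aggregating the resulting ranks.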

  1. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data.

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.

  2. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L.

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer’s disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology. PMID:28199395

  3. Information content-based gene ontology semantic similarity approaches: toward a unified framework theory.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG.
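
    To make the notion of term information content concrete, the sketch below computes an annotation-based IC, IC(t) = log(total / count(t)), where count(t) includes annotations to all descendants of t, and a Resnik-style similarity defined as the IC of the most informative common ancestor. This is one well-known member of the family of measures reviewed above, not the unified framework itself; the toy graph format, the function names and the annotation counts are assumptions made for this example.

```python
import math


def ancestors_inclusive(term, parents):
    """Return the term together with every term reachable via is-a edges."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(parents, annotation_counts):
    """Return {term: IC}, with direct annotation counts propagated upward."""
    totals = {}
    for term, count in annotation_counts.items():
        # Each direct annotation also counts once for every ancestor.
        for a in ancestors_inclusive(term, parents):
            totals[a] = totals.get(a, 0) + count
    grand_total = sum(annotation_counts.values())
    return {t: math.log(grand_total / c) for t, c in totals.items() if c > 0}


def resnik(t1, t2, parents, ic):
    """IC of the most informative ancestor shared by t1 and t2."""
    common = ancestors_inclusive(t1, parents) & ancestors_inclusive(t2, parents)
    return max((ic.get(a, 0.0) for a in common), default=0.0)
```

    Because counts are propagated to all ancestors, a root term that subsumes every annotation receives an IC of zero, and more specific terms receive larger values.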

  4. Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

    Science.gov (United States)

    Mazandu, Gaston K.; Mulder, Nicola J.

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG. PMID:24078912

  5. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

    Science.gov (United States)

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.

  6. Formal modeling of Gene Ontology annotation predictions based on factor graphs

    Science.gov (United States)

    Spetale, Flavio; Murillo, Javier; Tapia, Elizabeth; Arce, Débora; Ponce, Sergio; Bulacio, Pilar

    2016-04-01

    Gene Ontology (GO) is a hierarchical vocabulary for gene product annotation. Its synergy with machine learning classification methods has been widely used for the prediction of protein functions. Current classification methods rely on heuristic solutions to check the consistency with some aspects of the underlying GO structure. In this work we formalize the GO is-a relationship through predicate logic. Moreover, an ontology model based on Forney Factor Graph (FFG) is shown on a general fragment of Cellular Component GO.

  7. Protein-Protein Interaction Network and Gene Ontology

    Science.gov (United States)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    Evolution of computer technologies makes it possible to access a large amount and various kinds of biological data via internet such as DNA sequences, proteomics data and information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. Roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations can be the combination of protein-protein interaction (PPI) data and Gene Ontology (GO), which could help enhance the analysis of PPI networks. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, where GO and PPI graphs are visualized side-by-side, and supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common ancestors finding.

  8. DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    Directory of Open Access Journals (Sweden)

    Wu Cathy H

    2005-08-01

    Full Text Available Abstract Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete

  9. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    Science.gov (United States)

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is the infection of Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. In-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery.

  10. Genetic resources for methane production from biomass described with gene ontology

    Directory of Open Access Journals (Sweden)

    Endang ePurwantini

    2014-12-01

    Full Text Available Methane (CH4) is a valuable fuel, constituting 70-95% of natural gas, and a potent greenhouse gas. Release of CH4 into the atmosphere contributes to climate change. Biological CH4 production or methanogenesis is mostly performed by methanogens, a group of strictly anaerobic archaea. The direct substrates for methanogenesis are H2 plus CO2, acetate, formate, methylamines, methanol, methyl sulfides, and ethanol or a secondary alcohol plus CO2. In numerous anaerobic niches in nature, methanogenesis facilitates mineralization of complex biopolymers such as carbohydrates, lipids and proteins generated by primary producers. Thus, methanogens are critical players in the global carbon cycle. The same process is used in anaerobic treatment of municipal, industrial and agricultural wastes, reducing the biological pollutants in the wastes and generating methane. It also holds potential for commercial production of natural gas from renewable resources. This process operates in digestive systems of many animals, including cattle, and humans. In contrast, in deep-sea hydrothermal vents methanogenesis is a primary production process, allowing chemosynthesis of biomaterials from H2 plus CO2. In this report we present Gene Ontology (GO) terms that can be used to describe processes, functions and cellular components involved in methanogenic biodegradation and biosynthesis of specialized coenzymes that methanogens use. Some of these GO terms were previously available and the rest were generated in our Microbial Energy Gene Ontology (MENGO) project. A recently discovered non-canonical CH4 production process is also described. We have performed manual GO annotation of selected methanogenesis genes, based on experimental evidence, providing gold standards for machine annotation and automated discovery of methanogenesis genes or systems in diverse genomes. Most of the GO-related information presented in this report is available at the MENGO website (http://www.mengo.biochem.vt.edu/).

  11. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

    Directory of Open Access Journals (Sweden)

    Jain Shobhit

    2010-11-01

    Full Text Available Abstract Background Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. Results We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if participating proteins belong to the same sub-graph as compared to if they belong to different sub-graphs. Conclusions The TCSS algorithm performs better than other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.

  12. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms were learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were

  13. PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources [v1; ref status: indexed, http://f1000r.es/5j2]

    Directory of Open Access Journals (Sweden)

    Indika Kahanda

    2015-07-01

    Full Text Available The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein coding genes have HPO annotations. However, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage of the structured SVM approach, which was shown to be highly effective for Gene Ontology term prediction, in comparison to several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large scale literature mining data.

  14. TopoICSim: a new semantic similarity measure based on gene ontology.

    Science.gov (United States)

    Ehsani, Rezvan; Drabløs, Finn

    2016-07-29

    The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both. Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations. An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .

  15. Grouping miRNAs of similar functions via weighted information content of gene ontology.

    Science.gov (United States)

    Lan, Chaowang; Chen, Qingfeng; Li, Jinyan

    2016-12-22

    Regulation mechanisms between miRNAs and genes are complicated. To accomplish a biological function, a miRNA may regulate multiple target genes, and similarly a target gene may be regulated by multiple miRNAs. Wet-lab knowledge of co-regulating miRNAs is limited. This work introduces a computational method to group miRNAs of similar functions to identify co-regulating miRNAs from a similarity matrix of miRNAs. We define a novel information content of gene ontology (GO) to measure similarity between two sets of GO graphs corresponding to the two sets of target genes of two miRNAs. This between-graph similarity is then transferred as a functional similarity between the two miRNAs. Our definition of the information content is based on the size of a GO term's descendants, but adjusted by a weight derived from its depth level and the GO relationships at its path to the root node or to the most informative common ancestor (MICA). Further, a self-tuning technique and the eigenvalues of the normalized Laplacian matrix are applied to determine the optimal parameters for the spectral clustering of the similarity matrix of the miRNAs. Experimental results demonstrate that our method has better clustering performance than the existing edge-based, node-based or hybrid methods. Our method has also demonstrated a novel usefulness for the function annotation of new miRNAs, as reported in the detailed case studies.
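
    To make the idea of a descendant-based information content concrete, the sketch below computes a commonly used unweighted intrinsic IC, 1 - log(descendants + 1) / log(total terms). The depth- and relationship-derived weight described in the record above is not reproduced here; the toy graph and function names are hypothetical.

```python
import math

# Hypothetical toy DAG: term -> list of child terms.
CHILDREN = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["c"],
    "c": [],
    "d": [],
}


def descendants(term):
    """Return all terms reachable from `term` via child edges (excluding itself)."""
    seen = set()
    stack = list(CHILDREN[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CHILDREN[current])
    return seen


def intrinsic_ic(term):
    """Unweighted intrinsic IC: 0 for the root, 1 for leaf terms."""
    n_terms = len(CHILDREN)
    return 1 - math.log(len(descendants(term)) + 1) / math.log(n_terms)


for term in CHILDREN:
    print(term, round(intrinsic_ic(term), 3))
```

    Terms with fewer descendants are treated as more specific, so "b" (one descendant) scores higher than "a" (two descendants) in this toy graph.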

  16. GO-2D: identifying 2-dimensional cellular-localized functional modules in Gene Ontology

    Directory of Open Access Journals (Sweden)

    Yang Da

    2007-01-01

    Full Text Available Abstract Background Rapid progress in high-throughput biotechnologies (e.g. microarrays) and exponential accumulation of gene functional knowledge make it promising for systematic understanding of complex human diseases at functional modules level. Based on Gene Ontology, a large number of automatic tools have been developed for the functional analysis and biological interpretation of the high-throughput microarray data. Results Different from the existing tools such as Onto-Express and FatiGO, we develop a tool named GO-2D for identifying 2-dimensional functional modules based on combined GO categories. For example, it refines biological process categories by sorting their genes into different cellular component categories, and then extracts those combined categories enriched with the interesting genes (e.g., the differentially expressed genes) for identifying the cellular-localized functional modules. Applications of GO-2D to the analyses of two human cancer datasets show that very specific disease-relevant processes can be identified by using cellular location information. Conclusion For studying complex human diseases, GO-2D can extract functionally compact and detailed modules such as the cellular-localized ones, characterizing disease-relevant modules in terms of both biological processes and cellular locations. The application results clearly demonstrate that the 2-dimensional approach, complementary to the current 1-dimensional approach, is powerful for finding modules highly relevant to diseases.

  17. SoFoCles: feature filtering for microarray classification based on gene ontology.

    Science.gov (United States)

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  18. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science.

    Science.gov (United States)

    Klie, Sebastian; Nikoloski, Zoran

    2012-01-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  19. The choice between MapMan and Gene Ontology for automated gene function prediction in plant science

    Directory of Open Access Journals (Sweden)

    Sebastian eKlie

    2012-06-01

    Full Text Available Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of coexpression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  20. Multimodal probabilistic generative models for time-course gene expression data and Gene Ontology (GO) tags.

    Science.gov (United States)

    Gabbur, Prasad; Hoying, James; Barnard, Kobus

    2015-10-01

    We propose four probabilistic generative models for simultaneously modeling gene expression levels and Gene Ontology (GO) tags. Unlike previous approaches for using GO tags, the joint modeling framework allows the two sources of information to complement and reinforce each other. We fit our models to three time-course datasets collected to study biological processes, specifically blood vessel growth (angiogenesis) and mitotic cell cycles. The proposed models result in a joint clustering of genes and GO annotations. Different models group genes based on GO tags and their behavior over the entire time-course, within biological stages, or even individual time points. We show how such models can be used for biological stage boundary estimation de novo. We also evaluate our models on biological stage prediction accuracy of held out samples. Our results suggest that the models usually perform better when GO tag information is included. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or function genomics approaches. Results We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, this method can generate highly accurate predictions in sparsely annotated portions of GO, in which previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than the generally used cross-validation type of evaluation. By manually assessing a random sample of 100 predictions conducted in a historical roll-back evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340

  2. Finding top-k similar pairs of objects annotated with terms from an ontology

    CERN Document Server

    Bhattacharya, Arnab; Singh, Ambuj K

    2010-01-01

    With the growing focus on semantic searches and interpretations, an increasing number of standardized vocabularies and ontologies are being designed and used to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first one defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second one defines the distance as the average pairwise term distance. The third and most useful distance measure, earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be aggregated progressively and utilize them to speed up the search for top-k object pairs when the earth mover's distance is used. F...
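
    The first two object distances described above (minimum and average pairwise term distance) can be sketched as follows. The sketch assumes that the term distance is the number of edges on the tree path between two terms, which the abstract does not specify; the toy tree and function names are hypothetical. The earth mover's distance and the lower bounds used to speed up the top-k search are not shown.

```python
from itertools import product

# Hypothetical toy tree-structured ontology: term -> parent (None for the root).
PARENT = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "b1": "b",
}


def path_to_root(term):
    """Return the list of terms from `term` up to the root, inclusive."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def term_distance(t1, t2):
    """Number of edges on the tree path between t1 and t2 (assumed metric)."""
    steps_from_t2 = {node: i for i, node in enumerate(path_to_root(t2))}
    for steps_from_t1, node in enumerate(path_to_root(t1)):
        if node in steps_from_t2:  # first shared node is the lowest common ancestor
            return steps_from_t1 + steps_from_t2[node]
    raise ValueError("terms do not share a root")


def min_pairwise(terms_x, terms_y):
    """Object distance as the minimum distance over all cross pairs of terms."""
    return min(term_distance(s, t) for s, t in product(terms_x, terms_y))


def avg_pairwise(terms_x, terms_y):
    """Object distance as the average distance over all cross pairs of terms."""
    dists = [term_distance(s, t) for s, t in product(terms_x, terms_y)]
    return sum(dists) / len(dists)


x = {"a1", "a2"}
y = {"a2", "b1"}
print(min_pairwise(x, y), avg_pairwise(x, y))  # 0 2.5
```

    The example shows why the minimum measure can be uninformative: two objects sharing a single term get distance 0 regardless of their other annotations, whereas the average reflects every pair.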

  3. Nursing diagnoses and outcomes related to the circulatory-system terms (ICNP®) represented in an ontology

    Directory of Open Access Journals (Sweden)

    Marcia Regina Cubas

    2013-10-01

    Full Text Available The aim of the present study was to develop titles of Nursing Diagnoses and Outcomes (ND/NO through the relationship between the terms of the Focus axis, limited to the Circulatory System Process, and the terms of other ICNP® axes and to integrate these terms into an ontology. Titles were developed linking 17 terms of the focus axis, which were evaluated by expert nurses in five Brazilian cities. Titles whose use concordance was above 0.80 were included in the ontology. In total, 89 titles for ND/NO were supported in the literature, and 19 were not supported; 37 were assessed as eligible for use in healthcare practice and were included in the ontology. The construction of ND/NO titles based on the ICNP® and using a formal representation of knowledge is a task that requires deepening concepts used for nursing and adequate classification revisions. The elaborated titles will facilitate the composition of diagnostics that are more consistent with practice.

  4. Using Ontology Fingerprints to Evaluate Genome-wide Association Results

    OpenAIRE

    Lam Tsoi; Michael Boehnke; Richard Klein; Jim Zheng

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...

  5. PPISEARCHENGINE: gene ontology-based search for protein-protein interactions.

    Science.gov (United States)

    Park, Byungkyu; Cui, Guangyu; Lee, Hyunjin; Huang, De-Shuang; Han, Kyungsook

    2013-01-01

    This paper presents a new search engine called PPISearchEngine which finds protein-protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between the terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms at the lower level than the GO term in the GO hierarchy, and finds all the interactions of the query protein which satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like 'for protein p with GO [Formula: see text], find p's interaction partners with GO [Formula: see text]'. PPISearchEngine is freely available to academics at http://search.hpid.org/.
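
    The prime-number representation described above can be illustrated with a small sketch. The abstract does not give the exact encoding, so the scheme below (a term's code is the product of its own prime and the primes of all its ancestors) is an assumption, and the mini-hierarchy and function names are hypothetical.

```python
def first_primes(n):
    """Return the first n primes by trial division (fine for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Hypothetical mini-hierarchy (assumption): term -> list of parent terms.
PARENTS = {
    "binding": [],
    "protein binding": ["binding"],
    "enzyme binding": ["protein binding"],
    "kinase binding": ["enzyme binding"],
    "catalytic activity": [],
}

# Each term gets its own prime, in dictionary order.
PRIME = dict(zip(PARENTS, first_primes(len(PARENTS))))

def code(term):
    """Product of the term's own prime and the primes of all its ancestors."""
    seen = set()
    stack = [term]
    value = 1
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        value *= PRIME[t]
        stack.extend(PARENTS[t])
    return value

def is_same_or_descendant(term, query):
    """True if `term` equals `query` or lies below it in the hierarchy."""
    # By unique prime factorisation, query's prime divides the code
    # exactly when query is the term itself or one of its ancestors.
    return code(term) % PRIME[query] == 0
```

    Under this assumed encoding, a query for "protein binding" also matches proteins annotated only with the more specific "kinase binding", because a single divisibility test replaces a walk through the hierarchy.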

  6. Non-lexical approaches to identifying associative relations in the gene ontology.

    Science.gov (United States)

    Bodenreider, Olivier; Aubry, Marc; Burgun, Anita

    2005-01-01

    The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.
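
    As a rough illustration of the co-occurrence and association-rule ideas mentioned above, the sketch below counts how often terms from different hierarchies annotate the same gene product and keeps rules that pass support and confidence thresholds. The annotations, thresholds, and function name are hypothetical; the study's actual statistics and parameters are not given in the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations (assumption): gene product -> {(hierarchy, term)}.
ANNOTATIONS = {
    "g1": {("MF", "kinase activity"), ("BP", "phosphorylation")},
    "g2": {("MF", "kinase activity"), ("BP", "phosphorylation"), ("CC", "cytosol")},
    "g3": {("MF", "kinase activity"), ("CC", "cytosol")},
    "g4": {("MF", "DNA binding"), ("CC", "nucleus")},
}

def cross_hierarchy_rules(annotations, min_support=2, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules across GO hierarchies."""
    term_count = Counter()
    pair_count = Counter()
    for terms in annotations.values():
        term_count.update(terms)
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # keep only pairs from different hierarchies
                pair_count[(a, b)] += 1
    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs annotated | lhs annotated)
            confidence = support / term_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

    Calling `cross_hierarchy_rules(ANNOTATIONS)` on the toy data yields candidate associations such as phosphorylation (BP) with kinase activity (MF); as the abstract notes, such candidates would still require manual curation.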

  7. Space station short-term mission planning using ontology modelling and time iteration

    Institute of Scientific and Technical Information of China (English)

    Huijiao Bu; Jin Zhang; Yazhong Luo

    2016-01-01

    This paper studies the problem of the space station short-term mission planning, which aims to allocate the executing time of missions effectively, schedule the corresponding resources reasonably and arrange the time of the astronauts properly. A domain model is developed by using the ontology theory to describe the concepts, constraints and relations of the planning domain formally, abstractly and normatively. A method based on time iteration is adopted to solve the short-term planning problem. Meanwhile, the resolving strategies are proposed to resolve different kinds of conflicts induced by the constraints of power, heat, resource, astronaut and relationship. The proposed approach is evaluated in a test case with fifteen missions, thirteen resources and three astronauts. The results show that the developed domain ontology model is reasonable, and the time iteration method using the proposed resolving strategies can successfully obtain the plan satisfying all considered constraints.

  8. Large-scale Gene Ontology analysis of plant transcriptome-derived sequences retrieved by AFLP technology

    Directory of Open Access Journals (Sweden)

    Ramina Angelo

    2008-07-01

    Background: After 10 years of use of AFLP (Amplified Fragment Length Polymorphism) technology for DNA fingerprinting and mRNA profiling, large repertories of genome- and transcriptome-derived sequences are available in public databases for model, crop and tree species. AFLP marker systems have been and are being extensively exploited for genome scanning and gene mapping, as well as cDNA-AFLP for transcriptome profiling and differentially expressed gene cloning. The evaluation, annotation and classification of genomic markers and expressed transcripts would be of great utility for both functional genomics and systems biology research in plants. This may be achieved by means of the Gene Ontology (GO), consisting of three structured vocabularies (i.e. ontologies) describing genes, transcripts and proteins of any organism in terms of their associated cellular component, biological process and molecular function in a species-independent manner. In this paper, the functional annotation of about 8,000 AFLP-derived ESTs retrieved in the NCBI databases was carried out by using GO terminology. Results: Descriptive statistics on the type, size and nature of gene sequences obtained by means of AFLP technology were calculated. The gene products associated with mRNA transcripts were then classified according to the three main GO vocabularies. A comparison of the functional content of cDNA-AFLP records was also performed by splitting the sequence dataset into monocots and dicots and by comparing them to all annotated ESTs of Arabidopsis and rice, respectively. On the whole, the statistical parameters adopted for the in silico AFLP-derived transcriptome-anchored sequence analysis proved to be critical for obtaining reliable GO results. Such an exhaustive annotation may offer a suitable platform for functional genomics, particularly useful in non-model species.
    Conclusion: Reliable GO annotations of AFLP-derived sequences can be gathered through the optimization

  9. Impact of ontology evolution on functional analyses.

    Science.gov (United States)

    Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard

    2012-10-15

    Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.

  10. Interactome and Gene Ontology provide congruent yet subtly different views of a eukaryotic cell

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2009-07-01

    Background: The characterization of the global functional structure of a cell is a major goal in bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction network offer alternative views of that structure. Results: This study presents a comparison of the global structures of the Gene Ontology and the interactome of Saccharomyces cerevisiae. Sensitive, unsupervised methods of clustering applied to a large fraction of the proteome established a GO-interactome correlation value of +0.47 for a general dataset that contains both high and low-confidence interactions and +0.58 for a smaller, high-confidence dataset. Conclusion: The structures of the yeast cell deduced from GO and interactome are substantially congruent. However, some significant differences were also detected, which may contribute to a better understanding of cell function and also to a refinement of the current ontologies.

  11. From "glycosyltransferase" to "congenital muscular dystrophy": integrating knowledge from NCBI Entrez Gene and the Gene Ontology.

    Science.gov (United States)

    Sahoo, Satya S; Zeng, Kelly; Bodenreider, Olivier; Sheth, Amit

    2007-01-01

    Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.

  12. An improved method for functional similarity analysis of genes based on Gene Ontology.

    Science.gov (United States)

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .

  13. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  14. Extending gene ontology in the context of extracellular RNA and vesicle communication

    NARCIS (Netherlands)

    Cheung, Kei-Hoi; Keerthikumar, Shivakumar; Roncaglia, Paola; Subramanian, Sai Lakshmi; Roth, Matthew E; Samuel, Monisha; Anand, Sushma; Gangoda, Lahiru; Gould, Stephen; Alexander, Roger; Galas, David; Gerstein, Mark B; Hill, Andrew F; Kitchen, Robert R; Lötvall, Jan; Patel, Tushar; Procaccini, Dena C; Quesenberry, Peter; Rozowsky, Joel; Raffai, Robert L; Shypitsyna, Aleksandra; Su, Andrew I; Théry, Clotilde; Vickers, Kasey; Wauben, Marca H M; Mathivanan, Suresh; Milosavljevic, Aleksandar; Laurent, Louise C

    2016-01-01

    BACKGROUND: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA

  15. Ontology based molecular signatures for immune cell types via gene expression analysis

    Science.gov (United States)

    2013-01-01

    Background: New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results: We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identified genes uniquely expressed in these B cells compared to other B cell types. Conclusions: This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649

  16. Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach.

    Science.gov (United States)

    Patel, Sejal; Park, Min Tae M; Chakravarty, M Mallar; Knight, Jo

    2016-01-01

    Imaging genetics is an emerging field in which associations between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our approach addresses the pressing need in genetic research to move beyond candidate gene studies without incurring an excessive loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease.
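
    The general idea behind a stratified false discovery rate can be sketched as follows: tests are split into strata (for instance, SNPs in GO-prioritized genes versus all others) and FDR control is applied within each stratum separately. The sketch below uses Benjamini-Hochberg adjusted p-values per stratum; this choice, the stratum labels, and the function names are assumptions for illustration and may differ from the authors' exact procedure.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

def stratified_fdr(pvalues, strata):
    """Apply BH separately within each stratum; q-values keep the input order."""
    qvalues = [None] * len(pvalues)
    for label in set(strata):
        idx = [i for i, s in enumerate(strata) if s == label]
        stratum_q = bh_qvalues([pvalues[i] for i in idx])
        for i, q in zip(idx, stratum_q):
            qvalues[i] = q
    return qvalues
```

    With hypothetical inputs `pvalues = [0.01, 0.04, 0.03, 0.20]` and `strata = ["prior", "prior", "other", "other"]`, tests in the small prioritized stratum are corrected for two tests rather than four, which is where the potential gain in power comes from.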

  17. Evaluation of clustering algorithms for gene expression data using gene ontology annotations

    Institute of Scientific and Technical Information of China (English)

    MA Ning; ZHANG Zheng-guo

    2012-01-01

    Background: Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods: An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results: The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions: The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.

  18. A multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors for functional gene analysis.

    Science.gov (United States)

    Weber, Kristoffer; Bartsch, Udo; Stocking, Carol; Fehse, Boris

    2008-04-01

    Functional gene analysis requires the possibility of overexpression, as well as downregulation of one, or ideally several, potentially interacting genes. Lentiviral vectors are well suited for this purpose as they ensure stable expression of complementary DNAs (cDNAs), as well as short-hairpin RNAs (shRNAs), and can efficiently transduce a wide spectrum of cell targets when packaged within the coat proteins of other viruses. Here we introduce a multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors designed according to the "building blocks" principle. Using a wide spectrum of different fluorescent markers, including drug-selectable enhanced green fluorescent protein (eGFP)- and dTomato-blasticidin-S resistance fusion proteins, LeGO vectors allow simultaneous analysis of multiple genes and shRNAs of interest within single, easily identifiable cells. Furthermore, each functional module is flanked by unique cloning sites, ensuring flexibility and individual optimization. The efficacy of these vectors for analyzing multiple genes in a single cell was demonstrated in several different cell types, including hematopoietic, endothelial, and neural stem and progenitor cells, as well as hepatocytes. LeGO vectors thus represent a valuable tool for investigating gene networks using conditional ectopic expression and knock-down approaches simultaneously.

  19. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  20. Globaltest and GOEAST: two different approaches for Gene Ontology analysis

    NARCIS (Netherlands)

    Hulsegge, B.; Kommadath, A.; Smits, M.A.

    2009-01-01

    Background Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST. Globaltest is a method for testing whether sets of

  1. Dictionary and Gene Ontology Based Similarity for Named Entity Relationship Protein-protein Interaction Prediction from Biotext Corpus

    Directory of Open Access Journals (Sweden)

    Smt K. Prabavathy

    2014-12-01

    Protein-protein interactions (PPIs) play a key role in many biological systems. They are involved in complex formation and in many pathways that carry out biological processes. Accurate identification of the set of interacting proteins can shed new light on the functional role of various proteins in the complex environment of the cell. The ability to construct biologically meaningful gene networks and to identify the exact relationships within them is critical for present-day systems biology. Earlier research studied the power of gene modules to shed light on the functioning of complex biological systems. Most modules in these networks have shown little link with meaningful biological function, because these methods do not precisely calculate the semantic relationship between the entities. To overcome these problems and improve PPI results on the biotext corpus, a new method is proposed in this research. The proposed method directly incorporates Gene Ontology (GO) annotation in the construction of gene modules and uses dictionary-based text to extract biotext information. The Dictionary-Based Text and Gene Ontology (DBTGO) approach integrates various gene-gene pairwise similarity values and protein-protein interaction relationships obtained from gene expression in order to obtain better biotext information retrieval results. The results were analyzed on the Biotext Project at UC Berkeley. Testing the DBTGO algorithm indicates that it improves PPI relationship identification compared with all previously suggested methods in terms of precision, recall, F measure and Normalized Discounted Cumulative Gain (NDCG). The proposed DBTGO algorithm can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level.

  2. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are a few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696; [...] genes in a biological condition increases beyond 50 and

  3. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Science.gov (United States)

    Vashisht, Shikha; Bagler, Ganesh

    2012-01-01

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  4. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Directory of Open Access Journals (Sweden)

    Shikha Vashisht

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  5. How to learn about gene function: text-mining or ontologies?

    Science.gov (United States)

    Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I

    2015-03-01

    As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies--e.g., next-generation sequencing--are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene-lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene-lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene-lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or--perhaps more adventurously--on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic

  6. Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

    Science.gov (United States)

    Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A

    2016-04-01

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.

  7. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    Full Text Available BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  8. Gene ontology study of methyl jasmonate-treated and non-treated hairy roots of Panax ginseng to identify genes involved in secondary metabolic pathway.

    Science.gov (United States)

    Sathiyamoorthy, S; In, J G; Gayathri, S; Kim, Y Ju; Yang, D Ch

    2010-07-01

    The roots of Panax ginseng C.A. Meyer, known as Korean ginseng, have been a valuable and important folk medicine in East Asian countries. It is mainly used to maintain the homeostasis of the human body, with the presence of ginsenosides and non-saponin compounds like phenol compounds, acidic polysaccharides and polyethylene compounds. Functional genomics aids annotation based on gene ontology. In this study, we focused on the genes involved in secondary metabolic pathways and visualized temporal changes of gene expression in ginseng hairy roots treated with the methyl ester methyl jasmonate (MeJA) alongside non-treated hairy roots. A total of 5,774 EST clones were clustered and assembled into 501 contigs and 2,955 singletons. Annotations were categorized by the gene ontology terms molecular function, biological process and cellular component and by biochemical function, and sequences with enzyme commission numbers were assigned to metabolic pathways of the Kyoto Encyclopedia of Genes and Genomes database. Comparatively, EST sequences assigned to cellular process, metabolic process, biotic and abiotic stress stimuli, developmental and biological regulation and transport were up-regulated 2-3 fold in MeJA-treated hairy roots. Forty-six different subgroups of enzymes were found in the MeJA-treated plants. These annotated ESTs represent a significant proportion of the P. ginseng and provide a molecular resource for the development of microarrays for gene expression studies concerning development, metabolism and reproduction.

  9. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

    Science.gov (United States)

    Caniza, Horacio; Romero, Alfonso E; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-08-01

    We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines. Contact: alberto@cs.rhul.ac.uk. © The Author 2014. Published by Oxford University Press.
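
    To make the idea of a term-based semantic similarity concrete, the following is a minimal sketch of Resnik's information-content measure on a toy hierarchy. It is not GOssTo's code, and the abstract does not say which six measures the tool bundles; the term names, the `PARENTS` hierarchy and the `DIRECT_COUNTS` annotation counts are all invented for illustration.

```python
import math

# Toy is_a hierarchy (child -> parents). Names are illustrative, not real GO IDs.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "atp_binding": ["binding"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 4,
                 "dna_binding": 3, "atp_binding": 1}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) counts annotations to t or to any descendant."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)
```

    In this toy data, two children of `binding` share an informative ancestor and score above zero, while terms that meet only at `root` score zero.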

  10. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    Science.gov (United States)

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095
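
    The abstract does not state which enrichment test was used. As a rough illustration of how GO term enrichment in a gene list is commonly assessed, here is a sketch of a one-sided hypergeometric test. The function name and the counts in the example call are hypothetical; only the background size (19,794 expressed genes) and the list size (948 genes) come from the abstract.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail probability P(X >= hits).

    population: number of background genes
    annotated:  background genes carrying the GO term
    sample:     size of the gene list under study
    hits:       genes in the list carrying the GO term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


# Hypothetical example: 600 of the 19,794 expressed genes carry some GO term,
# and 60 of the 948 selected genes carry it (both counts are invented).
p_value = hypergeom_enrichment_p(19794, 600, 948, 60)
```

    In practice a p-value like this would be computed for every GO term and then corrected for multiple testing.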

  11. The Functional Genetics of Handedness and Language Lateralization: Insights from Gene Ontology, Pathway and Disease Association Analyses.

    Science.gov (United States)

    Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian

    2017-01-01

    Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.

  12. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Directory of Open Access Journals (Sweden)

    Shibiao Wan

    Full Text Available Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.

  13. HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/. PMID:24647341
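
    The abstract describes frequency vectors and semantic similarity vectors being hybridized into fusion vectors but gives no formulas. The sketch below shows one plausible construction and should not be read as the authors' method: the vocabulary, the similarity table, the max-similarity rule and fusion by simple concatenation are all assumptions made for illustration.

```python
from collections import Counter

# Placeholder vocabulary of GO terms (not real GO identifiers).
VOCAB = ["GO:A", "GO:B", "GO:C"]

# Hypothetical precomputed semantic similarities between distinct terms.
SIM = {("GO:A", "GO:B"): 0.6, ("GO:B", "GO:C"): 0.2, ("GO:A", "GO:C"): 0.1}


def sim(a, b):
    """Symmetric lookup; identical terms have similarity 1.0."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


def frequency_vector(terms):
    """Number of times each vocabulary term was retrieved for the protein."""
    counts = Counter(terms)
    return [float(counts[t]) for t in VOCAB]


def similarity_vector(terms):
    """For each vocabulary term, its best similarity to any retrieved term."""
    return [max((sim(v, t) for t in terms), default=0.0) for v in VOCAB]


def fusion_vector(terms):
    """Assumed fusion rule: concatenate the two feature vectors."""
    return frequency_vector(terms) + similarity_vector(terms)
```

    A vector built this way could then be passed to any multi-label classifier; the adaptive-decision SVM mentioned in the abstract is not reproduced here.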

  14. Combining sequence and Gene Ontology for protein module detection in the Weighted Network.

    Science.gov (United States)

    Yu, Yang; Liu, Jie; Feng, Nuan; Song, Bo; Zheng, Zeyu

    2017-01-07

    Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect the protein modules in the weighted PPI networks. The proposed approach mainly consists of three parts: the feature extraction, the weighted graph construction and the protein complex detection. Firstly, the topology-sequence information is utilized to present the feature of protein complex. Secondly, six types of the weighted graph are constructed by combining PPI network and Gene Ontology information. Lastly, a protein complex algorithm is applied to the weighted graph, which locates the clusters based on three conditions, including density, network diameter and the included angle cosine. Experiments have been conducted on two protein complex benchmark sets for yeast and the results show that the approach is more effective than five typical algorithms in terms of f-measure and precision. The combination of protein interaction network with sequence and gene ontology data is helpful to improve the performance and provides an optional method for protein module detection. Copyright © 2016 Elsevier Ltd. All rights reserved.
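
    Two of the three clustering conditions named above, density and the included angle cosine, have standard textbook definitions, sketched below. How the paper combines them, which thresholds it applies and which vectors it compares are not given in the abstract; the protein names and the edge set used with these functions are hypothetical.

```python
import math
from itertools import combinations


def density(nodes, edges):
    """Fraction of possible undirected edges that are present among `nodes`.

    `edges` is a set of frozenset({u, v}) pairs.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(
        1 for u, v in combinations(sorted(nodes), 2) if frozenset((u, v)) in edges
    )
    return 2 * present / (n * (n - 1))


def cosine(x, y):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0
```

    A candidate cluster could be kept, for example, when its density and the cosine between member feature vectors both exceed chosen cut-offs; that acceptance rule is an assumption, not something stated in the abstract.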

  15. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development.

    Science.gov (United States)

    Khodiyar, Varsha K; Howe, Doug; Talmud, Philippa J; Breckenridge, Ross; Lovering, Ruth C

    2013-01-01

    For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region: the left-right organizer node in the mouse and human, and the Kupffer's vesicle in the zebrafish. In the zebrafish, laterality cues from the Kupffer's vesicle determine asymmetry in the developing heart, the direction of 'heart jogging' and the direction of 'heart looping'. 'Heart jogging' is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward 'jog'. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well defined orthologs in mouse and human, and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish 'heart jogging orthologs' are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.

  16. Ontology-Driven Co-clustering of Gene Expression Data

    Science.gov (United States)

    Cordero, Francesca; Pensa, Ruggero G.; Visconti, Alessia; Ienco, Dino; Botta, Marco

    The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. For this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters.

  17. Gene Ontology based housekeeping gene selection for RNA-seq normalization.

    Science.gov (United States)

    Chen, Chien-Ming; Lu, Yu-Lun; Sio, Chi-Pong; Wu, Guan-Chung; Tzou, Wen-Shyong; Pai, Tun-Wen

    2014-06-01

    RNA-seq analysis provides a powerful tool for revealing relationships between gene expression level and biological function of proteins. In order to identify differentially expressed genes among various RNA-seq datasets obtained from different experimental designs, an appropriate normalization method for calibrating multiple experimental datasets is the first challenging problem. We propose a novel method to help biologists select a set of suitable housekeeping genes for inter-sample normalization. The approach is achieved by adopting user defined experimentally related keywords, GO annotations, GO term distance matrices, orthologous housekeeping gene candidates, and stability ranking of housekeeping genes. By identifying the GO terms most distant from the query keywords and selecting housekeeping gene candidates with low coefficients of variation among different spatio-temporal datasets, the proposed method can automatically enumerate a set of functionally irrelevant housekeeping genes for practical normalization. Novel and benchmark testing RNA-seq datasets were applied to demonstrate that different selections of housekeeping genes have a strong impact on differential gene expression analysis, and the compared results have shown that our proposed method outperformed other traditional approaches in terms of both sensitivity and specificity. The proposed mechanism of selecting appropriate housekeeping genes for inter-dataset normalization is robust and accurate for differential expression analyses. Copyright © 2014 Elsevier Inc. All rights reserved.
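
    The stability criterion mentioned above, a low coefficient of variation across datasets, can be sketched as follows. This covers only the ranking step, not the GO term distance filtering, and assumes strictly positive mean expression; the gene names and expression values used with it are invented.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be > 0)."""
    return pstdev(values) / mean(values)


def rank_by_stability(expression):
    """Return gene names ordered from most stable (lowest CV) to least stable.

    `expression` maps a gene name to its expression values across datasets.
    """
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))
```

    Genes at the front of the returned list would be the preferred housekeeping candidates under this criterion alone.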

  18. Ontological Discovery Environment: a system for integrating gene-phenotype associations.

    Science.gov (United States)

    Baker, Erich J; Jay, Jeremy J; Philip, Vivek M; Zhang, Yun; Li, Zuopan; Kirova, Roumyana; Langston, Michael A; Chesler, Elissa J

    2009-12-01

    The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental
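
    As a simplified picture of set-set matching over a bipartite gene-phenotype structure, the sketch below stores each phenotype-centered data set as a set of genes and compares sets pairwise. The Jaccard index is used as a stand-in similarity measure; the abstract does not specify ODE's actual measures, and the data set names and gene identifiers are invented.

```python
from itertools import combinations

# Bipartite relation stored as: data set (phenotype side) -> genes (gene side).
GENE_SETS = {
    "alcohol_preference_qtl": {"g1", "g2", "g3"},
    "ethanol_microarray": {"g2", "g3", "g4"},
    "literature_review": {"g5"},
}


def jaccard(a, b):
    """Shared genes divided by all genes appearing in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(gene_sets):
    """Jaccard similarity for every unordered pair of named gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }
```

    A matrix of such pairwise scores is the kind of input from which a hierarchical tree of gene sets could be built.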

  19. GOParGenPy: a high throughput method to generate gene ontology data matrices.

    Science.gov (United States)

    Kumar, Ajay Anand; Holm, Liisa; Toronen, Petri

    2013-08-08

    Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
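
    The binary class-membership matrix described above, including parental classes, can be sketched in a few lines. This is not GOParGenPy's code or interface; the toy hierarchy, the term names and the function names are assumptions for illustration.

```python
# Toy GO structure (child -> parents); identifiers are placeholders.
PARENTS = {
    "GO:root": [],
    "GO:binding": ["GO:root"],
    "GO:dna_binding": ["GO:binding"],
    "GO:catalysis": ["GO:root"],
}


def with_ancestors(term, parents):
    """Return the term plus every ancestor reachable through `parents`."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def binary_matrix(annotations, parents):
    """Build a genes x GO classes 0/1 matrix with annotations propagated upward.

    Returns (row gene names, column GO classes, matrix as a list of lists).
    """
    expanded = {
        gene: set().union(*(with_ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }
    columns = sorted(set().union(*expanded.values()))
    rows = sorted(expanded)
    matrix = [[1 if c in expanded[g] else 0 for c in columns] for g in rows]
    return rows, columns, matrix
```

    For large gene sets a sparse representation, which the abstract says the tool can produce, would replace the dense list of lists used here.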

  20. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data.

    Science.gov (United States)

    Koç, Ibrahim; Caetano-Anollés, Gustavo

    2017-01-01

    The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.

  1. Anatomy Ontology Matching Using Markov Logic Networks

    Directory of Open Access Journals (Sweden)

    Chunhua Li

    2016-01-01

    Full Text Available The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relationships between ontologies describing different species. Ontology matching is one approach to finding semantic correspondences between entities of different ontologies. Markov logic networks, which unify probabilistic graphical models and first-order logic, provide an excellent framework for ontology matching. We combine several different matching strategies through first-order logic formulas according to the structure of anatomy ontologies. Experiments on the adult mouse anatomy and the human anatomy have demonstrated the effectiveness of the proposed approach in terms of the quality of the resulting alignment.

  2. A persistent particle ontology for QFT in terms of the Dirac sea

    CERN Document Server

    Deckert, Dirk-Andre; Oldofredi, Andrea

    2016-01-01

    We show that the Bohmian approach in terms of persisting particles that move on continuous trajectories following a deterministic law can be literally applied to QFT. By means of the Dirac sea model -- exemplified in the electron sector of the standard model neglecting radiation -- we explain how starting from persisting particles, one is led to standard QFT employing creation and annihilation operators when tracking the dynamics with respect to a reference state, the so-called vacuum. Since on the level of wave functions, both formalisms are mathematically equivalent, this proposal provides for an ontology of QFT that includes a dynamics of individual processes, solves the measurement problem and explains the appearance of creation and annihilation events.

  3. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    Energy Technology Data Exchange (ETDEWEB)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G. [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1, (Canada); Law, R. David, E-mail: dlaw@lakeheadu.ca [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1, (Canada)

    2012-10-15

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  4. Using Ontology Fingerprints to evaluate genome-wide association study results

    OpenAIRE

    Tsoi, Lam C.; Michael Boehnke; Klein, Richard L.; Jim Zheng, W.

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...

  5. Ontology design patterns to disambiguate relations between genes and gene products in GENIA.

    Science.gov (United States)

    Hoehndorf, Robert; Ngonga Ngomo, Axel-Cyrille; Pyysalo, Sampo; Ohta, Tomoko; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2011-10-06

    Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  6. Ontology design patterns to disambiguate relations between genes and gene products in GENIA

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2011-10-01

    Abstract. Motivation: Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results: We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  7. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data.

    Science.gov (United States)

    Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan

    2016-01-01

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

  8. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and
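
    The integration step described in this abstract, linking GO terms to diseases through shared genes, can be illustrated with a small join. The sketch below is a minimal illustration and not CTD's actual pipeline: the gene, GO, and disease identifiers are made up, and the Jaccard comparison of disease GO profiles is an assumed similarity measure, not necessarily the one the authors used.

```python
from collections import defaultdict

# Toy, made-up annotations for illustration only (not CTD or NCBI Gene data).
gene_to_go = {
    "GENE_A": {"GO:0000001", "GO:0000002"},
    "GENE_B": {"GO:0000002"},
    "GENE_C": {"GO:0000003"},
}
gene_to_disease = {
    "GENE_A": {"disease_X"},
    "GENE_B": {"disease_X", "disease_Y"},
    "GENE_C": {"disease_Y"},
}

def infer_go_disease(gene_to_go, gene_to_disease):
    """Map each (GO term, disease) pair to the set of genes supporting it."""
    inferences = defaultdict(set)
    for gene, go_terms in gene_to_go.items():
        for disease in gene_to_disease.get(gene, ()):
            for term in go_terms:
                inferences[(term, disease)].add(gene)
    return dict(inferences)

def go_profile(inferences, disease):
    """All GO terms inferred for one disease."""
    return {term for (term, d) in inferences if d == disease}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

inferences = infer_go_disease(gene_to_go, gene_to_disease)
similarity = jaccard(go_profile(inferences, "disease_X"),
                     go_profile(inferences, "disease_Y"))
```

    Keeping the supporting gene set for each pair, rather than a bare count, makes it possible to trace every inference back to the curated interactions it came from.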

  9. Reconstruction of phylogenetic relationships from metabolic pathways based on the enzyme hierarchy and the gene ontology.

    Science.gov (United States)

    Clemente, José C; Satou, Kenji; Valiente, Gabriel

    2005-01-01

    There has been much interest in the structural comparison and alignment of metabolic pathways. Several techniques have been conceived to assess the similarity of metabolic pathways of different organisms. In this paper, we show that the combination of a new heuristic algorithm for the comparison of metabolic pathways together with any of three enzyme similarity measures (hierarchical, information content, and gene ontology) can be used to derive a metabolic pathway similarity measure that is suitable for reconstructing phylogenetic relationships from metabolic pathways. Experimental results on the Glycolysis pathway of 73 organisms representing the three domains of life show that our method outperforms previous techniques.

  10. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology.

    Science.gov (United States)

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2015-09-01

    Representing the way forward, from functional genomics and its ontology to functional understanding and physiological models, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To address this challenge, we describe the application of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of the large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online ( http://www.iitr.ac.in/ajayshiv/ ) through a user-friendly environment generated by means of Drupal-6.24. Because genes are expressed in temporally and spatially characteristic patterns, and their functionally distinct products often reside in specific cellular compartments and may be part of one or more multicomponent complexes, this work is intended to be relevant for investigating the functional relationships of gene products at a system level and thus to help approach a full understanding of plant physiology.

  11. Visualization and analysis of microarray and gene ontology data with treemaps

    Directory of Open Access Journals (Sweden)

    Babaria Ketan

    2004-06-01

    Abstract. Background: The increasing complexity of genomic data presents several challenges for biologists. Limited computer monitor views of data complexity and the dynamic nature of data in the midst of discovery increase the challenge of integrating experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access. Results: Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that show attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and these queried files can be exported for secondary analyses. Links to model system web pages from Treemap 4.0 enable users to access details about specific genes without leaving the query platform. Conclusions: Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.
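
    To make the space-filling idea concrete, the sketch below lays out a small hierarchy with the classic slice-and-dice scheme, alternating the split direction at each level. This is an assumed, simplified algorithm chosen for illustration; the abstract does not state which layout Treemap 4.0 uses, and the tree, node names, and sizes are made up.

```python
def size(node):
    """A leaf's size is its own value; an internal node's size is the sum of its children."""
    children = node.get("children")
    if not children:
        return node.get("size", 0.0)
    return sum(size(child) for child in children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, width, height) rectangles for every node in the tree."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(size(child) for child in children)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side along the x axis.
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: stack children along the y axis.
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out

# Hypothetical hierarchy standing in for GO categories with leaf sizes.
tree = {"name": "root", "children": [
    {"name": "A", "children": [{"name": "A1", "size": 1}, {"name": "A2", "size": 2}]},
    {"name": "B", "size": 1},
]}
rects = slice_and_dice(tree, 0.0, 0.0, 8.0, 4.0)
```

    Each leaf's area is proportional to its size, so a second attribute such as an expression level could be mapped to color on top of this layout.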

  12. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

    Science.gov (United States)

    Köhler, Sebastian; Doelken, Sandra C; Ruef, Barbara J; Bauer, Sebastian; Washington, Nicole; Westerfield, Monte; Gkoutos, George; Schofield, Paul; Smedley, Damian; Lewis, Suzanna E; Robinson, Peter N; Mungall, Christopher J

    2013-01-01

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  13. CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.

    Science.gov (United States)

    Park, Julie; Costanzo, Maria C; Balakrishnan, Rama; Cherry, J Michael; Hong, Eurie L

    2012-01-01

    The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.

  14. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research.

    Science.gov (United States)

    Primmer, C R; Papakostas, S; Leder, E H; Davis, M J; Ragan, M A

    2013-06-01

    Recent advances in molecular technologies have opened up unprecedented opportunities for molecular ecologists to better understand the molecular basis of traits of ecological and evolutionary importance in almost any organism. Nevertheless, reliable and systematic inference of functionally relevant information from these masses of data remains challenging. The aim of this review is to highlight how the Gene Ontology (GO) database can be of use in resolving this challenge. The GO provides a largely species-neutral source of information on the molecular function, biological role and cellular location of tens of thousands of gene products. As it is designed to be species-neutral, the GO is well suited for cross-species use, meaning that, functional annotation derived from model organisms can be transferred to inferred orthologues in newly sequenced species. In other words, the GO can provide gene annotation information for species with nonannotated genomes. In this review, we describe the GO database, how functional information is linked with genes/gene products in model organisms, and how molecular ecologists can utilize this information to annotate their own data. Then, we outline various applications of GO for enhancing the understanding of molecular basis of traits in ecologically relevant species. We also highlight potential pitfalls, provide step-by-step recommendations for conducting a sound study in nonmodel organisms, suggest avenues for future research and outline a strategy for maximizing the benefits of a more ecological and evolutionary genomics-oriented ontology by ensuring its compatibility with the GO. © 2013 John Wiley & Sons Ltd.

  15. Changes in winter depression phenotype correlate with white blood cell gene expression profiles: a combined metagene and gene ontology approach.

    Science.gov (United States)

    Bosker, Fokko J; Terpstra, Peter; Gladkevich, Anatoliy V; Janneke Dijck-Brouwer, D A; te Meerman, Gerard; Nolen, Willem A; Schoevers, Robert A; Meesters, Ybe

    2015-04-03

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior and following bright light therapy and in summer. RNA was isolated, converted into cRNA, amplified and hybridized on Illumina® gene expression arrays. The raw optical array data were quantile normalized and thereafter analyzed using a metagene approach, based on previously published Affymetrix gene array data. The raw data were also subjected to a secondary analysis focusing on circadian genes and genes involved in serotonergic neurotransmission. Differences between the conditions were analyzed, using analysis of variance on the principal components of the metagene score matrix. After correction for multiple testing no statistically significant differences were found. Another approach uses the correlation between metagene factor weights and the actual expression values, averaged over conditions. When comparing the correlations of winter vs. summer and bright light therapy vs. summer significant changes for several metagenes were found. Subsequent gene ontology analyses (DAVID and GeneTrail) of 5 major metagenes suggest an interaction between brain and white blood cells. The hypothesis driven analysis with a smaller group of genes failed to demonstrate any significant effects. The results from the combined metagene and gene ontology analyses support the idea of communication between brain and white blood cells. Future studies will need a much larger sample size to obtain information at the level of single genes. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

    Science.gov (United States)

    Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J

    2016-02-01

    Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). Contact: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Orymold: ontology based gene expression data integration and analysis tool applied to rice

    Directory of Open Access Journals (Sweden)

    Segura Jordi

    2009-05-01

    Abstract. Background: Integration and exploration of data obtained from genome wide monitoring technologies has become a major challenge for many bioinformaticists and biologists due to its heterogeneity and high dimensionality. A widely accepted approach to solve these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human and machine readable interfaces. Results: We designed and implemented a software tool that allows investigators to create their own semantic model of an organism and to use it to dynamically integrate expression data obtained from DNA microarrays and other probe based technologies. The software provides tools to use the semantic model to postulate and validate hypotheses on the spatial and temporal expression and function of genes. In order to illustrate the software's use and features, we used it to build a semantic model of rice (Oryza sativa) and integrated experimental data into it. Conclusion: In this paper we describe the development and features of a flexible software application for dynamic gene expression data annotation, integration, and exploration called Orymold. Orymold is freely available for non-commercial users from http://www.oryzon.com/media/orymold.html

  18. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Science.gov (United States)

    2011-01-01

    Background: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results: We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions: GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches. PMID:21914205

  19. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Directory of Open Access Journals (Sweden)

    Kirsten Toralf

    2011-09-01

    Abstract. Background: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results: We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions: GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.

  20. Visualizing the temporal distribution of terminologies for biological ontology development

    Science.gov (United States)

    Kim, Tak-eun; Lee, Hodong; Park, Jinah; Park, Jong C.

    2008-01-01

    Communities in biology have developed a number of ontologies that provide standard terminologies for the characteristics of various concepts and their relationships. However, it is difficult to construct and maintain such ontologies in biology, since it is a non-trivial task to identify commonly used potential member terms in a particular ontology, in the presence of constant changes of such terms over time as the research in the field advances. In this paper, we propose a visualization system, called BioTermViz, which presents the temporal distribution of ontological terms from the text of published journal abstracts. BioTermViz shows such a temporal distribution of terms for journal abstracts in the order of published time, occurrences of the annotated Gene Ontology concepts per abstract, and the ontological hierarchy of the terms. With a combination of these three types of information, we can capture the global tendency in the use of terms, and identify a particular term or terms to be created, modified, segmented, or removed, effectively developing biological ontologies in an interactive manner. In order to demonstrate the practical utility of BioTermViz, we describe several scenarios for the development of an ontology for a specific sub-class of proteins, or ubiquitin-protein ligases.

  1. Identification of genes involved in radioresistance of nasopharyngeal carcinoma by integrating gene ontology and protein-protein interaction networks.

    Science.gov (United States)

    Guo, Ya; Zhu, Xiao-Dong; Qu, Song; Li, Ling; Su, Fang; Li, Ye; Huang, Shi-Ting; Li, Dan-Rong

    2012-01-01

    Radioresistance remains one of the important factors in relapse and metastasis of nasopharyngeal carcinoma. Thus, it is imperative to identify genes involved in radioresistance and explore the underlying biological processes in the development of radioresistance. In this study, we used cDNA microarrays to select differential genes between radioresistant CNE-2R and parental CNE-2 cell lines. One hundred and eighty-three significantly differentially expressed genes (p<...) were identified; ... genes were upregulated and 45 genes were downregulated in CNE-2R. We further employed publicly available bioinformatics related software, such as GOEAST and STRING, to examine the relationship among differentially expressed genes. The results show that these genes were involved in type I interferon-mediated signaling pathway biological processes; the nodes tended to have high connectivity with the EGFR pathway, IFN-related pathways, NF-κB. The node STAT1 has high connectivity with other nodes in the protein-protein interaction (PPI) networks. Finally, the reliability of microarray data was validated for selected genes by semi-quantitative RT-PCR and Western blotting. The results were consistent with the microarray data. Our study suggests that microarrays combined with gene ontology and protein interaction networks have great value in the identification of genes of radioresistance in nasopharyngeal carcinoma; genes involved in several biological processes and protein interaction networks may be relevant to NPC radioresistance; in particular, the verified genes CCL5, STAT1-α, STAT2 and GSTP1 may become potential biomarkers for predicting NPC response to radiotherapy.

  2. An ontology for microbial phenotypes.

    Science.gov (United States)

    Chibucos, Marcus C; Zweifel, Adrienne E; Herrera, Jonathan C; Meza, William; Eslamfam, Shabnam; Uetz, Peter; Siegele, Deborah A; Hu, James C; Giglio, Michelle G

    2014-11-30

    Phenotypic data are routinely used to elucidate gene function in organisms amenable to genetic manipulation. However, previous to this work, there was no generalizable system in place for the structured storage and retrieval of phenotypic information for bacteria. The Ontology of Microbial Phenotypes (OMP) has been created to standardize the capture of such phenotypic information from microbes. OMP has been built on the foundations of the Basic Formal Ontology and the Phenotype and Trait Ontology. Terms have logical definitions that can facilitate computational searching of phenotypes and their associated genes. OMP can be accessed via a wiki page as well as downloaded from SourceForge. Initial annotations with OMP are being made for Escherichia coli using a wiki-based annotation capture system. New OMP terms are being concurrently developed as annotation proceeds. We anticipate that diverse groups studying microbial genetics and associated phenotypes will employ OMP for standardizing microbial phenotype annotation, much as the Gene Ontology has standardized gene product annotation. The resulting OMP resource and associated annotations will facilitate prediction of phenotypes for unknown genes and result in new experimental characterization of phenotypes and functions.

  3. Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.

  4. The pathway ontology - updates and applications.

    Science.gov (United States)

    Petri, Victoria; Jayaraman, Pushkala; Tutaj, Marek; Hayman, G Thomas; Smith, Jennifer R; De Pons, Jeff; Laulederkind, Stanley Jf; Lowry, Timothy F; Nigam, Rajni; Wang, Shur-Jen; Shimoyama, Mary; Dwinell, Melinda R; Munzenmaier, Diane H; Worthey, Elizabeth A; Jacob, Howard J

    2014-02-05

    The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. The two released pipelines - the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline - make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD "Immune and Inflammatory Disease Portal" at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the 'infectious disease pathway' parent term category. The 'drug pathway' node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by over 75%. Ongoing development of

  5. The pathway ontology – updates and applications

    Science.gov (United States)

    2014-01-01

    Background: The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. Results: The two released pipelines – the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline – make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD “Immune and Inflammatory Disease Portal” at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the ‘infectious disease pathway’ parent term category. The ‘drug pathway’ node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by

  6. Assessing identity, redundancy and confounds in Gene Ontology annotations over time.

    Science.gov (United States)

    Gillis, Jesse; Pavlidis, Paul

    2013-02-15

    The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. Data available at http://chibi.ubc.ca/assessGO.

  7. Identification of the key regulating genes of diminished ovarian reserve (DOR) by network and gene ontology analysis.

    Science.gov (United States)

    Pashaiasl, Maryam; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2016-09-01

    Diminished ovarian reserve (DOR) is one of the reasons for infertility and affects both older and young women. Ovarian reserve assessment can be used as a new prognostic tool for infertility treatment decision making. Here, up- and down-regulated gene expression profiles of granulosa cells were analysed to generate a putative interaction map of the involved genes. In addition, gene ontology (GO) analysis was used to gain insight into the biological processes and molecular functions of the proteins involved in DOR. Eleven up-regulated genes and nine down-regulated genes were identified and assessed by constructing interaction networks based on their biological processes. PTGS2, CTGF, LHCGR, CITED, SOCS2, STAR and FSTL3 were the key nodes in the up-regulated networks, while the IGF2, AMH, GREM, and FOXC1 proteins were key in the down-regulated networks. MIRN101-1, MIRN153-1 and MIRN194-1 inhibited the expression of SOCS2, while CSH1 and BMP2 positively regulated IGF1 and IGF2. Ossification, ovarian follicle development, vasculogenesis, sequence-specific DNA binding transcription factor activity, and Golgi apparatus are the major differential groups between up-regulated and down-regulated genes in DOR. Meta-analysis of publicly available transcriptomic data highlighted the high coexpression of CTGF (connective tissue growth factor) with the other key regulators of DOR. CTGF is involved in organ senescence and the focal adhesion pathway according to GO analysis. These findings provide a comprehensive, systems biology-based insight into the aetiology of DOR through network and gene ontology analyses.

  8. The language of gene ontology: a Zipf’s law analysis

    Directory of Open Access Journals (Sweden)

    Kalankesh Leila

    2012-06-01

    Background: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf’s law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Results: Annotations from the Gene Ontology Annotation project were found to follow Zipf’s law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Conclusions: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
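
    As an illustration of what measuring a Zipf exponent involves, the sketch below fits a straight line to log(frequency) against log(rank) for a table of GO term usage counts. It is only a minimal sketch: the abstract does not say which fitting procedure the authors used, so ordinary least squares on the log-log values is an assumption, and the function name `zipf_exponent` and the toy counts are hypothetical.

```python
import math

def zipf_exponent(term_counts):
    """Estimate s in frequency ~ C / rank**s by least squares on log-log values.

    term_counts maps a term identifier to how often it is used.
    The fitting method is an assumption, not the one from the paper.
    """
    # Rank terms from most to least frequent; rank 1 is the most used term.
    freqs = sorted((c for c in term_counts.values() if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two terms with non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of the fitted line is -s, so flip the sign.
    return -sxy / sxx

# Hypothetical counts that follow frequency = 1000 / rank exactly.
toy_counts = {f"TERM_{r}": 1000.0 / r for r in range(1, 51)}
print(round(zipf_exponent(toy_counts), 3))
```

    Comparing exponents across sub-ontologies or evidence codes would then amount to calling the function once per filtered corpus.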

  9. Tutorial on Protein Ontology Resources.

    Science.gov (United States)

    Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R; Ross, Karen E; Natale, Darren A

    2017-01-01

    The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website ( proconsortium.org ) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.

  10. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computationally expensive and time consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents the first effort to examine this idea by studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, among which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as shown by (a) consistently finding that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
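
    For readers unfamiliar with the four measures named above, the sketch below writes out their usual textbook definitions in terms of term probabilities, where `p_mica` is the probability of the most informative common ancestor of the two terms. The probabilities in the example are hypothetical, the function names are illustrative, and the abstract does not state which variant of each measure the authors implemented.

```python
import math

def information_content(p):
    """IC of a term whose annotation probability is p (0 < p <= 1)."""
    return -math.log(p)

def resnik(p_mica):
    """Resnik similarity: IC of the most informative common ancestor."""
    return information_content(p_mica)

def lin(p_a, p_b, p_mica):
    """Lin similarity: shared IC normalised by the ICs of both terms."""
    denom = information_content(p_a) + information_content(p_b)
    if denom == 0:
        # Both terms are the root (p = 1); returning 0.0 is a convention chosen here.
        return 0.0
    return 2 * information_content(p_mica) / denom

def jiang_conrath_distance(p_a, p_b, p_mica):
    """Jiang-Conrath distance; smaller means more similar."""
    return (information_content(p_a) + information_content(p_b)
            - 2 * information_content(p_mica))

def sim_rel(p_a, p_b, p_mica):
    """SimRel: Lin similarity down-weighted when the common ancestor is very general."""
    return lin(p_a, p_b, p_mica) * (1 - p_mica)

# Hypothetical probabilities for two terms and their common ancestor.
p_a, p_b, p_mica = 0.01, 0.02, 0.1
print(resnik(p_mica), lin(p_a, p_b, p_mica),
      jiang_conrath_distance(p_a, p_b, p_mica), sim_rel(p_a, p_b, p_mica))
```

    The Jiang-Conrath value is a distance; one common convention turns it into a similarity as 1 / (1 + distance), but other conversions exist.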

  11. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.

    Science.gov (United States)

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2016-08-31

    The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users with a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities and to provide pure text-based outputs. Without a graphical visualization interface, the results are difficult to interpret. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .

  12. Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations.

    Science.gov (United States)

    Liu, Gangbiao; Zou, Yangyun; Cheng, Qiqun; Zeng, Yanwu; Gu, Xun; Su, Zhixi

    2014-04-01

    The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. This study revealed the whole picture of the evolution patterns of gene functional

  13. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer.

    Science.gov (United States)

    Lando, Malin; Holden, Marit; Bergersen, Linn C; Svendsrud, Debbie H; Stokke, Trond; Sundfør, Kolbein; Glad, Ingrid K; Kristensen, Gunnar B; Lyng, Heidi

    2009-11-01

    Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities of both early and advanced stage cervical cancers.

  14. Handling multiple testing while interpreting microarrays with the Gene Ontology Database

    Directory of Open Access Journals (Sweden)

    Zhao Hongyu

    2004-09-01

    Background: The development of software tools that analyze microarray data in the context of genetic knowledgebases is being pursued by multiple research groups using different methods. A common problem for many of these tools is how to correct for multiple statistical testing since simple corrections are overly conservative and more sophisticated corrections are currently impractical. A careful study of the nature of the distribution one would expect by chance, such as by a simulation study, may be able to guide the development of an appropriate correction that is not overly time consuming computationally. Results: We present the results from a preliminary study of the distribution one would expect for analyzing sets of genes extracted from Drosophila, S. cerevisiae, Wormbase, and Gramene databases using the Gene Ontology Database. Conclusions: We found that the estimated distribution is not regular and is not predictable outside of a particular set of genes. Permutation-based simulations may be necessary to determine the confidence in results of such analyses.
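
    To make the idea of a permutation-based simulation concrete, the sketch below estimates an empirical p-value for a single GO term by drawing random gene sets of the same size as the study set and counting how often they overlap the term at least as much as the observed set does. This is a generic illustration, not the authors' procedure: the function name, the add-one p-value estimate, and the toy gene identifiers are all assumptions, and it covers one term only rather than the multiple-testing problem as a whole.

```python
import random

def permutation_p_value(universe, term_genes, study_genes, n_perm=999, seed=0):
    """Empirical p-value for over-representation of one term in a study set.

    universe    : all genes that could have been selected
    term_genes  : genes annotated to the term of interest
    study_genes : genes actually selected (e.g. differentially expressed)
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    universe = list(universe)
    term_genes = set(term_genes)
    study_genes = set(study_genes)
    observed = len(study_genes & term_genes)
    hits = 0
    for _ in range(n_perm):
        # Random gene set of the same size, drawn without replacement.
        sample = rng.sample(universe, len(study_genes))
        if len(term_genes.intersection(sample)) >= observed:
            hits += 1
    # Add-one estimate, so the p-value is never reported as exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 100 genes, 10 annotated to the term, all 10 in the study set.
genes = [f"gene{i}" for i in range(100)]
print(permutation_p_value(genes, genes[:10], genes[:10]))
```

    One way to extend this to many terms, under the same assumptions, is to record the largest overlap statistic across all terms in each permutation and compare each observed statistic against that distribution.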

  15. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms

    Directory of Open Access Journals (Sweden)

    Yang Xiang

    2015-01-01

    The integration of ontologies builds knowledge structures which bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, available works on ontology integration are largely heuristic without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between gene ontology and National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.

  16. Modular Ontology Techniques and their Applications in the Biomedical Domain.

    Science.gov (United States)

    Pathak, Jyotishman; Johnson, Thomas M; Chute, Christopher G

    2008-08-05

    In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing "modules" of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.

  17. The Proteasix Ontology.

    Science.gov (United States)

    Arguello Casteleiro, Mercedes; Klein, Julie; Stevens, Robert

    2016-06-04

    The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides). The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.

  18. The neurological disease ontology.

    Science.gov (United States)

    Jensen, Mark; Cox, Alexander P; Chaudhry, Naveed; Ng, Marcus; Sule, Donat; Duncan, William; Ray, Patrick; Weinstock-Guttman, Bianca; Smith, Barry; Ruttenberg, Alan; Szigeti, Kinga; Diehl, Alexander D

    2013-12-06

    We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer's disease, multiple sclerosis, and stroke. ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms 'disease', 'diagnosis', 'disease course', and 'disorder'. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer's disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology along with a discussion list and an issue tracker. ND seeks to provide a formal foundation for the representation of clinical and research data

  19. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    OpenAIRE

    Zhen Li; Bi-Qing Li; Min Jiang; Lei Chen; Jian Zhang; Lin Liu; Tao Huang

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance...

  20. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    Science.gov (United States)

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  1. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  2. GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology.

    Science.gov (United States)

    Ramsak, Živa; Baebler, Špela; Rotter, Ana; Korbar, Matej; Mozetic, Igor; Usadel, Björn; Gruden, Kristina

    2014-01-01

    GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the restriction on the curation functions, which require prior registration to ensure traceability of the implemented changes.

  3. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Background: It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results: We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions: Pathway
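Two of the rules stated in this abstract are simple enough to write down directly: the 500 kb SNP-to-gene mapping window and the criterion that a GO category replicates when it is significant at α = 0.05 in both studies. The sketch below is illustrative only; the function names are hypothetical, the mapping assumes the SNP and gene lie on the same chromosome, and treating "significant at the α level" as `p <= alpha` is an assumption. The p-values are the ones quoted in the abstract.

```python
ALPHA = 0.05
FLANK_BP = 500_000  # 500 kb window on either side of the gene


def snp_maps_to_gene(snp_pos, gene_start, gene_end, flank=FLANK_BP):
    """True if the SNP lies in the gene or within `flank` bp of its start or end.

    Assumes all coordinates are on the same chromosome.
    """
    return gene_start - flank <= snp_pos <= gene_end + flank


def replicates(p_detection, p_replication, alpha=ALPHA):
    """A GO category replicates if it is significant in both studies."""
    return p_detection <= alpha and p_replication <= alpha


# (detection, replication) p-values as reported in the abstract
go_pvalues = {
    "Regulation of Cellular Component Organization and Biogenesis": (0.010, 0.014),
    "Actin Cytoskeleton": (0.040, 0.046),
}

for category, (p_det, p_rep) in go_pvalues.items():
    print(category, replicates(p_det, p_rep))
```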

  4. Search of phenotype related candidate genes using gene ontology-based semantic similarity and protein interaction information: application to Brugada syndrome.

    Science.gov (United States)

    Massanet, Raimon; Gallardo-Chacon, Joan-Josep; Caminal, Pere; Perera, Alexandre

    2009-01-01

    This work presents a methodology for finding phenotype candidate genes starting from a set of known related genes. This is accomplished by automatically mining and organizing the available scientific literature using Gene Ontology-based semantic similarity. As a case study, genes related to Brugada syndrome were used as input in order to obtain a list of other candidate genes possibly related to this disease. The Brugada anomaly produces a typical alteration in the electrocardiogram, and carriers of the disease show an increased probability of sudden death. The results show a set of semantically coherent proteins that are related to the physiological processes of synaptic transmission and muscle contraction.

  5. Ontology searching and browsing at the Rat Genome Database

    Science.gov (United States)

    Laulederkind, Stanley J. F.; Tutaj, Marek; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R.; Jacob, Howard J.

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a ‘driller’ type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. Database URL: http://rgd.mcw.edu/rgdweb/ontology/search.html PMID:22434847

  6. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution.

  7. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D.

    2017-01-01

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new ‘hierarchical view’ of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. PMID:27899595

  8. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al. to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation we identify an excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and lend biological plausibility to these findings.

  10. Performing ontology.

    Science.gov (United States)

    Aspers, Patrik

    2015-06-01

    Ontology, and in particular, the so-called ontological turn, is the topic of a recent themed issue of Social Studies of Science (Volume 43, Issue 3, 2013). Ontology, or metaphysics, is in philosophy concerned with what there is, how it is, and forms of being. But to what is the science and technology studies researcher turning when he or she talks of ontology? It is argued that it is unclear what is gained by arguing that ontology also refers to constructed elements. The 'ontological turn' comes with the risk of creating a pseudo-debate or pseudo-activity, in which energy is used for no end, at the expense of empirical studies. This text rebuts the idea of an ontological turn as foreshadowed in the texts of the themed issue. It argues that there is no fundamental qualitative difference between the ontological turn and what we know as constructivism.

  11. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, for example genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
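To make the idea of an exact p-value for an integer additive statistic concrete, here is a small sketch of a textbook subset-sum style dynamic program. It is not claimed to be the authors' algorithm or any of their optimizations. It assumes a null model in which the gene set is drawn uniformly without replacement from all genes, that each gene carries a non-negative integer weight, and that the statistic is the sum of the weights of the drawn genes; the name `exact_upper_pvalue` is hypothetical.

```python
from math import comb


def exact_upper_pvalue(weights, n_selected, observed):
    """Exact P(S >= observed) for S = sum of weights of n_selected genes
    drawn uniformly without replacement from the genes in `weights`.

    weights    -- non-negative integer weight per gene (assumed model)
    n_selected -- size of the gene set of interest
    observed   -- observed value of the additive statistic
    """
    max_total = sum(weights)
    # ways[j][s] = number of size-j subsets with total weight s
    ways = [[0] * (max_total + 1) for _ in range(n_selected + 1)]
    ways[0][0] = 1
    for w in weights:
        # Go through subset sizes downward so each gene is counted at most once.
        for j in range(n_selected, 0, -1):
            for s in range(max_total, w - 1, -1):
                ways[j][s] += ways[j - 1][s - w]
    tail = sum(ways[n_selected][max(observed, 0):])
    return tail / comb(len(weights), n_selected)


# With 0/1 weights the statistic is ordinary set membership, so the
# result coincides with the hypergeometric tail probability.
print(exact_upper_pvalue([1, 1, 0, 0], n_selected=2, observed=2))
```

With weights restricted to 0 and 1 this reduces to the usual hypergeometric enrichment test; larger integer weights let individual genes or classifications count for more, which is the kind of generalization the abstract describes.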

  12. Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task.

    Science.gov (United States)

    Kim, Jin-Dong; Kim, Jung-Jae; Han, Xu; Rebholz-Schuhmann, Dietrich

    2015-01-01

    The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base construction (KB)". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation. To further explore the grand theme, we extended the evaluation from a perspective of KB construction. Also, the Gene Regulation Ontology (GRO) task was newly introduced in the third edition. The final evaluation of the participating systems resulted in relatively low performance. The reason was attributed to the large size and complex semantic representation of the ontology. To investigate potential benefits of resource exchange between the presumably similar tasks, we measured the overlap between the datasets of the two tasks, and tested whether the dataset for one task can be used to enhance performance on the other. We report an extended evaluation on all the participating systems in the GE task, incorporating a KB perspective. For the evaluation, the final submission of each participant was converted to RDF statements, and evaluated using 8 queries that were formulated in SPARQL. The results suggest that the evaluation may be concluded differently between the two different perspectives, annotation vs. KB. We also provide a comparison of the GE and GRO tasks by converting their datasets into each other's format. More than 90% of the GE data could be converted into the GRO task format, while only half of the GRO data could be mapped to the GE task format. The imbalance in conversion indicates that the GRO is a comprehensive extension of the GE task ontology. We further used the converted GRO data as additional training data for the GE task, which helped improve GE task participant system performance. However, the converted GE data did not help GRO task participants, due to overfitting and the ontology gap.

  13. ENRICHMENT OF OBO ONTOLOGIES

    Science.gov (United States)

    Bada, Michael; Hunter, Lawrence

    2006-01-01

    This paper describes a frame-based integration of the three GO subontologies, the Chemicals of Biological Interest ontology (ChEBI), and the Cell Type Ontology (CTO) in which relationships between elements of the ontologies are modeled in a way that better captures the relational semantics between biological concepts represented by the terms, rather than between the terms themselves, than previous frame-based efforts. We also describe a methodology for creating suggested enriching assertions of the form (subject, relationship, object) by identifying patterns in GO terms, mapping these patterns and subpatterns to relationships, matching concepts to these patterns and subpatterns, and integrating these assertions into the ontologies. Using this methodology, a large number of reliable assertions linking previously unlinked OBO terms using a wide variety of specific, hierarchically arranged relationships were created: A predicted assertion was made for 62% of GO terms that matched one of 31 patterns, and 97% of these predicted assertions were assessed to be valid; a further 429 assertions (corresponding to 6% of the matching terms) were manually created, resulting in an initial set of 4,497 assertions. Furthermore, this methodology programmatically integrates assertions into a base ontology such that each assertion is fully consistent with respect to higher (i.e., more general) relevant class and slot levels. Such an integration is absent from previous compositional efforts, and we argue its necessity for the creation of coherent biological ontologies when linking previously unlinked terms. PMID:17011833

  14. Ontology Usage at ZFIN

    CERN Document Server

    Howe, Doug

    2010-01-01

    The Zebrafish Model Organism Database (ZFIN) provides a Web resource of zebrafish genomic, genetic, developmental, and phenotypic data. Four different ontologies are currently used to annotate data to the most specific term available, facilitating better comparison of inter-species data. In addition, ontologies are used to help users find and cluster data more quickly without needing to know the exact technical name for a term.

  15. GOlorize: a Cytoscape plug-in for network visualization with Gene Ontology-based layout and coloring

    OpenAIRE

    Garcia, O.; Saveanu, C.; Cline, M.; Fromont-Racine, M; Jacquier, A; Schwikowski, B.; Aittokallio, T.

    2007-01-01

    We have implemented a graph layout algorithm that exposes Gene Ontology (GO) class structure on the network nodes. It can be used in conjunction with the BiNGO plug-in to Cytoscape, which finds the GO categories over-represented in a given network. Our plug-in, named GOlorize, first highlights the class members with category-specific color-coding and then constructs an enhanced visualization of the network using a class-directed layout algorithm. AVAILABILITY: http://www.c...

  16. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology

  17. Ontology development for Sufism domain

    Science.gov (United States)

    Iqbal, Rizwan

    2012-01-01

    A domain ontology is a descriptive representation of a particular domain that describes in detail the concepts in the domain and the relationships among those concepts, and organizes them in a hierarchical manner. It is also defined as a structure of knowledge, used as a means of sharing knowledge with the community. An important aspect of using ontologies is to make information retrieval more accurate and efficient. Thousands of domain ontologies from all around the world are available online in ontology repositories. Ontology repositories like SWOOGLE currently have over 1000 ontologies covering a wide range of domains. It was found that, to date, there was no ontology available covering the domain of "Sufism". This unavailability of a "Sufism" domain ontology became a motivating factor for this research. This research produced a working "Sufism" domain ontology as well as a framework; the design of the proposed framework focuses on resolving problems that were experienced while creating the "Sufism" ontology. The development and working of the "Sufism" domain ontology are covered in detail in this research. The word "Sufism" is a term which refers to Islamic mysticism. One of the reasons for choosing "Sufism" for ontology creation is the global curiosity about it. This research has also created some individuals which inherit the concepts from the "Sufism" ontology. The creation of individuals helps to demonstrate the efficient and precise retrieval of data from the "Sufism" domain ontology. The "Sufism" domain ontology was created with a tool called Protégé, an open-source tool used for creating and editing ontologies.

  18. A unified anatomy ontology of the vertebrate skeletal system.

    Directory of Open Access Journals (Sweden)

    Wasila M Dahdul

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  19. A unified anatomy ontology of the vertebrate skeletal system.

    Science.gov (United States)

    Dahdul, Wasila M; Balhoff, James P; Blackburn, David C; Diehl, Alexander D; Haendel, Melissa A; Hall, Brian K; Lapp, Hilmar; Lundberg, John G; Mungall, Christopher J; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E; Vickaryous, Matthew K; Westerfield, Monte; Mabee, Paula M

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  20. Standardized description of scientific evidence using the Evidence Ontology (ECO).

    Science.gov (United States)

    Chibucos, Marcus C; Mungall, Christopher J; Balakrishnan, Rama; Christie, Karen R; Huntley, Rachael P; White, Owen; Blake, Judith A; Lewis, Suzanna E; Giglio, Michelle

    2014-01-01

    The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org.

  1. A Unified Anatomy Ontology of the Vertebrate Skeletal System

    Science.gov (United States)

    Dahdul, Wasila M.; Balhoff, James P.; Blackburn, David C.; Diehl, Alexander D.; Haendel, Melissa A.; Hall, Brian K.; Lapp, Hilmar; Lundberg, John G.; Mungall, Christopher J.; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E.; Vickaryous, Matthew K.; Westerfield, Monte; Mabee, Paula M.

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity. PMID:23251424

  2. Engineering Ontologies

    OpenAIRE

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering systems modelling, simulation and design. A general and formal ontology, called PHYSSYS, for dynamic physical systems is presented and its structuring principles are discussed. We show how the PHYSSYS ...

  3. Multiple Trait Covariance Association Test Identifies Gene Ontology Categories Associated with Chill Coma Recovery Time in Drosophila melanogaster

    DEFF Research Database (Denmark)

    Sørensen, Izel Fourie; Edwards, Stefan M.; Rohde, Palle Duun

    2017-01-01

    features, here defined by gene ontology (GO) terms, enriched for causal variants affecting a quantitative trait in a population with low degree of relatedness. Different set test approaches were compared using simulated data illustrating the impact of trait- and genomic feature-specific factors...

  4. Identification of protein features encoded by alternative exons using Exon Ontology.

    Science.gov (United States)

    Tranchevent, Léon-Charles; Aubé, Fabien; Dulaurier, Louis; Benoit-Pilven, Clara; Rey, Amandine; Poret, Arnaud; Chautard, Emilie; Mortada, Hussein; Desmet, François-Olivier; Chakrama, Fatima Zahra; Moreno-Garcia, Maira Alejandra; Goillot, Evelyne; Janczarski, Stéphane; Mortreux, Franck; Bourgeois, Cyril F; Auboeuf, Didier

    2017-06-01

    Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named "Exon Ontology," based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information. © 2017 Tranchevent et al.; Published by Cold Spring Harbor Laboratory Press.

  5. Ontologies for Bioinformatics

    Directory of Open Access Journals (Sweden)

    Agnieszka Leszczynski

    2008-01-01

    The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.

  6. Engineering Ontologies

    NARCIS (Netherlands)

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering syst

  7. Survey of modular ontology techniques and their applications in the biomedical domain.

    Science.gov (United States)

    Pathak, Jyotishman; Johnson, Thomas M; Chute, Christopher G

    2009-08-01

    In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing "modules" of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.

  8. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Science.gov (United States)

    Chen, Xiaoshu; Zhang, Jianzhi

    2012-01-01

    The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
    We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and

  9. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaoshu Chen

    The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression.
    We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny

  10. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  11. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies.

    Science.gov (United States)

    Walls, Ramona L; Deck, John; Guralnick, Robert; Baskauf, Steve; Beaman, Reed; Blum, Stanley; Bowers, Shawn; Buttigieg, Pier Luigi; Davies, Neil; Endresen, Dag; Gandolfo, Maria Alejandra; Hanner, Robert; Janning, Alyssa; Krishtalka, Leonard; Matsunaga, Andréa; Midford, Peter; Morrison, Norman; Ó Tuama, Éamonn; Schildhauer, Mark; Smith, Barry; Stucky, Brian J; Thomer, Andrea; Wieczorek, John; Whitacre, Jamie; Wooley, John

    2014-01-01

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.

  12. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies.

    Directory of Open Access Journals (Sweden)

    Ramona L Walls

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and

  13. Efficient Management of Biomedical Ontology Versions

    Science.gov (United States)

    Kirsten, Toralf; Hartung, Michael; Groß, Anika; Rahm, Erhard

    Ontologies have become very popular in life sciences and other domains. They mostly undergo continuous changes and new ontology versions are frequently released. However, current analysis studies do not consider the ontology changes reflected in different versions but typically limit themselves to a specific ontology version which may quickly become obsolete. To allow applications easy access to different ontology versions we propose a central and uniform management of the versions of different biomedical ontologies. The proposed database approach takes concept and structural changes of succeeding ontology versions into account thereby supporting different kinds of change analysis. Furthermore, it is very space-efficient by avoiding redundant storage of ontology components which remain unchanged in different versions. We evaluate the storage requirements and query performance of the proposed approach for the Gene Ontology.
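
    The space-saving idea in this abstract (store unchanged ontology components once rather than once per version) can be illustrated with a minimal sketch. It is not the authors' database design: the class name, the validity-interval layout, and the term identifiers are all hypothetical, chosen only to show how a version snapshot and a change analysis can be derived from interval-stamped rows.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedOntology:
    """Hypothetical store: each concept row carries the version range in which it is valid."""
    # concept id -> list of [first_version, last_version or None if still open, label]
    records: dict = field(default_factory=dict)
    latest: int = 0

    def add_version(self, concepts: dict) -> int:
        """Register a new ontology version given as {concept_id: label}."""
        self.latest += 1
        v = self.latest
        for cid, label in concepts.items():
            spans = self.records.setdefault(cid, [])
            if spans and spans[-1][1] is None:
                if spans[-1][2] == label:
                    continue              # unchanged concept: nothing new is stored
                spans[-1][1] = v - 1      # label changed: close the old row
            spans.append([v, None, label])
        # Close the open row of every concept that is absent from this version.
        for cid, spans in self.records.items():
            if cid not in concepts and spans[-1][1] is None:
                spans[-1][1] = v - 1
        return v

    def snapshot(self, version: int) -> dict:
        """Reconstruct {concept_id: label} as it was in the given version."""
        out = {}
        for cid, spans in self.records.items():
            for first, last, label in spans:
                if first <= version and (last is None or version <= last):
                    out[cid] = label
        return out

    def diff(self, old: int, new: int) -> dict:
        """Simple change analysis between two versions."""
        a, b = self.snapshot(old), self.snapshot(new)
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }


# Made-up identifiers and labels, for illustration only.
store = VersionedOntology()
store.add_version({"T:1": "binding", "T:2": "transport"})
store.add_version({"T:1": "binding", "T:2": "transporter activity", "T:3": "catalysis"})
store.add_version({"T:1": "binding", "T:3": "catalysis"})
stored_rows = sum(len(spans) for spans in store.records.values())
```

    In this toy run the three versions contain seven concept entries in total, while only four rows are stored, because entries that do not change between versions are never copied.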

  14. FYPO: the fission yeast phenotype ontology.

    Science.gov (United States)

    Harris, Midori A; Lock, Antonia; Bähler, Jürg; Oliver, Stephen G; Wood, Valerie

    2013-07-01

    To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species. FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).

  15. Biomedical ontologies: a functional perspective.

    Science.gov (United States)

    Rubin, Daniel L; Shah, Nigam H; Noy, Natalya F

    2008-01-01

    The information explosion in biology makes it difficult for researchers to stay abreast of current biomedical knowledge and to make sense of the massive amounts of online information. Ontologies--specifications of the entities, their attributes and relationships among the entities in a domain of discourse--are increasingly enabling biomedical researchers to accomplish these tasks. In fact, bio-ontologies are beginning to proliferate in step with accruing biological data. The myriad of ontologies being created enables researchers not only to solve some of the problems in handling the data explosion but also introduces new challenges. One of the key difficulties in realizing the full potential of ontologies in biomedical research is the isolation of various communities involved: some workers spend their career developing ontologies and ontology-related tools, while few researchers (biologists and physicians) know how ontologies can accelerate their research. The objective of this review is to give an overview of biomedical ontology in practical terms by providing a functional perspective--describing how bio-ontologies can and are being used. As biomedical scientists begin to recognize the many different ways ontologies enable biomedical research, they will drive the emergence of new computer applications that will help them exploit the wealth of research data now at their fingertips.

  16. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    Directory of Open Access Journals (Sweden)

    Hakenberg Jörg

    2009-01-01

    Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively. Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate. Conclusion: Metadata is valuable for disambiguation, but requires high quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater 90

  17. BioPortal: ontologies and integrated data resources at the click of a mouse.

    Science.gov (United States)

    Noy, Natalya F; Shah, Nigam H; Whetzel, Patricia L; Dai, Benjamin; Dorf, Michael; Griffith, Nicholas; Jonquet, Clement; Rubin, Daniel L; Storey, Margaret-Anne; Chute, Christopher G; Musen, Mark A

    2009-07-01

    Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.

  18. Ontology Research

    OpenAIRE

    Welty, Christopher

    2003-01-01

    In this issue, I have collected a fairly broad, although by no means exhaustive, sampling of work in the field of ontology research. To define a field is often quite difficult; it is more a collection of people and ideas than it is a specific technology. To represent our field, I present six articles that cover several of the major thrusts of ontology research from the past decade.

  19. Using Network Extracted Ontologies to Identify Novel Genes with Roles in Appressorium Development in the Rice Blast Fungus Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Ryan M. Ames

    2017-01-01

    Magnaporthe oryzae is the causal agent of rice blast disease, the most important infection of rice worldwide. Half the world’s population depends on rice for its primary caloric intake and, as such, rice blast poses a serious threat to food security. The stages of M. oryzae infection are well defined, with the formation of an appressorium, a cell type that allows penetration of the plant cuticle, particularly well studied. However, many of the key pathways and genes involved in this disease stage are yet to be identified. In this study, I have used network-extracted ontologies (NeXOs), hierarchical structures inferred from RNA-Seq data, to identify pathways involved in appressorium development, which in turn highlights novel genes with potential roles in this process. This study illustrates the use of NeXOs for pathway identification from large-scale genomics data and also identifies novel genes with potential roles in disease. The methods presented here will be useful to study disease processes in other pathogenic species and these data represent predictions of novel targets for intervention in M. oryzae.

  20. Mouse anatomy ontologies: enhancements and tools for exploring and integrating biomedical data.

    Science.gov (United States)

    Hayamizu, Terry F; Baldock, Richard A; Ringwald, Martin

    2015-10-01

    Mouse anatomy ontologies provide standard nomenclature for describing normal and mutant mouse anatomy, and are essential for the description and integration of data directly related to anatomy such as gene expression patterns. Building on our previous work on anatomical ontologies for the embryonic and adult mouse, we have recently developed a new and substantially revised anatomical ontology covering all life stages of the mouse. Anatomical terms are organized in complex hierarchies enabling multiple relationships between terms. Tissue classification as well as partonomic, developmental, and other types of relationships can be represented. Hierarchies for specific developmental stages can also be derived. The ontology forms the core of the eMouse Atlas Project (EMAP) and is used extensively for annotating and integrating gene expression patterns and other data by the Gene Expression Database (GXD), the eMouse Atlas of Gene Expression (EMAGE) and other database resources. Here we illustrate the evolution of the developmental and adult mouse anatomical ontologies toward one combined system. We report on recent ontology enhancements, describe the current status, and discuss future plans for mouse anatomy ontology development and application in integrating data resources.
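
    The statement that hierarchies for specific developmental stages can be derived from the full ontology can be pictured with a small sketch. It assumes, purely for illustration, that each term carries a start and end stage and that relationships are stored as (child, relation, parent) triples; the term names, stage numbers, and function name below are invented and do not come from the mouse anatomy ontology itself.

```python
def hierarchy_at_stage(terms, edges, stage):
    """Return the edges whose two terms both exist at the given stage.

    terms: {term_name: (start_stage, end_stage)}, inclusive bounds (hypothetical layout)
    edges: list of (child, relation, parent) triples
    """
    present = {name for name, (start, end) in terms.items() if start <= stage <= end}
    return [
        (child, relation, parent)
        for child, relation, parent in edges
        if child in present and parent in present
    ]


# Invented terms and stage ranges, for illustration only.
terms = {
    "organism": (1, 30),
    "somite": (9, 20),
    "vertebra": (21, 30),
    "skeletal element": (21, 30),
}
edges = [
    ("somite", "part_of", "organism"),
    ("vertebra", "is_a", "skeletal element"),
    ("vertebra", "part_of", "organism"),
    ("vertebra", "develops_from", "somite"),
]

early = hierarchy_at_stage(terms, edges, 10)
late = hierarchy_at_stage(terms, edges, 25)
```

    Because a term may appear in several triples with different relation types, the same filter preserves classification, partonomic, and developmental links side by side; the developmental link between the two invented terms drops out of both views here because the terms never coexist at one stage.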

  1. The Ontology for Biomedical Investigations.

    Science.gov (United States)

    Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias; Brush, Matthew H; Bug, Bill; Chibucos, Marcus C; Clancy, Kevin; Courtot, Mélanie; Derom, Dirk; Dumontier, Michel; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Gibson, Frank; Gonzalez-Beltran, Alejandra; Haendel, Melissa A; He, Yongqun; Heiskanen, Mervi; Hernandez-Boussard, Tina; Jensen, Mark; Lin, Yu; Lister, Allyson L; Lord, Phillip; Malone, James; Manduchi, Elisabetta; McGee, Monnie; Morrison, Norman; Overton, James A; Parkinson, Helen; Peters, Bjoern; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H; Schober, Daniel; Smith, Barry; Soldatova, Larisa N; Stoeckert, Christian J; Taylor, Chris F; Torniai, Carlo; Turner, Jessica A; Vita, Randi; Whetzel, Patricia L; Zheng, Jie

    2016-01-01

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
    The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed

  2. The Ontology for Biomedical Investigations.

    Directory of Open Access Journals (Sweden)

    Anita Bandrowski

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions.
    The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being

  3. Generating Application Ontologies from Reference Ontologies

    OpenAIRE

    Shaw, Marianne; Detwiler, Landon T.; Brinkley, James F.; Suciu, Dan

    2008-01-01

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies.

  4. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in their mRNA expression levels in L-lysine-producing C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport, namely NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dethiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885, were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, as determined using both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results will be helpful to improve our understanding of C. glutamicum for industrial amino acid production.

  5. Ontology Design Patterns: Bridging the Gap Between Local Semantic Use Cases and Large-Scale, Long-Term Data Integration

    Science.gov (United States)

    Shepherd, Adam; Arko, Robert; Krisnadhi, Adila; Hitzler, Pascal; Janowicz, Krzysztof; Chandler, Cyndy; Narock, Tom; Cheatham, Michelle; Schildhauer, Mark; Jones, Matt; Raymond, Lisa; Mickle, Audrey; Finin, Tim; Fils, Doug; Carbotte, Suzanne; Lehnert, Kerstin

    2015-04-01

    Integrating datasets for new use cases is one of the common drivers for adopting semantic web technologies. Even though linked data principles enable this type of activity over time, the task of reconciling new ontological commitments for newer use cases can be daunting. This situation was faced by the Biological and Chemical Oceanography Data Management Office (BCO-DMO) as it sought to integrate its existing linked data with other data repositories to address newer scientific use cases as a partner in the GeoLink Project. To achieve a successful integration with other GeoLink partners, BCO-DMO's metadata would need to be described using the new ontologies developed by the GeoLink partners, a situation that could impact semantic inferencing, pre-existing software and external users of BCO-DMO's linked data. This presentation describes how GeoLink is bridging the gap between local, pre-existing ontologies to achieve scientific metadata integration for all its partners through the use of ontology design patterns. GeoLink, an NSF EarthCube Building Block, brings together experts from the geosciences, computer science, and library science in an effort to improve discovery and reuse of data and knowledge. Its participating repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecology and biogeochemistry to paleoclimatology. GeoLink's outcomes include a set of reusable ontology design patterns (ODPs) that describe core geoscience concepts, a network of Linked Data published by participating repositories using those ODPs, and tools to facilitate discovery of related content in multiple repositories.

  6. The ontology of biological sequences

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2009-11-01

    Full Text Available Background: Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike. Results: We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving. Conclusion: Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as a starting point for the development of more formal ontologies and ultimately of knowledge-based applications.

  7. Building Ontologies in DAML + OIL

    Science.gov (United States)

    Wroe, Chris; Bechhofer, Sean; Lord, Phillip; Rector, Alan; Goble, Carole

    2003-01-01

    In this article we describe an approach to representing and building ontologies advocated by the Bioinformatics and Medical Informatics groups at the University of Manchester. The hand-crafting of ontologies offers an easy and rapid avenue to delivering ontologies, but experience has shown that such approaches are unsustainable. Description logic approaches have been shown to offer computational support for building sound, complete and logically consistent ontologies. A new knowledge representation language, DAML + OIL, offers a new standard that is able to support many styles of ontology, from hand-crafted to full logic-based descriptions with reasoning support. We describe this language, the OilEd editing tool, reasoning support and a strategy for the language’s use. We finish with a current example, in the Gene Ontology Next Generation (GONG) project, that uses DAML + OIL as the basis for moving the Gene Ontology from its current hand-crafted form to one that uses logical descriptions of a concept’s properties to deliver a more complete version of the ontology. PMID:18629114

  8. Towards a Formalized Ontology-Based Requirements Model

    Institute of Scientific and Technical Information of China (English)

    JIANG Dan-dong; ZHANG Shen-sheng; WANG Ying-lin

    2005-01-01

    The goal of this paper is to take a further step towards an ontological approach for representing requirements information. The motivation for ontologies is discussed, and definitions of ontology and requirements ontology are given. The paper then presents a collection of informal terms covering four subject areas and discusses the formalization process of the ontology. The underlying meta-ontology is determined, and the formalized requirements ontology is analyzed. This formal ontology is built to serve as a basis for a requirements model. Finally, the implementation of a software system is given.

  9. Using phylogenomic patterns and gene ontology to identify proteins of importance in plant evolution.

    Science.gov (United States)

    Cibrián-Jaramillo, Angélica; De la Torre-Bárcena, Jose E; Lee, Ernest K; Katari, Manpreet S; Little, Damon P; Stevenson, Dennis W; Martienssen, Rob; Coruzzi, Gloria M; DeSalle, Rob

    2010-07-12

    We use measures of congruence on a combined expressed sequence tag genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch and hidden support on the hypothesis obtained on a 16-species tree, constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that are in agreement with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes for key branches in the seed plant phylogeny. In addition to basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented in key nodes of the phylogeny of seed plants. Two genes in our matrix are of critical importance as they are involved in RNA-dependent regulation, essential during embryo and leaf development. These are Argonaute and the RNA-dependent RNA polymerase 6, found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomics approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, and in particular, in the evolution of plants.

  10. SUGOI: automated ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2015-04-01

    Full Text Available A foundational ontology can solve interoperability issues among the domain ontologies aligned to it. However, several foundational ontologies have been developed, hence such interoperability issues exist among domain ontologies. The novel SUGOI tool...

  11. The plant ontology as a tool for comparative plant anatomy and genomic analyses.

    Science.gov (United States)

    Cooper, Laurel; Walls, Ramona L; Elser, Justin; Gandolfo, Maria A; Stevenson, Dennis W; Smith, Barry; Preece, Justin; Athreya, Balaji; Mungall, Christopher J; Rensing, Stefan; Hiss, Manuel; Lang, Daniel; Reski, Ralf; Berardini, Tanya Z; Li, Donghui; Huala, Eva; Schaeffer, Mary; Menda, Naama; Arnaud, Elizabeth; Shrestha, Rosemary; Yamazaki, Yukiko; Jaiswal, Pankaj

    2013-02-01

    The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.

  12. POEAS: Automated Plant Phenomic Analysis Using Plant Ontology.

    Science.gov (United States)

    Shameer, Khader; Naika, Mahantesha Bn; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Biological enrichment analysis using gene ontology (GO) provides a global overview of the functional role of genes or proteins identified from large-scale genomic or proteomic experiments. Phenomic enrichment analysis of gene lists can provide an important layer of information in addition to the cellular components, molecular functions, and biological processes associated with gene lists. Plant phenomic enrichment analysis will be useful for performing new experiments to better understand plant systems and for the interpretation of genes or proteins identified from high-throughput experiments. Plant ontology (PO) is a compendium of terms to define the diverse phenotypic characteristics of plant species, including plant anatomy, morphology, and development stages. Adoption of this highly useful ontology is limited, when compared to GO, because of the lack of user-friendly tools that enable the use of PO for statistical enrichment analysis. To address this challenge, we introduce Plant Ontology Enrichment Analysis Server (POEAS) in the public domain. POEAS uses a simple list of genes as input data and performs enrichment analysis using Ontologizer 2.0 to provide results in two levels, enrichment results and visualization utilities, to generate ontological graphs that are of publication quality. POEAS also offers interactive options to identify user-defined background population sets, various multiple-testing correction methods, different enrichment calculation methods, and resampling tests to improve statistical significance. The availability of such a tool to perform phenomic enrichment analyses using plant genes as a complementary resource will permit the adoption of PO-based phenomic analysis as part of analytical workflows. POEAS can be accessed using the URL http://caps.ncbs.res.in/poeas.
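
    The kind of term enrichment test described in this record can be illustrated with a minimal sketch. It assumes a plain term-for-term hypergeometric test with Bonferroni correction; it is not the Ontologizer 2.0 or POEAS implementation, and the function names, gene names and term label are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(study, population, annotations):
    """Return term -> (raw p-value, Bonferroni-adjusted p-value).

    study, population: iterables of gene names (hypothetical inputs).
    annotations: dict mapping a term label to the set of genes annotated to it.
    Only terms with at least one study gene are tested.
    """
    population = set(population)
    study = set(study) & population
    tested = {t: g & population for t, g in annotations.items()}
    tested = {t: g for t, g in tested.items() if g & study}
    m = len(tested)  # number of tests, used for the Bonferroni correction
    results = {}
    for term, genes in tested.items():
        p = hypergeom_upper_tail(len(genes & study), len(study),
                                 len(genes), len(population))
        results[term] = (p, min(1.0, p * m))
    return results


# Toy data: 10 background genes, 3 study genes, one made-up term.
population = [f"g{i}" for i in range(1, 11)]
study = ["g1", "g2", "g3"]
annotations = {"term_a": {"g1", "g2", "g3"}, "term_b": {"g4", "g5"}}
print(enrich(study, population, annotations))
```

    The background set is passed explicitly because the abstract notes that users can define their own population; other corrections or resampling tests would replace the Bonferroni step.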

  13. Ontology Localization

    OpenAIRE

    2009-01-01

    Our main goal in this thesis is to propose a solution for building a multilingual ontology through the automatic localization of an ontology. The notion of localization comes from the area of Software Development, where it refers to the adaptation of a software product to a non-native environment. In Ontology Engineering, ontology localization could be considered a subtype of software localization in which the product is a shared model of a...

  14. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

    Directory of Open Access Journals (Sweden)

    Chan Juancarlos

    2009-07-01

    Full Text Available Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results: We employ the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given
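
    The F-scores quoted in this record can be related to the stated recall and precision through the usual harmonic mean. The sketch below assumes the balanced F1 definition, which the abstract does not spell out. Because the inputs are the rounded percentages from the abstract, the recomputed values may differ from the reported ones in the last digit.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as stated in the abstract, written as fractions.
reported = {
    "curatable papers": (0.618, 0.791),
    "relevant sentences": (0.801, 0.303),
    "GO CC annotations": (0.973, 0.662),
}

for task, (precision, recall) in reported.items():
    print(f"{task}: F = {100 * f_score(precision, recall):.1f}%")
```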

  15. The Evolving Definition of the Term "Gene".

    Science.gov (United States)

    Portin, Petter; Wilkins, Adam

    2017-04-01

    This paper presents a history of the changing meanings of the term "gene," over more than a century, and a discussion of why this word, so crucial to genetics, needs redefinition today. In this account, the first two phases of 20th century genetics are designated the "classical" and the "neoclassical" periods, and the current molecular-genetic era the "modern period." While the first two stages generated increasing clarity about the nature of the gene, the present period features complexity and confusion. Initially, the term "gene" was coined to denote an abstract "unit of inheritance," to which no specific material attributes were assigned. As the classical and neoclassical periods unfolded, the term became more concrete, first as a dimensionless point on a chromosome, then as a linear segment within a chromosome, and finally as a linear segment in the DNA molecule that encodes a polypeptide chain. This last definition, from the early 1960s, remains the one employed today, but developments since the 1970s have undermined its generality. Indeed, they raise questions about both the utility of the concept of a basic "unit of inheritance" and the long implicit belief that genes are autonomous agents. Here, we review findings that have made the classic molecular definition obsolete and propose a new one based on contemporary knowledge. Copyright © 2017 by the Genetics Society of America.

  16. Research on Forging Die Design Ontology

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wenlei; FAN Yushun

    2006-01-01

    Forging die design is heavily dependent on engineers' experience, but traditional AI technologies can barely provide a standard knowledge representation style for knowledge transfer. This paper introduces ontology into forging die design. A 3-layer forging die design ontology is built, comprising a Meta-ontology, a Domain-ontology and a Bottom ontology. Through conceptualization, the concepts and their relations are formally described by primitives such as Term, Relation and Function, which are explicitly expressed in a concept tree. The Bottom ontology uses Knowledge Item and Prototype to represent and capture general knowledge for knowledge reuse and sharing. An approach to building the forging die design ontology is discussed with a view to standard knowledge representation, knowledge mining and knowledge-driven CAD design. The OWL language is employed for integration among different domain ontologies. Finally, a locomotive forging die KBE system is presented to demonstrate this approach.

  17. Development of an Ontology for Periodontitis.

    Science.gov (United States)

    Suzuki, Asami; Takai-Igarashi, Takako; Nakaya, Jun; Tanaka, Hiroshi

    2015-01-01

    In the community of clinical dentists and periodontal researchers, there is an obvious demand for a systems model capable of linking the clinical presentation of periodontitis to underlying molecular knowledge. A computer-readable representation of processes in disease development will give periodontal researchers opportunities to elucidate pathways and mechanisms of periodontitis. An ontology for periodontitis can be a model for integration of a large variety of factors relating to a complex disease such as chronic inflammation in different organs accompanied by bone remodeling and immune system disorders, which has recently been referred to as osteoimmunology. Terms characteristic of descriptions related to the onset and progression of periodontitis were manually extracted from 194 review articles and PubMed abstracts by experts in periodontology. We specified all the relations between the extracted terms and constructed them into an ontology for periodontitis. We also investigated matching between classes of our ontology and those of Gene Ontology Biological Process. We developed an ontology for periodontitis called Periodontitis-Ontology (PeriO). The pathological progression of periodontitis is caused by complex, multi-factor interrelationships. PeriO consists of all the required concepts to represent the pathological progression and clinical treatment of periodontitis. The pathological processes were formalized with reference to Basic Formal Ontology and Relation Ontology, which account for participants in the processes realized by biological objects such as molecules and cells. We investigated the peculiarity of biological processes observed in pathological progression and medical treatments for the disease in comparison with Gene Ontology Biological Process (GO-BP) annotations. The results indicated that peculiarities of PeriO existed in 1) granularity and context dependency of both the conceptualizations, and 2) causality intrinsic to the pathological processes

  18. Bacterial Virus Ontology; Coordinating across Databases.

    Science.gov (United States)

    Hulo, Chantal; Masson, Patrick; Toussaint, Ariane; Osumi-Sutherland, David; de Castro, Edouard; Auchincloss, Andrea H; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis; Le Mercier, Philippe

    2017-05-23

    Bacterial viruses, also called bacteriophages, display a great genetic diversity and utilize unique processes for infecting and reproducing within a host cell. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. Classically, the viral life-cycle is described by schematic pictures. Using this ontology, it can be represented by a combination of successive events: entry, latency, transcription/replication, host-virus interactions and virus release. Each of these parts is broken down into discrete steps. For example, enterobacteria phage lambda entry is broken down into: viral attachment to host adhesion receptor, viral attachment to host entry receptor, viral genome ejection and viral genome circularization. To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases.

  19. Ontology for vector surveillance and management.

    Science.gov (United States)

    Lozano-Fuentes, Saul; Bandyopadhyay, Aritra; Cowell, Lindsay G; Goldfain, Albert; Eisen, Lars

    2013-01-01

    Ontologies, which are made up of standardized and defined controlled vocabulary terms and their interrelationships, are comprehensive and readily searchable repositories for knowledge in a given domain. The Open Biomedical Ontologies (OBO) Foundry was initiated in 2001 with the aims of becoming an "umbrella" for life-science ontologies and promoting the use of ontology development best practices. A software application (OBO-Edit; *.obo file format) was developed to facilitate ontology development and editing. The OBO Foundry now comprises over 100 ontologies and candidate ontologies, including the NCBI organismal classification ontology (NCBITaxon), the Mosquito Insecticide Resistance Ontology (MIRO), the Infectious Disease Ontology (IDO), the IDOMAL malaria ontology, and ontologies for mosquito gross anatomy and tick gross anatomy. We previously developed a disease data management system for dengue and malaria control programs, which incorporated a set of information trees built upon ontological principles, including a "term tree" to promote the use of standardized terms. In the course of doing so, we realized that there were substantial gaps in existing ontologies with regard to concepts, processes, and, especially, physical entities (e.g., vector species, pathogen species, and vector surveillance and management equipment) in the domain of surveillance and management of vectors and vector-borne pathogens. We therefore produced an ontology for vector surveillance and management, focusing on arthropod vectors and vector-borne pathogens with relevance to humans or domestic animals, and with special emphasis on content to support operational activities through inclusion in databases, data management systems, or decision support systems. The Vector Surveillance and Management Ontology (VSMO) includes >2,200 unique terms, of which the vast majority (>80%) were newly generated during the development of this ontology. One core feature of the VSMO is the linkage, through

  20. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation

    Science.gov (United States)

    2013-01-01

    Background: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. Results: As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Conclusion: Multiple ontologies have been developed for gene and protein annotation; by using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/. PMID:23409969

  1. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify
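
    A rough sketch of how EQ annotations might be compared through the term hierarchy follows. The abstract does not name its four similarity metrics, so a simple Jaccard index over ancestor-expanded (entity, quality) pairs is swapped in purely for illustration. The tiny hierarchy and all term labels are made up, and the helper names are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors.

    parents: dict mapping a term to a list of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def expand(eq_pairs, parents):
    """Expand each (entity, quality) pair to all ancestor combinations."""
    expanded = set()
    for entity, quality in eq_pairs:
        for e in ancestors(entity, parents):
            for q in ancestors(quality, parents):
                expanded.add((e, q))
    return expanded


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up hierarchy: two eye parts and two size qualities.
parents = {"retina": ["eye"], "lens": ["eye"],
           "small": ["size"], "large": ["size"]}
phenotype_a = {("retina", "small")}
phenotype_b = {("lens", "small")}

print(jaccard(phenotype_a, phenotype_b))
print(jaccard(expand(phenotype_a, parents), expand(phenotype_b, parents)))
```

    Without expansion the two toy phenotypes share nothing; after expansion they overlap through the shared parent terms, which is the sense in which a hierarchy lets differently worded annotations be compared.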

  2. Ontology Requirements Specification

    OpenAIRE

    Suárez-Figueroa, Mari Carmen; Gómez-Pérez, A.

    2012-01-01

    The goal of the ontology requirements specification activity is to state why the ontology is being built, what its intended uses are, who the end users are, and which requirements the ontology should fulfill. This chapter presents detailed methodological guidelines for specifying ontology requirements efficiently. These guidelines will help ontology engineers to capture ontology requirements and produce the ontology requirements specification document (ORSD). The ORSD will play a key role dur...

  3. GoPipe: Streamlined Gene Ontology Annotation for Batch Anonymous Sequences With Statistics

    Institute of Scientific and Technical Information of China (English)

    陈作舟; 薛成海; 朱晟; 周丰丰; XUEFENG BRUCE LING; 刘国平; 陈良标

    2005-01-01

    With the arrival of the post-genomic era, batch sequencing, especially of ESTs, has become routine work in ordinary laboratories. These new sequences often require batch Gene Ontology (GO) annotation and subsequent statistical analysis. At present, however, no software other than Goblet is suited to batch GO annotation of unknown sequences, and Goblet still has shortcomings: it is an online tool that restricts uploaded sequence files to less than 100 kilobytes and uses only BLAST as its prediction tool. We developed GoPipe, a standalone package that annotates sequences by integrating BLAST and InterProScan results and provides tools for further statistical comparison. The main program takes any number of BLAST and/or InterProScan output files and performs, in sequence, parsing, data integration, redundancy removal, statistical analysis and graphic display. Statistical tools are also provided to compare the GO distributions of different inputs in order to uncover biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test dataset its precision reached 99.1%, far exceeding the precision of InterProScan's own GO predictions, while sensitivity dropped only slightly. Its high precision, speed and flexibility make it a well-suited tool for batch Gene Ontology annotation of unknown sequences. The package is freely available at http://gopipe.fishgenome.org/ or by contacting the authors.
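
    The intersection mode mentioned in this record can be pictured with a short sketch. This is not GoPipe's code. It assumes the two tools' predictions have already been parsed into dictionaries mapping sequence identifiers to sets of GO identifiers. The function name, the "union" alternative, the sequence identifiers and the GO identifiers in the toy data are all illustrative assumptions.

```python
def merge_annotations(blast, interpro, mode="intersection"):
    """Combine two GO prediction maps (sequence id -> set of GO ids).

    mode="intersection" keeps only terms predicted by both sources;
    mode="union" (an assumed alternative) keeps terms from either source.
    Sequences left without any term are dropped from the result.
    """
    if mode not in ("intersection", "union"):
        raise ValueError(f"unknown mode: {mode}")
    merged = {}
    for seq_id in set(blast) | set(interpro):
        from_blast = blast.get(seq_id, set())
        from_interpro = interpro.get(seq_id, set())
        if mode == "intersection":
            terms = from_blast & from_interpro
        else:
            terms = from_blast | from_interpro
        if terms:
            merged[seq_id] = terms
    return merged


# Toy predictions for two hypothetical sequences.
blast = {"seq1": {"GO:0005524", "GO:0016301"}, "seq2": {"GO:0005634"}}
interpro = {"seq1": {"GO:0005524"}, "seq2": {"GO:0016301"}}

print(merge_annotations(blast, interpro))
print(merge_annotations(blast, interpro, mode="union"))
```

    Requiring agreement between two independent predictors discards terms that only one of them supports, which fits the trade-off reported in the abstract: higher precision at a small cost in sensitivity.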

  4. NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

    OpenAIRE

    Martinez-Romero, Marcos; Jonquet, Clement; O'Connor, Martin J.; Graybeal, John; Pazos, Alejandro; Musen, Mark A.

    2016-01-01

    Background. Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biome...

  5. NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

    OpenAIRE

    Martínez-Romero, Marcos; Jonquet, Clement; O'Connor, Martin J.; Graybeal, John; Pazos, Alejandro; Musen, Mark A.

    2016-01-01

    Background Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomed...

  6. Lentiviral gene ontology (LeGO) vectors equipped with novel drug-selectable fluorescent proteins: new building blocks for cell marking and multi-gene analysis.

    Science.gov (United States)

    Weber, K; Mock, U; Petrowitz, B; Bartsch, U; Fehse, B

    2010-04-01

    Vector-encoded fluorescent proteins (FPs) facilitate unambiguous identification or sorting of gene-modified cells by fluorescence-activated cell sorting (FACS). Exploiting this feature, we have recently developed lentiviral gene ontology (LeGO) vectors (www.LentiGO-Vectors.de) for multi-gene analysis in different target cells. In this study, we extend the LeGO principle by introducing 10 different drug-selectable FPs created by fusing one of the five selection markers (protecting against blasticidin, hygromycin, neomycin, puromycin and zeocin) and one of the five FP genes (Cerulean, eGFP, Venus, dTomato and mCherry). All tested fusion proteins allowed both fluorescence-mediated detection and drug-mediated selection of LeGO-transduced cells. Newly generated codon-optimized hygromycin- and neomycin-resistance genes showed improved expression as compared with their ancestors. New LeGO constructs were produced at titers >10^6 per ml (for non-concentrated supernatants). We show efficient combinatorial marking and selection of various cells, including mesenchymal stem cells, simultaneously transduced with different LeGO constructs. Inclusion of the cytomegalovirus early enhancer/chicken beta-actin promoter into LeGO vectors facilitated robust transgene expression in and selection of neural stem cells and their differentiated progeny. We suppose that the new drug-selectable markers, combining the advantages of FACS and drug selection, are well suited for numerous applications and vector systems. Their inclusion into LeGO vectors opens new possibilities for (stem) cell tracking and functional multi-gene analysis.

  7. Region Evolution eXplorer - A tool for discovering evolution trends in ontology regions.

    Science.gov (United States)

    Christen, Victor; Hartung, Michael; Groß, Anika

    2015-01-01

    A large number of life science ontologies have been developed to support different application scenarios such as gene annotation or functional analysis. The continuous accumulation of new insights and knowledge affects specific portions of ontologies and thus leads to their adaptation. Therefore, it is valuable to study which ontology parts have been extensively modified or remained unchanged. Users can monitor the evolution of an ontology to improve its further development or apply the knowledge in their applications. Here we present REX (Region Evolution eXplorer), a web-based system for exploring the evolution of ontology parts (regions). REX provides an analysis platform for currently about 1,000 versions of 16 well-known life science ontologies. Interactive workflows allow an explorative analysis of changing ontology regions and can be used to study evolution trends over long-term periods. REX is a web application providing an interactive and user-friendly interface to identify (un)stable regions in large life science ontologies. It is available at http://www.izbi.de/rex.

  8. Ontological backdrop

    DEFF Research Database (Denmark)

    Galle, Per

    2000-01-01

    In this report I keep track of ontological assumptions or implications of other OARs, introducing a system of categories and concepts that is compatible with them. The purpose was originally to keep terminology consistent throughout all OARs. However, the report also gives a condensed picture...... of the world view which underlies my current work on product modelling. It contains a justification of my view of concept exemplification, with lines traced back to Kant's work on epistemology....

  9. Building ontologies with basic formal ontology

    CERN Document Server

    Arp, Robert; Spear, Andrew D.

    2015-01-01

    In the era of "big data," science is increasingly information driven, and the potential for computers to store, manage, and integrate massive amounts of data has given rise to such new disciplinary fields as biomedical informatics. Applied ontology offers a strategy for the organization of scientific information in computer-tractable form, drawing on concepts not only from computer and information science but also from linguistics, logic, and philosophy. This book provides an introduction to the field of applied ontology that is of particular relevance to biomedicine, covering theoretical components of ontologies, best practices for ontology design, and examples of biomedical ontologies in use. After defining an ontology as a representation of the types of entities in a given domain, the book distinguishes between different kinds of ontologies and taxonomies, and shows how applied ontology draws on more traditional ideas from metaphysics. It presents the core features of the Basic Formal Ontology (BFO), now u...

  10. The Orthology Ontology: development and applications.

    Science.gov (United States)

    Fernández-Breis, Jesualdo Tomás; Chiba, Hirokazu; Legaz-García, María Del Carmen; Uchiyama, Ikuo

    2016-06-04

    Computational comparative analysis of multiple genomes provides valuable opportunities for biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides the establishment of evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases call for research into the shareability of content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and the major entities of the ontology, which is implemented in the Web Ontology Language (OWL), followed by an assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology makes it possible to develop Linked Orthology Datasets and a meta-predictor of orthology by standardizing the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth . The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.

  11. The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses

    Science.gov (United States)

    Cooper, Laurel; Walls, Ramona L.; Elser, Justin; Gandolfo, Maria A.; Stevenson, Dennis W.; Smith, Barry; Preece, Justin; Athreya, Balaji; Mungall, Christopher J.; Rensing, Stefan; Hiss, Manuel; Lang, Daniel; Reski, Ralf; Berardini, Tanya Z.; Li, Donghui; Huala, Eva; Schaeffer, Mary; Menda, Naama; Arnaud, Elizabeth; Shrestha, Rosemary; Yamazaki, Yukiko; Jaiswal, Pankaj

    2013-01-01

    The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs. PMID:23220694

  12. An empirical analysis of ontology reuse in BioPortal.

    Science.gov (United States)

    Ochs, Christopher; Perl, Yehoshua; Geller, James; Arabandi, Sivaram; Tudorache, Tania; Musen, Mark A

    2017-07-01

    Biomedical ontologies often reuse content (i.e., classes and properties) from other ontologies. Content reuse enables a consistent representation of a domain and reusing content can save an ontology author significant time and effort. Prior studies have investigated the existence of reused terms among the ontologies in the NCBO BioPortal, but as of yet there has not been a study investigating how the ontologies in BioPortal utilize reused content in the modeling of their own content. In this study we investigate how 355 ontologies hosted in the NCBO BioPortal reuse content from other ontologies for the purposes of creating new ontology content. We identified 197 ontologies that reuse content. Among these ontologies, 108 utilize reused classes in the modeling of their own classes and 116 utilize reused properties in class restrictions. Current utilization of reuse and quality issues related to reuse are discussed. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

    Science.gov (United States)

    Diehl, Alexander D; Meehan, Terrence F; Bradford, Yvonne M; Brush, Matthew H; Dahdul, Wasila M; Dougall, David S; He, Yongqun; Osumi-Sutherland, David; Ruttenberg, Alan; Sarntivijai, Sirarat; Van Slyke, Ceri E; Vasilevsky, Nicole A; Haendel, Melissa A; Blake, Judith A; Mungall, Christopher J

    2016-07-04

    The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the

  14. Changes in winter depression phenotype correlate with white blood cell gene expression profiles : A combined metagene and gene ontology approach

    NARCIS (Netherlands)

    Bosker, Fokko J.; Terpstra, Peter; Gladkevich, Anatoliy V.; Dijck-Brouwer, D. A. Janneke; te Meerman, Gerard; Nolen, Willem A.; Schoevers, Robert A.; Meesters, Ybe

    2015-01-01

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior

  15. Changes in winter depression phenotype correlate with white blood cell gene expression profiles : A combined metagene and gene ontology approach

    NARCIS (Netherlands)

    Bosker, Fokko J.; Terpstra, Peter; Gladkevich, Anatoliy V.; Dijck-Brouwer, D. A. Janneke; te Meerman, Gerard; Nolen, Willem A.; Schoevers, Robert A.; Meesters, Ybe

    2015-01-01

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior an

  16. IDOMAL: an ontology for malaria

    Directory of Open Access Journals (Sweden)

    Topalis Pantelis

    2010-08-01

    Background: Ontologies are rapidly becoming a necessity for the design of efficient information technology tools, especially databases, because they permit the organization of stored data using logical rules and defined terms that are understood by both humans and machines. As a consequence, both the usage and the interoperability of databases and related resources are enhanced. It is hoped that IDOMAL, the ontology of malaria, will prove a valuable instrument when implemented in both malaria research and control measures. Methods: The OBOEdit2 software was used for the construction of the ontology. IDOMAL is based on the Basic Formal Ontology (BFO) and follows the rules set by the OBO Foundry consortium. Results: The first version of the malaria ontology covers both clinical and epidemiological aspects of the disease, as well as disease and vector biology. IDOMAL is meant to later become the nucleation site for a much larger ontology of vector-borne diseases, which will itself be an extension of a large ontology of infectious diseases (IDO). The latter is currently being developed in the frame of a large international collaborative effort. Conclusions: IDOMAL, already freely available in its first version, will form part of a suite of ontologies that will be used to drive IT tools and databases specifically constructed to help control malaria and, later, other vector-borne diseases. This suite already consists of the ontology described here as well as the one on insecticide resistance that has been available for some time. Additional components are being developed and introduced into IDOMAL.

  17. Ontological Surprises

    DEFF Research Database (Denmark)

    Leahu, Lucian

    2016-01-01

    This paper investigates how we might rethink design as the technological crafting of human-machine relations in the context of a machine learning technique called neural networks. It analyzes Google’s Inceptionism project, which uses neural networks for image recognition. The surprising output of...... a hybrid approach where machine learning algorithms are used to identify objects as well as connections between them; finally, it argues for remaining open to ontological surprises in machine learning as they may enable the crafting of different relations with and through technologies....

  18. Generating application ontologies from reference ontologies.

    Science.gov (United States)

    Shaw, Marianne; Detwiler, Landon T; Brinkley, James F; Suciu, Dan

    2008-11-06

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies. We present extensions to the semantic web query language SPARQL that will allow researchers to develop application ontologies that are derived from reference ontologies. We have modified the ARQ query processor to support subqueries, recursive subqueries, and Skolem functions for node creation. We demonstrate the utility of these extensions by deriving an application ontology from the Foundational Model of Anatomy.

  19. Building a biomedical ontology recommender web service.

    Science.gov (United States)

    Jonquet, Clement; Musen, Mark A; Shah, Nigam H

    2010-06-22

    Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first is coverage, which favors the ontologies that provide the most terms covering the input text. The second is connectivity, which favors the ontologies that are most often mapped to by other ontologies. The final criterion is size, the number of concepts in the ontologies. The service scores the ontologies as a function of the scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. We compare our Recommender with previously published efforts through an exhaustive functional comparison. We evaluate and discuss the results of several recommendation heuristics in the context of three real-world use cases. The best recommendation heuristics, rated 'very relevant' by expert evaluators, are the ones based on the coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.
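
    The three criteria named in this abstract (coverage, connectivity and size) can be combined in many ways, and the abstract does not give the service's actual formula. The sketch below is therefore only one possible reading: a weighted sum of normalized criteria. The class, the function, the weights, the direction in which size contributes and the ontology names are all assumptions made for illustration.

```python
# Hypothetical scoring sketch; not the Recommender service's real formula.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OntologyStats:
    name: str
    covered_terms: int     # input terms that this ontology can annotate
    inbound_mappings: int  # mappings pointing to it from other ontologies
    concept_count: int     # total number of concepts in the ontology


def rank_ontologies(
    candidates: List[OntologyStats],
    total_terms: int,
    w_coverage: float = 0.6,      # assumed weight
    w_connectivity: float = 0.3,  # assumed weight
    w_size: float = 0.1,          # assumed weight and assumed positive sign
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst."""
    # Normalize connectivity and size by the largest value among candidates.
    max_map = max((c.inbound_mappings for c in candidates), default=0) or 1
    max_size = max((c.concept_count for c in candidates), default=0) or 1
    scored = []
    for c in candidates:
        coverage = c.covered_terms / total_terms if total_terms else 0.0
        connectivity = c.inbound_mappings / max_map
        size = c.concept_count / max_size
        score = w_coverage * coverage + w_connectivity * connectivity + w_size * size
        scored.append((c.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidates; ONT_A and ONT_B are placeholder names.
CANDIDATES = [
    OntologyStats("ONT_A", covered_terms=8, inbound_mappings=50, concept_count=1000),
    OntologyStats("ONT_B", covered_terms=3, inbound_mappings=100, concept_count=2000),
]

if __name__ == "__main__":
    for name, score in rank_ontologies(CANDIDATES, total_terms=10):
        print(f"{name}: {score:.2f}")
```

    With these weights coverage dominates, which is consistent with the abstract's finding that the coverage and connectivity heuristics performed best, but the specific numbers carry no empirical meaning.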

  20. Where to Publish and Find Ontologies? A Survey of Ontology Libraries

    Science.gov (United States)

    d'Aquin, Mathieu; Noy, Natalya F.

    2011-01-01

    One of the key promises of the Semantic Web is its potential to enable and facilitate data interoperability. The ability of data providers and application developers to share and reuse ontologies is a critical component of this data interoperability: if different applications and data sources use the same set of well defined terms for describing their domain and data, it will be much easier for them to “talk” to one another. Ontology libraries are the systems that collect ontologies from different sources and facilitate the tasks of finding, exploring, and using these ontologies. Thus ontology libraries can serve as a link in enabling diverse users and applications to discover, evaluate, use, and publish ontologies. In this paper, we provide a survey of the growing, and surprisingly diverse, landscape of ontology libraries. We highlight how the varying scope and intended use of the libraries affects their features, content, and potential exploitation in applications. From reviewing eleven ontology libraries, we identify a core set of questions that ontology practitioners and users should consider in choosing an ontology library for finding ontologies or publishing their own. We also discuss the research challenges that emerge from this survey, for the developers of ontology libraries to address. PMID:22408576

  1. Where to Publish and Find Ontologies? A Survey of Ontology Libraries.

    Science.gov (United States)

    d'Aquin, Mathieu; Noy, Natalya F

    2012-03-01

    One of the key promises of the Semantic Web is its potential to enable and facilitate data interoperability. The ability of data providers and application developers to share and reuse ontologies is a critical component of this data interoperability: if different applications and data sources use the same set of well defined terms for describing their domain and data, it will be much easier for them to "talk" to one another. Ontology libraries are the systems that collect ontologies from different sources and facilitate the tasks of finding, exploring, and using these ontologies. Thus ontology libraries can serve as a link in enabling diverse users and applications to discover, evaluate, use, and publish ontologies. In this paper, we provide a survey of the growing, and surprisingly diverse, landscape of ontology libraries. We highlight how the varying scope and intended use of the libraries affects their features, content, and potential exploitation in applications. From reviewing eleven ontology libraries, we identify a core set of questions that ontology practitioners and users should consider in choosing an ontology library for finding ontologies or publishing their own. We also discuss the research challenges that emerge from this survey, for the developers of ontology libraries to address.

  2. Infrastructures as Ontological Experiments

    Directory of Open Access Journals (Sweden)

    Casper Bruun Jensen

    2015-11-01

    Ontology has recently gained renewed attention in science and technology studies and anthropology (e.g. Gad, Jensen and Winthereik 2015; Holbraad, Pedersen and Viveiros de Castro 2014; Woolgar and Lezaun 2013). Yet, it has a considerably longer pedigree than these recent debates might lead one to think. Experiments, of course, have long held the attention of sociologists, historians, and philosophers of science (Collins 1985; Gooding 1990; Shapin and Schaffer 1985). And infrastructures have been the focus of sustained inquiry in the sociology and history of technology (Bowker 1994; Hughes 1983). Once these terms are put into conjunction, however, each gets a somewhat different inflection. The following note briefly explores the conceptual purchase of considering infrastructures as ontological experiments.

  3. ONTOLOGY: UNREAL REALITY

    Directory of Open Access Journals (Sweden)

    Piotr Jaroszynski

    2014-12-01

    The article examines the difference between ontology and metaphysics. It shows that as soon as the composition of being from essence and existence is treated as purely mental or in a “reified” way (where essence and existence are independent elements), essence as essence becomes a thing, and then simply becomes a being, or what is called reality. Both versions, the one in which the real difference disappears and the one in which the road leads to “reification,” influence the treatment of essence as independent, where essence as thing fills the field of reality. However, if essence was only possibility, then (1) the reality also would be merely possible, (2) the realistic field of philosophical terminology would get curtailed, and (3) there would be no terms to maintain the difference between reality and possibility, between metaphysics and ontology.

  4. NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.

    Science.gov (United States)

    Martínez-Romero, Marcos; Jonquet, Clement; O'Connor, Martin J; Graybeal, John; Pazos, Alejandro; Musen, Mark A

    2017-06-07

    Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability

  5. Long-term consequences of chronic fluoxetine exposure on the expression of myelination-related genes in the rat hippocampus.

    Science.gov (United States)

    Kroeze, Y; Peeters, D; Boulle, F; Pawluski, J L; van den Hove, D L A; van Bokhoven, H; Zhou, H; Homberg, J R

    2015-09-22

    The selective serotonin reuptake inhibitor (SSRI) fluoxetine is widely prescribed for the treatment of symptoms related to a variety of psychiatric disorders. After chronic SSRI treatment, some symptoms remediate in the long term, but the underlying mechanisms are not yet well understood. Here we studied the long-term consequences (40 days after treatment) of chronic fluoxetine exposure on genome-wide gene expression. During the treatment period, we measured body weight, and 1 week after treatment cessation, behavior in an SSRI-sensitive anxiety test was assessed. Gene expression was assessed in hippocampal tissue of adult rats using transcriptome analysis, and several differentially expressed genes were validated in independent samples. Gene ontology analysis showed that upregulated genes induced by chronic fluoxetine exposure were significantly enriched for genes involved in myelination. We also investigated the expression of myelination-related genes in adult rats exposed to fluoxetine in early life and found two myelination-related genes (Transferrin (Tf) and Ciliary neurotrophic factor (Cntf)) that were downregulated by chronic fluoxetine exposure. Cntf, a neurotrophic factor involved in myelination, showed regulation in opposite directions in the adult versus neonatally fluoxetine-exposed groups. Expression of myelination-related genes correlated negatively with anxiety-like behavior in both adult and neonatally fluoxetine-exposed rats. In conclusion, our data reveal that chronic fluoxetine exposure causes long-term changes in the expression of genes involved in myelination, a process that shapes brain connectivity and contributes to symptoms of psychiatric disorders.

  6. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity.

    Directory of Open Access Journals (Sweden)

    Adi L Tarca

    Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for the systematic analysis of gene set collections, including Gene Ontology and biological pathways. Despite their widespread use in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease-specific gene sets in the KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions while using 42 real datasets (over 1,400 samples). Most of the methods ranked high the gene sets designed for specific diseases whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization were different from the top ones in terms of sensitivity, and four of the sixteen methods had large false positive rates, as assessed by permuting the phenotype of the samples. The best overall methods among those that generated reasonably low false positive rates when permuting phenotypes were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.

  7. An Extended Ontology Model and Ontology Checking Based on Description Logics

    Institute of Scientific and Technical Information of China (English)

    王洪伟; 蒋馥; 吴家春

    2004-01-01

    Ontology is defined as an explicit specification of a conceptualization. In this paper, an extended ontology model is constructed using description logics; it is a 5-tuple comprising a term set, an individual set, a term definition set, an instantiation assertion set and a term restriction set. Based on the extended model, ontology checking is studied, with the conclusion that the four kinds of term checking (term satisfiability checking, term subsumption checking, term equivalence checking and term disjointness checking) can be reduced to satisfiability checking, and that satisfiability checking can be transformed into instantiation consistency checking.
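
    The abstract names these reductions without stating them. In description logics that include full concept negation (an assumption here, since the abstract does not say which logic is used), they are usually written as follows, where C and D are terms (concepts), the symbol T stands for the term definition set, and a is a fresh individual name. The notation is the standard textbook one and is not taken from the paper; the block assumes the amsmath package.

```latex
% Standard reductions, stated relative to a term definition set (TBox) $\mathcal{T}$.
\begin{align*}
C \sqsubseteq_{\mathcal{T}} D
  &\iff C \sqcap \neg D \text{ is unsatisfiable w.r.t. } \mathcal{T} \\
C \equiv_{\mathcal{T}} D
  &\iff C \sqcap \neg D \text{ and } D \sqcap \neg C
        \text{ are both unsatisfiable w.r.t. } \mathcal{T} \\
C \text{ and } D \text{ are disjoint}
  &\iff C \sqcap D \text{ is unsatisfiable w.r.t. } \mathcal{T} \\
C \text{ is satisfiable}
  &\iff \{\, C(a) \,\} \text{ is consistent w.r.t. } \mathcal{T}
\end{align*}
```

    The first three lines turn subsumption, equivalence and disjointness into satisfiability questions; the last line turns satisfiability into a consistency check on a single instantiation assertion.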

  8. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013.

    Science.gov (United States)

    Hastings, Janna; de Matos, Paula; Dekker, Adriano; Ennis, Marcus; Harsha, Bhavana; Kale, Namrata; Muthukrishnan, Venkatesh; Owen, Gareth; Turner, Steve; Williams, Mark; Steinbeck, Christoph

    2013-01-01

    ChEBI (http://www.ebi.ac.uk/chebi) is a database and ontology of chemical entities of biological interest. Over the past few years, ChEBI has continued to grow steadily in content, and has added several new features. In addition to incorporating all user-requested compounds, our annotation efforts have emphasized immunology, natural products and metabolites in many species. All database entries are now 'is_a' classified within the ontology, meaning that all of the chemicals are available to semantic reasoning tools that harness the classification hierarchy. We have completely aligned the ontology with the Open Biomedical Ontologies (OBO) Foundry-recommended upper level Basic Formal Ontology. Furthermore, we have aligned our chemical classification with the classification of chemical-involving processes in the Gene Ontology (GO), and as a result of this effort, the majority of chemical-involving processes in GO are now defined in terms of the ChEBI entities that participate in them. This effort necessitated incorporating many additional biologically relevant compounds. We have incorporated additional data types including reference citations, and the species and component for metabolites. Finally, our website and web services have had several enhancements, most notably the provision of a dynamic new interactive graph-based ontology visualization.

  9. MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken.

    Science.gov (United States)

    Morota, Gota; Beissinger, Timothy M; Peñagaricano, Francisco

    2016-01-01

    Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets that were employed represent (i) differentially expressed genes, and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis, but also that MeSH is able to further enrich the representation of biological knowledge and often provide more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies, as well as semantic similarities among selected genes. These yielded the similarity levels between significant functional terms, and the annotation of each gene yielded the measures of gene similarity. Our findings show the benefits of using MeSH as an alternative choice of annotation in order to draw biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.
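
    The overrepresentation analyses compared in this abstract are commonly based on a hypergeometric tail probability. The following is a minimal sketch of that test, not the Bioconductor MeSH packages' implementation; the function name and the example counts are hypothetical.

```python
from math import comb


def overrepresentation_p_value(population, annotated, selected, overlap):
    """Return P(X >= overlap) for a hypergeometric X.

    population: number of genes in the background
    annotated:  background genes carrying the term (GO or MeSH)
    selected:   size of the gene list of interest
    overlap:    genes of interest that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probabilities of seeing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 200 selected genes carry a term that
# annotates 300 of 10,000 background genes.
p = overrepresentation_p_value(10_000, 300, 200, 40)
```

    A small p-value suggests that the term is overrepresented in the gene list; in practice the p-values for all tested terms would also be corrected for multiple testing.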

  10. Agile development of ontologies through conversation

    Science.gov (United States)

    Braines, Dave; Bhattal, Amardeep; Preece, Alun D.; de Mel, Geeth

    2016-05-01

    Ontologies and semantic systems are necessarily complex but offer great potential in terms of their ability to fuse information from multiple sources in support of situation awareness. Current approaches do not place the ontologies directly into the hands of the end user in the field but instead hide them away behind traditional applications. We have been experimenting with human-friendly ontologies and conversational interactions to enable non-technical business users to interact with and extend these dynamically. In this paper we outline our approach via a worked example, covering: OWL ontologies, ITA Controlled English, Sensor/mission matching and conversational interactions between human and machine agents.

  11. Ontological Engineering for the Cadastral Domain

    DEFF Research Database (Denmark)

    Stubkjær, Erik; Stuckenschmidt, Heiner

    2000-01-01

    The term 'ontology' has been used in many ways and across different communities. In the following we will introduce ontologies as an explication of some shared vocabulary or conceptualization of a specific subject matter. The main problem with the use of a shared vocabulary according to a specific conceptualization of the world is that much information remains implicit. Ontologies have set out to overcome the problem of implicit and hidden knowledge by making the conceptualization of a domain (e.g. mathematics) explicit. Ontological engineering is thus an approach to achieve a conceptual rigor that characterizes established academic disciplines, like geodesy. Many university courses address more application oriented fields, like cadastral law, and spatial planning, and they may benefit from the ontological engineering approach. The paper provides an introduction to the field of ontological engineering...

  12. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora.

  13. Markov Chain Ontology Analysis (MCOA)

    Science.gov (United States)

    2012-01-01

    Background: Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results: In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through an MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. Conclusion: A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
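
    The eigenvector values mentioned in this abstract correspond to the stationary distribution of the Markov chain. The sketch below computes one by power iteration. It is an illustration under stated assumptions, not the MCOA implementation: the abstract does not say how the transition matrix is adjusted, so a PageRank-style uniform jump (the hypothetical `damping` parameter) is used here to make the chain ergodic, and the three-state chain is invented.

```python
def stationary_distribution(P, damping=0.85, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary distribution of a Markov chain.

    P is a row-stochastic matrix given as a list of lists. The uniform
    jump mixed in below is an assumed adjustment, not one taken from
    the MCOA paper; damping=1.0 leaves P unchanged.
    """
    n = len(P)
    # Adjusted transition matrix: follow P with probability `damping`,
    # otherwise jump to a uniformly chosen state.
    A = [[damping * P[i][j] + (1.0 - damping) / n for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: pi_next = pi * A (row vector times matrix).
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical chain: states 0 and 1 are ontology classes, state 2 is a gene.
P = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
importance = stationary_distribution(P)
```

    Each entry of `importance` is the long-run share of time the chain spends in that state, which is how an eigenvector-based score can rank classes and instances together.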

  14. Simple Ontology Format (SOFT)

    Energy Technology Data Exchange (ETDEWEB)

    2011-10-01

    The Simple Ontology Format (SOFT) library and file format specification provides a set of simple tools for developing and maintaining ontologies. The library, implemented as a Perl module, supports parsing and verification of files in SOFT format, operations on ontologies (adding, removing, or filtering entities), and conversion of ontologies into other formats. SOFT allows users to quickly create an ontology using only a basic text editor, verify it, and portray it in a graph layout system using customized styles.

  15. Applying the functional abnormality ontology pattern to anatomical functions

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2010-03-01

    Background: Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions. Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies that are available for various species. Such an explicit relation between anatomical structures and their functions would be useful for accurately defining the classes of both the anatomy and the phenotype ontologies. Results: We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions. Conclusions: Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can extract a skeleton for an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies automatically. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities. Availability: The source code and results of our analysis are available at http://bioonto.de.

  16. A gross anatomy ontology for hymenoptera.

    Directory of Open Access Journals (Sweden)

    Matthew J Yoder

    Hymenoptera is an extraordinarily diverse lineage, both in terms of species numbers and morphotypes, that includes sawflies, bees, wasps, and ants. These organisms serve critical roles as herbivores, predators, parasitoids, and pollinators, with several species functioning as models for agricultural, behavioral, and genomic research. The collective anatomical knowledge of these insects, however, has been described or referred to by labels derived from numerous, partially overlapping lexicons. The resulting corpus of information--millions of statements about hymenopteran phenotypes--remains inaccessible due to language discrepancies. The Hymenoptera Anatomy Ontology (HAO) was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy. The HAO was built using newly developed interfaces within mx, a Web-based, open source software package, that enables collaborators to simultaneously contribute to an ontology. Over twenty people contributed to the development of this ontology by adding terms, genus differentia, references, images, relationships, and annotations. The database interface returns an Open Biomedical Ontology (OBO) formatted version of the ontology and includes mechanisms for extracting candidate data and for publishing a searchable ontology to the Web. The application tools are subject-agnostic and may be used by others initiating and developing ontologies. The present core HAO data constitute 2,111 concepts, 6,977 terms (labels for concepts), 3,152 relations, 4,361 sensus (links between terms, concepts, and references) and over 6,000 text and graphical annotations. The HAO is rooted with the Common Anatomy Reference Ontology (CARO), in order to facilitate interoperability with and future alignment to other anatomy ontologies, and is available through the OBO Foundry ontology repository and BioPortal.
The HAO provides a foundation through which connections between genomic, evolutionary developmental

  17. A gross anatomy ontology for hymenoptera.

    Science.gov (United States)

    Yoder, Matthew J; Mikó, István; Seltmann, Katja C; Bertone, Matthew A; Deans, Andrew R

    2010-12-29

    Hymenoptera is an extraordinarily diverse lineage, both in terms of species numbers and morphotypes, that includes sawflies, bees, wasps, and ants. These organisms serve critical roles as herbivores, predators, parasitoids, and pollinators, with several species functioning as models for agricultural, behavioral, and genomic research. The collective anatomical knowledge of these insects, however, has been described or referred to by labels derived from numerous, partially overlapping lexicons. The resulting corpus of information--millions of statements about hymenopteran phenotypes--remains inaccessible due to language discrepancies. The Hymenoptera Anatomy Ontology (HAO) was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy. The HAO was built using newly developed interfaces within mx, a Web-based, open source software package, that enables collaborators to simultaneously contribute to an ontology. Over twenty people contributed to the development of this ontology by adding terms, genus differentia, references, images, relationships, and annotations. The database interface returns an Open Biomedical Ontology (OBO) formatted version of the ontology and includes mechanisms for extracting candidate data and for publishing a searchable ontology to the Web. The application tools are subject-agnostic and may be used by others initiating and developing ontologies. The present core HAO data constitute 2,111 concepts, 6,977 terms (labels for concepts), 3,152 relations, 4,361 sensus (links between terms, concepts, and references) and over 6,000 text and graphical annotations. The HAO is rooted with the Common Anatomy Reference Ontology (CARO), in order to facilitate interoperability with and future alignment to other anatomy ontologies, and is available through the OBO Foundry ontology repository and BioPortal. The HAO provides a foundation through which connections between genomic, evolutionary developmental biology

  18. Towards an Ontology to Describe the Taxonomy of Common Modules in Learning Management Systems

    Directory of Open Access Journals (Sweden)

    Carlos E. Montenegro Marin

    2011-12-01

    The objective of this article is to create an ontology for "common modules in a Learning Management System". The steps for building the ontology were: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate important terms in the ontology; define the classes and the class hierarchy; define the properties of classes (slots); and define the facets of the slots. Finally, the composition of the ontology is explained.
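
    The class, slot and facet steps listed in this abstract can be pictured with a small data structure. This is a hedged sketch, not the authors' ontology: the class names (`Module`, `Forum`), slot names and facets are hypothetical and chosen only to show how a hierarchy, slots and facets relate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    """A property of a class; the remaining fields are its facets."""
    name: str
    value_type: type = str                 # facet: allowed value type
    max_cardinality: Optional[int] = None  # facet: None means unbounded


@dataclass
class OntClass:
    """A class in the hierarchy, with an optional parent class."""
    name: str
    parent: Optional["OntClass"] = None
    slots: List[Slot] = field(default_factory=list)

    def all_slots(self) -> List[Slot]:
        # Slots are inherited down the class hierarchy.
        inherited = self.parent.all_slots() if self.parent else []
        return inherited + self.slots


# Hypothetical fragment of an LMS module ontology.
module = OntClass("Module", slots=[Slot("title", str, max_cardinality=1)])
forum = OntClass("Forum", parent=module, slots=[Slot("post_count", int, 1)])
slot_names = [s.name for s in forum.all_slots()]
```

    Here `Forum` inherits the `title` slot from `Module` and adds its own, which mirrors defining the class hierarchy first and then attaching slots and facets to each class.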

  19. An Ontology for Insider Threat Indicators Development and Applications

    Science.gov (United States)

    2014-11-01

    J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, et al., "Gene Ontology: tool for the unification of biology," Nature genetics, vol. 25, pp. 25-29...An Ontology for Insider Threat Indicators Development and Applications Daniel L. Costa, Matthew L. Collins, Samuel J. Perl, Michael J. Albrethsen...cert.org Abstract—We describe our ongoing development of an insider threat indicator ontology. Our ontology is intended to serve as a standardized

  20. Contributions to an animal trait ontology.

    Science.gov (United States)

    Hulsegge, B; Smits, M A; te Pas, M F W; Woelders, H

    2012-06-01

    Improved understanding of the biology of traits of livestock species necessitates the use and combination of information that is stored in a variety of different sources such as databases and literature. The ability to effectively combine information from different sources, however, depends on a high level of standardization within and between various resources, at least with respect to the used terminology. Ontologies represent a set of concepts that facilitate standardization of terminology within specific domains of interest. The biological mechanisms underlying quantitative traits of farm animal species related to reproduction and host pathogen interactions are complex and not well understood. This knowledge could be improved through the availability of domain-specific ontologies that provide enhanced possibilities for data annotation, data retrieval, data integration, data exchange, data analysis, and ontology-based searches. Here we describe a framework for domain-specific ontologies and the development of 2 first-generation ontologies: Reproductive Trait and Phenotype Ontology (REPO) and Host Pathogen Interactions Ontology. In these first-generation ontologies, we focused on "female fertility in cattle" and "interactions between pigs and Salmonella". Through this, we contribute to the global initiative toward the development of an Animal Trait Ontology for livestock species. To demonstrate its usefulness, we show how REPO can be used to select candidate genes for fertility.

  1. Margin based ontology sparse vector learning algorithm and applied in biology science.

    Science.gov (United States)

    Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad

    2017-01-01

    In the field of biology, ontology applications involve a large amount of genetic information and chemical information about molecular structure, so ontology concepts convey a great deal of information. Therefore, in mathematical notation, the dimension of the vector that corresponds to an ontology concept is often very large, which places higher demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using knowledge of marginal likelihood and marginal distribution, an optimized strategy for a margin-based ontology sparse vector learning algorithm is presented. Finally, the new algorithm is applied to the gene ontology and the plant ontology to verify its efficiency.

  2. Datamining with Ontologies.

    Science.gov (United States)

    Hoehndorf, Robert; Gkoutos, Georgios V; Schofield, Paul N

    2016-01-01

    The use of ontologies has increased rapidly over the past decade and they now provide a key component of most major databases in biology and biomedicine. Consequently, datamining over these databases benefits from considering the specific structure and content of ontologies, and several methods have been developed to use ontologies in datamining applications. Here, we discuss the principles of ontology structure, and datamining methods that rely on ontologies. The impact of these methods in the biological and biomedical sciences has been profound and is likely to increase as more datasets are becoming available using common, shared ontologies.

  3. An ontology approach to comparative phenomics in plants

    KAUST Repository

    Oellrich, Anika

    2015-02-25

    Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics

  4. Comparing Relational and Ontological Triple Stores in Healthcare Domain

    Directory of Open Access Journals (Sweden)

    Ozgu Can

    2017-01-01

    Today’s technological improvements have made healthcare systems ubiquitous, and these systems converge into smart healthcare applications in order to solve patients’ problems, to communicate effectively with patients, and to improve healthcare service quality. The first step of building a smart healthcare information system is representing the healthcare data as connected, reachable, and sharable. In order to achieve this representation, ontologies are used to describe the healthcare data. Combining ontological healthcare data with the used and obtained data can be maintained by storing the entire health domain data inside big data stores that support both relational and graph-based ontological data. There are several big data stores and different types of big data sets in the healthcare domain. The goal of this paper is to determine the most applicable ontology data store for storing big healthcare data. For this purpose, the AllegroGraph and Oracle 12c data stores are compared based on their infrastructural capacity, loading time, and query response times. Hence, healthcare ontologies (GENE Ontology, Gene Expression Ontology (GEXO), Regulation of Transcription Ontology (RETO), Regulation of Gene Expression Ontology (REXO)) are used to measure the ontology loading time. Thereafter, various queries are constructed and executed for the GENE ontology in order to measure the capacity and query response times for the performance comparison between the AllegroGraph and Oracle 12c triple stores.

  5. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task of finding the biological significance of the results and validating them has so far mostly fallen to the biological expert, who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms that have similar meaning) that would be encountered if terms were extracted directly from the articles, owing to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes

  6. The foundational ontology library ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    One purpose of a foundational ontology is to solve interoperability issues among domain ontologies, and foundational ontologies are also used for ontology-driven conceptual data modelling. Multiple foundational ontologies have been developed in recent years, and most of them...

  7. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
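
    The idea of folding network topology and annotation similarity into one distance can be sketched as follows. The abstract does not give the actual formula, so everything here is an assumption made for illustration: Jaccard similarity is used for both components, `alpha` is a hypothetical mixing weight, and the proteins and annotation labels are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def unified_distance(p, q, neighbors, annotations, alpha=0.5):
    """Distance in [0, 1] mixing topology and annotation similarity.

    neighbors:   protein -> set of interaction partners
    annotations: protein -> set of ontology term labels
    alpha:       assumed weight of the topological component
    """
    # Closed neighborhoods, so two directly linked proteins with the
    # same partners count as fully similar.
    topo = jaccard(neighbors[p] | {p}, neighbors[q] | {q})
    sem = jaccard(annotations[p], annotations[q])
    return 1.0 - (alpha * topo + (1.0 - alpha) * sem)


# Invented toy network: A, B, C form a triangle; D is isolated.
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": set()}
annotations = {"A": {"t1", "t2"}, "B": {"t1", "t2"}, "C": {"t3"}, "D": {"t4"}}
d_ab = unified_distance("A", "B", neighbors, annotations)
d_ad = unified_distance("A", "D", neighbors, annotations)
```

    Proteins that share both partners and annotations end up close together, so a clustering step run on this distance would tend to group them into the same candidate complex.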

  8. A Simple Strategy to Start Domain Ontology from Scratch

    Directory of Open Access Journals (Sweden)

    Ivo Wolff Gersberg

    2014-01-01

    Aiming at the use of a domain ontology as an educational tool for neophyte students, and focusing on a fast and easy way to start a domain ontology from scratch, semantics are set aside in order to identify the contexts of concepts (terms) from which to build the ontology. Text mining, link analysis and graph analysis create an abstract rough sketch of interactions between terms. This first rough sketch is presented to the expert, providing insights and inspiring the expert to communicate knowledge through assertive sentences. Those assertive sentences support the creation of the ontology. A web prototype tool to visualize the ontology and retrieve book contents is also presented.

  9. The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant.

    Science.gov (United States)

    Ilic, Katica; Kellogg, Elizabeth A; Jaiswal, Pankaj; Zapata, Felipe; Stevens, Peter F; Vincent, Leszek P; Avraham, Shulamit; Reiser, Leonore; Pujar, Anuradha; Sachs, Martin M; Whitman, Noah T; McCouch, Susan R; Schaeffer, Mary L; Ware, Doreen H; Stein, Lincoln D; Rhee, Seung Y

    2007-02-01

    Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is variable terminology that is used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological representation of anatomy and morphology of a flowering plant. The PSO is intended for a broad plant research community, including bench scientists, curators in genomic databases, and bioinformaticians. The initial releases of the PSO integrated existing ontologies for Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and rice (Oryza sativa); more recent versions of the ontology encompass terms relevant to Fabaceae, Solanaceae, additional cereal crops, and poplar (Populus spp.). Databases such as The Arabidopsis Information Resource, Nottingham Arabidopsis Stock Centre, Gramene, MaizeGDB, and SOL Genomics Network are using the PSO to describe expression patterns of genes and phenotypes of mutants and natural variants and are regularly contributing new annotations to the Plant Ontology database. The PSO is also used in specialized public databases, such as BRENDA, GENEVESTIGATOR, NASCArrays, and others. Over 10,000 gene annotations and phenotype descriptions from participating databases can be queried and retrieved using the Plant Ontology browser. The PSO, as well as contributed gene associations, can be obtained at www.plantontology.org.

  10. Textpresso: an ontology-based information retrieval and extraction system for biological literature.

    Directory of Open Access Journals (Sweden)

    Hans-Michael Müller

    2004-11-01

    We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. Textpresso is a useful curation tool, as well as search engine for researchers, and can readily be extended to other

  11. Textpresso: an ontology-based information retrieval and extraction system for biological literature.

    Science.gov (United States)

    Müller, Hans-Michael; Kenny, Eimear E; Sternberg, Paul W

    2004-11-01

    We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. Textpresso is a useful curation tool, as well as search engine for researchers, and can readily be extended to other organism
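
    The sentence-level search described above, which combines ontology categories with plain keywords, can be sketched roughly as follows. This is a minimal illustration and not Textpresso's implementation: the lexicon, the category names, the sentence splitter and the function names are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon: category name -> terms (not Textpresso's real lexicon).
LEXICON = {
    "gene": {"lin-12", "glp-1", "daf-16"},
    "regulation": {"regulates", "represses", "activates"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased word tokens; hyphens are kept so 'lin-12' stays one token."""
    return set(re.findall(r"[\w-]+", sentence.lower()))


def tag_sentence(sentence):
    """Return the set of categories that have at least one term in the sentence."""
    tokens = tokenize(sentence)
    return {category for category, terms in LEXICON.items() if tokens & terms}


def search(text, categories=(), keywords=()):
    """Return sentences containing every requested category and every keyword."""
    wanted_categories = set(categories)
    wanted_keywords = {k.lower() for k in keywords}
    hits = []
    for sentence in split_sentences(text):
        if (wanted_categories <= tag_sentence(sentence)
                and wanted_keywords <= tokenize(sentence)):
            hits.append(sentence)
    return hits


corpus = (
    "lin-12 activates a downstream target. "
    "The worms were grown at 20 degrees. "
    "daf-16 is expressed in the intestine."
)
print(search(corpus, categories=["gene", "regulation"]))
print(search(corpus, categories=["gene"], keywords=["intestine"]))
```

    Querying by category rather than by literal word is what makes the search semantic: the first query matches any gene name together with any regulation term, without the user listing them.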

  12. Ontological metaphors for negative energy in an interdisciplinary context

    Science.gov (United States)

    Dreyfus, Benjamin W.; Geller, Benjamin D.; Gouvea, Julia; Sawtelle, Vashti; Turpen, Chandra; Redish, Edward F.

    2014-12-01

    Teaching about energy in interdisciplinary settings that emphasize coherence among physics, chemistry, and biology leads to a more central role for chemical bond energy. We argue that an interdisciplinary approach to chemical energy leads to modeling chemical bonds in terms of negative energy. While recent work on ontological metaphors for energy has emphasized the affordances of the substance ontology, this ontology is problematic in the context of negative energy. Instead, we apply a dynamic ontologies perspective to argue that blending the substance and location ontologies for energy can be effective in reasoning about negative energy in the context of reasoning about chemical bonds. We present data from an introductory physics for the life sciences course in which both experts and students successfully use this blended ontology. Blending these ontologies is most successful when the substance and location ontologies are combined such that each is strategically utilized in reasoning about particular aspects of energetic processes.

  13. Ontological metaphors for negative energy in an interdisciplinary context

    CERN Document Server

    Dreyfus, Benjamin W; Gouvea, Julia; Sawtelle, Vashti; Turpen, Chandra; Redish, Edward F

    2013-01-01

    Teaching about energy in interdisciplinary settings that emphasize coherence among physics, chemistry, and biology leads to a more central role for chemical bond energy. We argue that an interdisciplinary approach to chemical energy leads to modeling chemical bonds in terms of negative energy. While recent work on ontological metaphors for energy has emphasized the affordances of the substance ontology, this ontology is problematic in the context of negative energy. Instead, we apply a dynamic ontologies perspective to argue that blending the substance and location ontologies for energy can be effective in reasoning about negative energy in the context of reasoning about chemical bonds. We present data from an introductory physics for the life sciences (IPLS) course in which both experts and students successfully use this blended ontology. Blending these ontologies is most successful when the substance and location ontologies are combined such that each is strategically utilized in reasoning about particular ...

  14. Use of the CIM Ontology

    Energy Technology Data Exchange (ETDEWEB)

    Neumann, Scott; Britton, Jay; Devos, Arnold N.; Widergren, Steven E.

    2006-02-08

    There are many uses for the Common Information Model (CIM), an ontology that is being standardized through Technical Committee 57 of the International Electrotechnical Commission (IEC TC57). The most common uses to date have included application modeling, information exchanges, information management and systems integration. As one should expect, there are many issues that become apparent when the CIM ontology is applied to any one use. Some of these issues are shortcomings within the current draft of the CIM, and others are a consequence of the different ways in which the CIM can be applied using different technologies. As the CIM ontology will and should evolve, there are several dangers that need to be recognized. One is overall consistency and impact upon applications when extending the CIM for a specific need. Another is that a tight coupling of the CIM to specific technologies could limit the value of the CIM in the longer term as an ontology, which becomes a larger issue over time as new technologies emerge. The integration of systems is one specific area of interest for application of the CIM ontology. This is an area dominated by the use of XML for the definition of messages. While this is certainly true when using Enterprise Application Integration (EAI) products, it is even more true with the movement towards the use of Web Services (WS), Service-Oriented Architectures (SOA) and Enterprise Service Buses (ESB) for integration. This general IT industry trend is consistent with trends seen within the IEC TC57 scope of power system management and associated information exchange. The challenge for TC57 is how to best leverage the CIM ontology using the various XML technologies and standards for integration. This paper will provide examples of how the CIM ontology is used and describe some specific issues that should be addressed within the CIM in order to increase its usefulness as an ontology. It will also describe some of the issues and challenges that will

  15. An ontology for sensor networks

    Science.gov (United States)

    Compton, Michael; Neuhaus, Holger; Bermudez, Luis; Cox, Simon

    2010-05-01

    Sensors and networks of sensors are important ways of monitoring and digitizing reality. As the number and size of sensor networks grow, so too does the amount of data collected. Users of such networks typically need to discover the sensors and data that fit their needs without necessarily understanding the complexities of the network itself. The burden on users is eased if the network and its data are expressed in terms of concepts familiar to the users and their job functions, rather than in terms of the network or how it was designed. Furthermore, the task of collecting and combining data from multiple sensor networks is made easier if metadata about the data and the networks is stored in a format and conceptual model that are amenable to machine reasoning and inference. While the OGC's (Open Geospatial Consortium) SWE (Sensor Web Enablement) standards provide for the description of, and access to, data and metadata for sensors, they do not provide facilities for abstraction, categorization, and reasoning consistent with standard technologies. Once sensors and networks are described using rich semantics (that is, by using logic to describe the sensors, the domain of interest, and the measurements) then reasoning and classification can be used to analyse and categorise data, relate measurements with similar information content, and manage, query and task sensors. This will enable types of automated processing and logical assurance built on OGC standards. The W3C SSN-XG (Semantic Sensor Networks Incubator Group) is producing a generic ontology to describe sensors, their environment and the measurements they make. The ontology provides definitions for the structure of sensors and observations, leaving the details of the observed domain unspecified. This allows abstract representations of real world entities, which are not observed directly but through their observable qualities. Domain semantics, units of measurement, time and time series, and location and mobility

  16. Improving ontologies by automatic reasoning and evaluation of logical definitions

    Directory of Open Access Journals (Sweden)

    Köhler Sebastian

    2011-10-01

    Full Text Available Background: Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly being adopted as a method for quality control as well as for facilitating interoperability and data integration. Results: We show how automated reasoning over logical definitions of ontology terms can be used to improve ontology structure. We provide the Java software package GULO (Getting an Understanding of LOgical definitions), which allows fast and easy evaluation for any kind of logically decomposed ontology by generating a composite OWL ontology from appropriate subsets of the referenced ontologies and comparing the inferred relationships with the relationships asserted in the target ontology. As a case study we show how to use GULO to evaluate the logical definitions that have been developed for the Mammalian Phenotype Ontology (MPO). Conclusions: Logical definitions of terms from biomedical ontologies represent an important resource for error and disagreement detection. GULO gives ontology curators a fast and simple tool for validation of their work.
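
    The comparison of inferred and asserted relationships can be illustrated with a small sketch. It is not GULO and does not use an OWL reasoner: as a stated simplification, each logical definition is a plain conjunction of features, and one term is inferred to be a subclass of another when its feature set strictly contains the other's. All term names, features and function names are made up for the example.

```python
# Toy logical definitions: term -> set of required features (a conjunction).
DEFINITIONS = {
    "abnormal heart": {"abnormal", "part_of:heart"},
    "abnormal heart size": {"abnormal", "part_of:heart", "quality:size"},
    "enlarged heart": {"abnormal", "part_of:heart", "quality:size",
                       "quality:increased"},
}

# Asserted (child, parent) links in the hypothetical target ontology.
ASSERTED = {
    ("abnormal heart size", "abnormal heart"),
    ("enlarged heart", "abnormal heart"),
}


def inferred_subclass_pairs(definitions):
    """A is a subclass of B when A's definition strictly contains B's."""
    return {
        (a, b)
        for a, features_a in definitions.items()
        for b, features_b in definitions.items()
        if a != b and features_b < features_a
    }


def direct_links(pairs):
    """Drop (a, c) when some b gives (a, b) and (b, c): keep direct links only."""
    intermediates = {parent for _, parent in pairs}
    return {
        (a, c) for (a, c) in pairs
        if not any((a, b) in pairs and (b, c) in pairs for b in intermediates)
    }


inferred = inferred_subclass_pairs(DEFINITIONS)
direct = direct_links(inferred)

missing = direct - ASSERTED                  # inferred directly, never asserted
redundant = ASSERTED & (inferred - direct)   # asserted, but only indirectly implied
unsupported = ASSERTED - inferred            # asserted, not implied at all

print("missing:", missing)
print("redundant:", redundant)
print("unsupported:", unsupported)
```

    Each of the three resulting sets is a candidate list for a curator to review; none of them is an error by itself.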

  17. GOSim – an R-package for computation of information theoretic GO similarities between terms and gene products

    Directory of Open Access Journals (Sweden)

    Poustka Annemarie

    2007-05-01

    Full Text Available Background: With the increased availability of high-throughput data, such as DNA microarray data, researchers are capable of producing large amounts of biological data. During the analysis of such data there is often the need to further explore the similarity of genes not only with respect to their expression, but also with respect to their functional annotation, which can be obtained from the Gene Ontology (GO). Results: We present the freely available software package GOSim, which calculates the functional similarity of genes based on various information theoretic similarity concepts for GO terms. GOSim extends existing tools by providing additional, recently developed functional similarity measures for genes. These can, e.g., be used to cluster genes according to their biological function. Conversely, they can also be used to evaluate the homogeneity of a given grouping of genes with respect to their GO annotation. GOSim hence provides the researcher with a flexible and powerful tool to combine knowledge stored in GO with experimental data. It can be seen as complementary to other tools that, for instance, search for significantly overrepresented GO terms within a given group of genes. Conclusion: GOSim is implemented as a package for the statistical computing environment R and is distributed under the GPL within the CRAN project.
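
    As an illustration of what an information theoretic similarity between ontology terms looks like, the sketch below computes Resnik's measure on a toy ontology. Resnik's measure is chosen here only as one well-known example; the abstract does not say which measures the package offers or how it computes them. The ontology, the annotation counts and the function names are assumptions, and the code is Python rather than R.

```python
import math

# Toy ontology: child -> parents (hypothetical terms, not real GO identifiers).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid metabolism": ["metabolism"],
    "sugar metabolism": ["metabolism"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "transport": 4,
    "lipid metabolism": 1,
    "sugar metabolism": 1,
}


def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), where p(t) counts annotations to t or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    count = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(count / total)


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


print(resnik("lipid metabolism", "sugar metabolism"))
print(resnik("lipid metabolism", "transport"))
```

    The two metabolism terms share the ancestor "metabolism", which covers half of all annotations, so their similarity is the information content of that term; "lipid metabolism" and "transport" share only the root, which carries no information.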

  18. Ontology for Genome Comparison and Genomic Rearrangements

    Directory of Open Access Journals (Sweden)

    Anil Wipat

    2006-04-01

    Full Text Available We present an ontology for describing genomes, genome comparisons, their evolution and biological function. This ontology will support the development of novel genome comparison algorithms and aid the community in discussing genomic evolution. It provides a framework for communication about comparative genomics, and a basis upon which further automated analysis can be built. The nomenclature defined by the ontology will foster clearer communication between biologists, and also standardize terms used by data publishers in the results of analysis programs. The overriding aim of this ontology is the facilitation of consistent annotation of genomes through computational methods, rather than human annotators. To this end, the ontology includes definitions that support computer analysis and automated transfer of annotations between genomes, rather than relying upon human mediation.

  19. Ontology-based, Tissue MicroArray oriented, image centered tissue bank

    Directory of Open Access Journals (Sweden)

    Viti Federica

    2008-04-01

    Full Text Available Background: The Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. Results: In this work we propose a web-oriented Tissue MicroArray system that supports researchers in managing bio-samples and, through the use of ontologies, enables tissue sharing aimed at the design of Tissue MicroArray experiments and the evaluation of results. Indeed, our system provides ontological descriptions both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, because the system works on well-defined terms, it is possible to query web resources for literature articles to integrate both pathology and bioinformatics data. Conclusions: Using this system, users associate an ontology-based description with each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue. Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies of the correlation between the analyzed pathology and the most commonly related biological processes.

  20. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  1. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  2. Statistical mechanics of ontology based annotations

    CERN Document Server

    Hoyle, David C

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotate...

  3. Aber-OWL: a framework for ontology-based data access in biology

    KAUST Repository

    Hoehndorf, Robert

    2015-01-28

    Background: Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. Results: We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net. Conclusions: Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.

  4. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  5. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  6. BiNChE: A web tool and library for chemical enrichment analysis based on the ChEBI ontology

    OpenAIRE

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-01-01

    Background: Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over- or under-representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology,...
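
    One common way to assess over- or under-representation for a single ontology class is a hypergeometric test, sketched below. The abstract does not state which statistical test the tool uses, so the choice of test, the function names and the numbers in the example are assumptions for illustration only.

```python
from math import comb


def over_representation_p(k, population, annotated, drawn):
    """P(X >= k) when `drawn` entities are sampled without replacement from a
    `population` in which `annotated` entities belong to the ontology class."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def under_representation_p(k, population, annotated, drawn):
    """P(X <= k) under the same sampling model."""
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(0, min(k, annotated, drawn) + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical numbers: 1000 entities in total, 50 of them in the class,
# and a study set of 40 entities of which 8 fall in the class.
p_over = over_representation_p(8, population=1000, annotated=50, drawn=40)
print(f"over-representation p-value: {p_over:.3g}")
```

    In practice such a test is run once per ontology class, so the resulting p-values need a multiple-testing correction before any class is reported as enriched.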

  7. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk of developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this "witness" model of stress and RNA-Seq technology, the current study aims to examine the genetic effects of psychological stress. After the mice witnessed the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  8. Therapeutic indications and other use-case-driven updates in the drug ontology: anti-malarials, anti-hypertensives, opioid analgesics, and a large term request.

    Science.gov (United States)

    Hogan, William R; Hanna, Josh; Hicks, Amanda; Amirova, Samira; Bramblett, Baxter; Diller, Matthew; Enderez, Rodel; Modzelewski, Timothy; Vasconcelos, Mirela; Delcher, Chris

    2017-03-03

    The Drug Ontology (DrOn) is an OWL2-based representation of drug products and their ingredients, mechanisms of action, strengths, and dose forms. We originally created DrOn for use cases in comparative effectiveness research, primarily to identify historically complete sets of United States National Drug Codes (NDCs) that represent packaged drug products, by the ingredient(s), mechanism(s) of action, and so on contained in those products. Although we had designed DrOn from the outset to carefully distinguish those entities that have a therapeutic indication from those entities that have a molecular mechanism of action, we had not previously represented in DrOn any particular therapeutic indication. In this work, we add therapeutic indications for three research use cases: resistant hypertension, malaria, and opioid abuse research. We also added mechanisms of action for opioid analgesics and added 108 classes representing drug products in response to a large term request from the Program for Resistance, Immunology, Surveillance and Modeling of Malaria in Uganda (PRISM) project. The net result is a new version of DrOn, current to May 2016, that represents three major therapeutic classes of drugs and six new mechanisms of action. A therapeutic indication of a drug product is represented as a therapeutic function in DrOn. Adverse effects of drug products, as well as other therapeutic uses for which the drug product was not designed are dispositions. Our work provides a framework for representing additional therapeutic indications, adverse effects, and uses of drug products beyond their design. Our work also validated our past modeling decisions for specific types of mechanisms of action, namely effects mediated via receptor and/or enzyme binding. DrOn is available at: http://purl.obolibrary.org/obo/dron.owl . A smaller version without NDCs is available at: http://purl.obolibrary.org/obo/dron/dron-lite.owl.

  9. Toxicology ontology perspectives.

    Science.gov (United States)

    Hardy, Barry; Apic, Gordana; Carthew, Philip; Clark, Dominic; Cook, David; Dix, Ian; Escher, Sylvia; Hastings, Janna; Heard, David J; Jeliazkova, Nina; Judson, Philip; Matis-Mitchell, Sherri; Mitic, Dragana; Myatt, Glenn; Shah, Imran; Spjuth, Ola; Tcheremenskaia, Olga; Toldo, Luca; Watson, David; White, Andrew; Yang, Chihae

    2012-01-01

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  10. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing meta data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and meta data taxonomies, should be based on ontologies.

  11. Students' Ontological Security and Agency in Science Education—An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-09-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from the societal structures described by Giddens was used to structure the analysis. The results illustrate how the participating students used both modalities for 'Legitimation' and 'Domination' to justify positions that accept or reject new technology. The analysis also showed how norms and knowledge can be used to justify opposing positions in relation to building trust in science and technology, or in democratic decisions expected to favour personal norms. Here, students accepted or rejected the authority of experts based on perceptions of the knowledge base that the authority was seen to be anchored in. Difficulty in discerning between material risks (reduced safety) and immaterial risks (loss of norms) was also found. These outcomes are used to draw attention to the educational challenges associated with students' using knowledge claims (Domination) to support norms (Legitimation) and how this is related to the development of a sense of agency in terms of sharing norms with experts or with laymen.

  12. dcGOR: an R package for analysing ontologies and protein domain annotations.

    Directory of Open Access Journals (Sweden)

    Hai Fang

    2014-10-01

    Full Text Available I introduce an open-source R package 'dcGOR' to provide the bioinformatics community with the ease to analyse ontologies and protein domain annotations, particularly those in the dcGO database. The dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structure to represent domains, ontologies, annotations, and all analytical outputs as well. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as inputs a list of protein domains of interest, the package is able to easily carry out in-depth analyses in terms of functional, phenotypic and diseased relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and RNAs (from Rfam) as well. The package is freely available at CRAN for easy installation, and also at GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.

  13. dcGOR: an R package for analysing ontologies and protein domain annotations.

    Science.gov (United States)

    Fang, Hai

    2014-10-01

    I introduce an open-source R package 'dcGOR' to provide the bioinformatics community with the ease to analyse ontologies and protein domain annotations, particularly those in the dcGO database. The dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structure to represent domains, ontologies, annotations, and all analytical outputs as well. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as inputs a list of protein domains of interest, the package is able to easily carry out in-depth analyses in terms of functional, phenotypic and diseased relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and RNAs (from Rfam) as well. The package is freely available at CRAN for easy installation, and also at GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.

  14. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora.

  15. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research [v1; ref status: indexed, http://f1000r.es/p5]

    Directory of Open Access Journals (Sweden)

    Sebastian Köhler

    2013-02-01

    Full Text Available Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  16. Temporal Ontologies for Geoscience: Alignment Challenges

    Science.gov (United States)

    Cox, S. J. D.

    2014-12-01

    Time is a central concept in geoscience. Geologic histories are composed of sequences of geologic processes and events. Calibration of their timing ties a local history into a broader context, and enables correlation of events between locations. The geologic timescale is standardized in the International Chronostratigraphic Chart, which specifies interval names, and calibrations for the ages of the interval boundaries. Time is also a key concept in the world at large. A number of general purpose temporal ontologies have been developed, both stand-alone and as parts of general purpose or upper ontologies. A temporal ontology for geoscience should apply or extend a suitable general purpose temporal ontology. However, geologic time presents two challenges: (1) geology involves greater spans of time than in other temporal ontologies, inconsistent with the year-month-day/hour-minute-second formalization that is a basic assumption of most general purpose temporal schemes; and (2) the geologic timescale is a temporal topology, whose calibration in terms of an absolute (numeric) scale is a scientific issue in its own right supporting a significant community. In contrast, the general purpose temporal ontologies are premised on exact numeric values for temporal position, and do not allow for temporal topology as a primary structure. We have developed an ontology for the geologic timescale to account for these concerns. It uses the ISO 19108 distinctions between different types of temporal reference system, also linking to an explicit temporal topology model. Stratotypes used in the calibration process are modelled as sampling-features following the ISO 19156 Observations and Measurements model. A joint OGC-W3C harmonization project is underway, with standardization of the W3C OWL-Time ontology as one of its tasks. The insights gained from the geologic timescale ontology will assist in development of a general ontology capable of modelling a richer set of use-cases from geoscience.

  17. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms

    Directory of Open Access Journals (Sweden)

    Argraves W Scott

    2010-04-01

    Full Text Available Background: An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Results: Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases, etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease, etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. Conclusions: GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies

  18. Brucellosis Ontology (IDOBRU) as an extension of the Infectious Disease Ontology

    Directory of Open Access Journals (Sweden)

    Lin Yu

    2011-10-01

    Full Text Available Background: Caused by intracellular Gram-negative bacteria Brucella spp., brucellosis is the most common bacterial zoonotic disease. Extensive studies in brucellosis have yielded a large number of publications and data covering various topics ranging from basic Brucella genetic study to vaccine clinical trials. To support data interoperability and reasoning, a community-based brucellosis-specific biomedical ontology is needed. Results: The Brucellosis Ontology (IDOBRU: http://sourceforge.net/projects/idobru), a biomedical ontology in the brucellosis domain, is an extension ontology of the core Infectious Disease Ontology (IDO-core) and follows OBO Foundry principles. Currently IDOBRU contains 1503 ontology terms, which includes 739 Brucella-specific terms, 414 IDO-core terms, and 350 terms imported from 10 existing ontologies. IDOBRU has been used to model different aspects of brucellosis, including host infection, zoonotic disease transmission, symptoms, virulence factors and pathogenesis, diagnosis, intentional release, vaccine prevention, and treatment. Case studies are typically used in our IDOBRU modeling. For example, diurnal temperature variation in Brucella patients, a Brucella-specific PCR method, and a WHO-recommended brucellosis treatment were selected as use cases to model brucellosis symptom, diagnosis, and treatment, respectively. Developed using OWL, IDOBRU supports OWL-based ontological reasoning. For example, by performing a Description Logic (DL) query in the OWL editor Protégé 4 or a SPARQL query in an IDOBRU SPARQL server, a check of Brucella virulence factors showed that eight of them are known protective antigens based on the biological knowledge captured within the ontology. Conclusions: IDOBRU is the first reported bacterial infectious disease ontology developed to represent different disease aspects in a formal logical format. It serves as a brucellosis knowledgebase and supports brucellosis data integration and

  19. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    Directory of Open Access Journals (Sweden)

    Robert Hoehndorf

    Full Text Available Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  20. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    Science.gov (United States)

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V

    2011-01-01

    Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  1. Ontology-based representation and analysis of host-Brucella interactions.

    Science.gov (United States)

    Lin, Yu; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this study, IDOBRU is used as a platform to model and analyze how the hosts, especially host macrophages, interact with virulent Brucella strains or live attenuated Brucella vaccine strains. Such a study allows us to better integrate and understand intricate Brucella pathogenesis and host immunity mechanisms. Different levels of host-Brucella interactions based on different host cell types and Brucella strains were first defined ontologically. Three important processes of virulent Brucella interacting with host macrophages were represented: Brucella entry into macrophage, intracellular trafficking, and intracellular replication. Two Brucella pathogenesis mechanisms were ontologically represented: Brucella Type IV secretion system that supports intracellular trafficking and replication, and Brucella erythritol metabolism that participates in Brucella intracellular survival and pathogenesis. The host cell death pathway is critical to the outcome of host-Brucella interactions. For better survival and replication, virulent Brucella prevents macrophage cell death. However, live attenuated B. abortus vaccine strain RB51 induces caspase-2-mediated proinflammatory cell death. Brucella-associated cell death processes are represented in IDOBRU. The gene and protein information of 432 manually annotated Brucella virulence factors were represented using the Ontology of Genes and Genomes (OGG) and Protein Ontology (PRO), respectively. Seven inference rules were defined to capture the knowledge of host

  2. OntoELAN: An Ontology-based Linguistic Multimedia Annotator

    CERN Document Server

    Chebotko, Artem; Lu, Shiyong; Fotouhi, Farshad; Aristar, Anthony; Brugman, Hennie; Klassmann, Alexander; Sloetjes, Han; Russel, Albert; Wittenburg, Peter

    2009-01-01

    Despite its scientific, political, and practical value, comprehensive information about human languages, in all their variety and complexity, is not readily obtainable and searchable. One reason is that many language data are collected as audio and video recordings which imposes a challenge to document indexing and retrieval. Annotation of multimedia data provides an opportunity for making the semantics explicit and facilitates the searching of multimedia documents. We have developed OntoELAN, an ontology-based linguistic multimedia annotator that features: (1) support for loading and displaying ontologies specified in OWL; (2) creation of a language profile, which allows a user to choose a subset of terms from an ontology and conveniently rename them if needed; (3) creation of ontological tiers, which can be annotated with profile terms and, therefore, corresponding ontological terms; and (4) saving annotations in the XML format as Multimedia Ontology class instances and, linked to them, class instances of o...

  3. Biomedical imaging ontologies: A survey and proposal for future work

    Directory of Open Access Journals (Sweden)

    Barry Smith

    2015-01-01

    Full Text Available Background: Ontology is one strategy for promoting interoperability of heterogeneous data through consistent tagging. An ontology is a controlled structured vocabulary consisting of general terms (such as "cell" or "image" or "tissue" or "microscope") that form the basis for such tagging. These terms are designed to represent the types of entities in the domain of reality that the ontology has been devised to capture; the terms are provided with logical definitions thereby also supporting reasoning over the tagged data. Aim: This paper provides a survey of the biomedical imaging ontologies that have been developed thus far. It outlines the challenges, particularly faced by ontologies in the fields of histopathological imaging and image analysis, and suggests a strategy for addressing these challenges in the example domain of quantitative histopathology imaging. Results and Conclusions: The ultimate goal is to support the multiscale understanding of disease that comes from using interoperable ontologies to integrate imaging data with clinical and genomics data.

  4. A Method for Recommending Ontology Alignment Strategies

    Science.gov (United States)

    Tan, He; Lambrix, Patrick

    In different areas ontologies have been developed and many of these ontologies contain overlapping information. Often we would therefore want to be able to use multiple ontologies. To obtain good results, we need to find the relationships between terms in the different ontologies, i.e. we need to align them. Currently, there already exist a number of different alignment strategies. However, it is usually difficult for a user that needs to align two ontologies to decide which of the different available strategies are the most suitable. In this paper we propose a method that provides recommendations on alignment strategies for a given alignment problem. The method is based on the evaluation of the different available alignment strategies on several small selected pieces from the ontologies, and uses the evaluation results to provide recommendations. In the paper we give the basic steps of the method, and then illustrate and discuss the method in the setting of an alignment problem with two well-known biomedical ontologies. We also experiment with different implementations of the steps in the method.
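
    The recommendation idea in this abstract (evaluate each available strategy on a few small ontology segments, then rank by the results) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two toy strategies, the sample segments with their reference mappings, and the choice of mean F-measure as the ranking score are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
# A segment holds terms from ontology A, terms from ontology B,
# and the reference (known-correct) mappings between them.
Segment = Tuple[List[str], List[str], Set[Pair]]
Strategy = Callable[[List[str], List[str]], Set[Pair]]


def exact_match(a: List[str], b: List[str]) -> Set[Pair]:
    """Hypothetical strategy: align terms whose lower-cased labels are identical."""
    b_by_label = {t.lower(): t for t in b}
    return {(t, b_by_label[t.lower()]) for t in a if t.lower() in b_by_label}


def prefix_match(a: List[str], b: List[str]) -> Set[Pair]:
    """Hypothetical strategy: align terms sharing their first four characters."""
    return {(x, y) for x in a for y in b if x[:4].lower() == y[:4].lower()}


def f_measure(found: Set[Pair], reference: Set[Pair]) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    correct = len(found & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def recommend(strategies: Dict[str, Strategy],
              segments: List[Segment]) -> List[Tuple[str, float]]:
    """Rank strategies by their mean F-measure over the selected segments."""
    scores = []
    for name, strategy in strategies.items():
        per_segment = [f_measure(strategy(a, b), ref) for a, b, ref in segments]
        scores.append((name, sum(per_segment) / len(per_segment)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy segments (invented for illustration only).
segments = [
    (["Nose", "Nasal cavity"], ["nose", "nasal bone"], {("Nose", "nose")}),
    (["Ear", "Eardrum"], ["ear", "eardrum"],
     {("Ear", "ear"), ("Eardrum", "eardrum")}),
]
ranking = recommend({"exact": exact_match, "prefix": prefix_match}, segments)
```

    On these toy segments the exact-label strategy ranks first because the prefix strategy proposes one wrong pair in the first segment; with real ontologies the segments and the reference mappings would have to come from a curated source.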

  5. Primer on Ontologies.

    Science.gov (United States)

    Hastings, Janna

    2017-01-01

    As molecular biology has increasingly become a data-intensive discipline, ontologies have emerged as an essential computational tool to assist in the organisation, description and analysis of data. Ontologies describe and classify the entities of interest in a scientific domain in a computationally accessible fashion such that algorithms and tools can be developed around them. The technology that underlies ontologies has its roots in logic-based artificial intelligence, allowing for sophisticated automated inference and error detection. This chapter presents a general introduction to modern computational ontologies as they are used in biology.

  6. Kuhn's Ontological Relativism.

    Science.gov (United States)

    Sankey, Howard

    2000-01-01

    Discusses Kuhn's model of scientific theory change. Documents Kuhn's move away from conceptual relativism and rational relativism. Provides an analysis of his present ontological form of relativism. (CCM)

  7. The MMI Device Ontology: Enabling Sensor Integration

    Science.gov (United States)

    Rueda, C.; Galbraith, N.; Morris, R. A.; Bermudez, L. E.; Graybeal, J.; Arko, R. A.; MMI Device Ontology Working Group

    2010-12-01

    The Marine Metadata Interoperability (MMI) project has developed an ontology for devices to describe sensors and sensor networks. This ontology is implemented in the W3C Web Ontology Language (OWL) and provides an extensible conceptual model and controlled vocabularies for describing heterogeneous instrument types, with different data characteristics, and their attributes. It can help users populate metadata records for sensors; associate devices with their platforms, deployments, measurement capabilities and restrictions; aid in discovery of sensor data, both historic and real-time; and improve the interoperability of observational oceanographic data sets. We developed the MMI Device Ontology following a community-based approach. By building on and integrating other models and ontologies from related disciplines, we sought to facilitate semantic interoperability while avoiding duplication. Key concepts and insights from various communities, including the Open Geospatial Consortium (e.g., SensorML and Observations and Measurements specifications), Semantic Web for Earth and Environmental Terminology (SWEET), and W3C Semantic Sensor Network Incubator Group, have significantly enriched the development of the ontology. Individuals ranging from instrument designers, science data producers and consumers to ontology specialists and other technologists contributed to the work. Applications of the MMI Device Ontology are underway for several community use cases. These include vessel-mounted multibeam mapping sonars for the Rolling Deck to Repository (R2R) program and description of diverse instruments on deepwater Ocean Reference Stations for the OceanSITES program. These trials involve creation of records completely describing instruments, either by individual instances or by manufacturer and model. Individual terms in the MMI Device Ontology can be referenced with their corresponding Uniform Resource Identifiers (URIs) in sensor-related metadata specifications (e

  8. Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics.

    Science.gov (United States)

    Percha, Bethany; Altman, Russ B

    2013-01-01

    The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
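
    For readers unfamiliar with random indexing, the general technique can be sketched as follows: every word receives a sparse random "index vector", and a word's semantic vector is the sum of the index vectors of the words that occur near it. The sketch assumes NumPy is available; the dimensionality, window size, toy sentences and function names are illustrative assumptions and do not reproduce the authors' pipeline or corpus.

```python
import numpy as np


def random_indexing(sentences, dim=300, nonzero=10, window=2, seed=0):
    """Build context vectors by summing sparse ternary index vectors of neighbours."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for s in sentences for w in s})

    # Each word gets a fixed sparse index vector with a few +1/-1 entries.
    index = {}
    for w in vocab:
        vec = np.zeros(dim)
        positions = rng.choice(dim, size=nonzero, replace=False)
        vec[positions] = rng.choice([-1.0, 1.0], size=nonzero)
        index[w] = vec

    # A word's context vector accumulates the index vectors of its neighbours.
    context = {w: np.zeros(dim) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[w] += index[s[j]]
    return context


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


# Toy tokenised sentences (invented for illustration only).
sentences = [
    ["warfarin", "inhibits", "vkorc1"],
    ["acenocoumarol", "inhibits", "vkorc1"],
    ["cyp2c9", "metabolizes", "phenytoin"],
]
ctx = random_indexing(sentences)
same_context = cosine(ctx["warfarin"], ctx["acenocoumarol"])
different_context = cosine(ctx["warfarin"], ctx["phenytoin"])
```

    In this toy corpus the two drug names that appear in identical contexts receive identical vectors, so their cosine similarity is 1, while a word seen in an unrelated context scores far lower. Similarity scores of this kind are what the abstract compares against the structure of the curated ontology.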

  9. MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2016-08-01

    Full Text Available Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and MeSH-guided semantic similarity among terms and gene products, using two lists of chicken genes available in public repositories. The two published datasets that were employed represent (i) differentially expressed genes, and (ii) candidate genes under selective sweep or epistatic selection. The comparison of MeSH with GO overrepresentation analyses suggested not only that MeSH supports the findings obtained from GO analysis, but also that MeSH is able to further enrich the representation of biological knowledge and often provide more interpretable results. Based on the hierarchical structures of MeSH and GO, we computed semantic similarities among vocabularies, as well as semantic similarities among selected genes. These yielded the similarity levels between significant functional terms, and the annotation of each gene yielded the measures of gene similarity. Our findings show the benefits of using MeSH as an alternative choice of annotation in order to draw biological inferences from a list of genes of interest. We argue that the use of MeSH in conjunction with GO will be instrumental in facilitating the understanding of the genetic basis of complex traits.
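
    One common formulation of an overrepresentation (enrichment) test for a GO or MeSH term is the upper tail of the hypergeometric distribution, sketched below with the Python standard library only. The abstract does not state how the cited Bioconductor packages compute their statistics, so the function name, its parameters and the numbers in the example are assumptions made purely for illustration.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background set
    K: background genes annotated with the term
    n: genes in the list of interest
    k: genes in the list of interest annotated with the term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers: 20 background genes, 5 carry the term,
# and 3 of the 4 selected genes carry it.
p = enrichment_p_value(k=3, n=4, K=5, N=20)
```

    A small p-value suggests that the term is annotated to more of the selected genes than chance alone would explain; in practice the p-values for all tested terms would also be corrected for multiple testing.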

  10. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.
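
    The MCC figure quoted above refers to the Matthews correlation coefficient, which is computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows; the function name and the example counts are illustrative assumptions and are not results from the study.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts for a binary interaction predictor.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
mixed = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

    The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it more informative than accuracy when positive and negative samples are unbalanced.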

  11. Saliva Ontology: An ontology-based framework for a Salivaomics Knowledge Base

    Directory of Open Access Journals (Sweden)

    Smith Barry

    2010-06-01

    Full Text Available Background: The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO). Results: We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics domain and to saliva-related diagnostics following the principles of the OBO (Open Biomedical Ontologies) Foundry. Conclusions: The Saliva Ontology is an ongoing exploratory initiative. The ontology will be used to facilitate salivaomics data retrieval and integration across multiple fields of research together with data analysis and data mining. The ontology will be tested through its ability to serve the annotation ('tagging') of a representative corpus of salivaomics research literature that is to be incorporated into the SKB.

  12. Development of an Adolescent Depression Ontology for Analyzing Social Data.

    Science.gov (United States)

    Jung, Hyesil; Park, Hyeoun-Ae; Song, Tae-Min; Jeon, Eunjoo; Kim, Ae Ran; Lee, Joo Yun

    2015-01-01

    Depression in adolescence is associated with significant suicidality. Therefore, it is important to detect the risk for depression and provide timely care to adolescents. This study aims to develop an ontology for collecting and analyzing social media data about adolescent depression. This ontology was developed using the 'ontology development 101'. The important terms were extracted from several clinical practice guidelines and postings on Social Network Service. We extracted 777 terms, which were categorized into 'risk factors', 'sign and symptoms', 'screening', 'diagnosis', 'treatment', and 'prevention'. An ontology developed in this study can be used as a framework to understand adolescent depression using unstructured data from social media.

  13. Short- and long-term changes in sugarbeet (Beta vulgaris L.) gene expression due to postharvest jasmonic acid treatment - Data

    Directory of Open Access Journals (Sweden)

    Lucilene Silva de Oliveira

    2017-04-01

    Full Text Available Jasmonic acid is a natural plant hormone that induces native defense responses in plants. Sugarbeet (Beta vulgaris L.) root unigenes that were differentially expressed 2 and 60 days after a postharvest jasmonic acid treatment are presented. Data include changes in unigene expression relative to water-treated controls, unigene annotations against nonredundant (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) protein databases, and unigene annotations with Gene Ontology (GO) terms. Putative defense unigenes are compiled and annotated against the sugarbeet genome. Differential gene expression data were generated by RNA sequencing. Interpretation of the data is available in the research article, “Jasmonic acid causes short- and long-term alterations to the transcriptome and the expression of defense genes in sugarbeet roots” (K.K. Fugate, L.S. Oliveira, J.P. Ferrareze, M.D. Bolton, E.L. Deckard, F.L. Finger, 2017) [1]. Public dissemination of this dataset will allow further analyses of the data.

  14. The Ontology of Disaster.

    Science.gov (United States)

    Thompson, Neil

    1995-01-01

    Explores some key existential or ontological concepts to show their applicability to the complex area of disaster impact as it relates to health and social welfare practice. Draws on existentialist philosophy, particularly that of Jean-Paul Sartre, and introduces some key ontological concepts to show how they specifically apply to the experience…

  15. Constructive Ontology Engineering

    Science.gov (United States)

    Sousan, William L.

    2010-01-01

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…

  18. Students' Ontological Security and Agency in Science Education--An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-01-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from…

  19. Ayurveda research: Ontological challenges.

    Science.gov (United States)

    Nayak, Jayakrishna

    2012-01-01

    Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath; one example is the 'ASIIA (A Science Initiative In Ayurveda)' projects of the Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge: the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  20. Ayurveda research: Ontological challenges

    Directory of Open Access Journals (Sweden)

    Jayakrishna Nayak

    2012-01-01

    Full Text Available Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath; one example is the 'ASIIA (A Science Initiative In Ayurveda)' projects of the Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge: the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  1. Ontology Maintenance using Textual Analysis

    Directory of Open Access Journals (Sweden)

    Yassine Gargouri

    2003-10-01

    Full Text Available Ontologies are continually confronted with the problem of evolution. Due to the complexity of the changes to be made, a maintenance process, at least a semi-automatic one, is increasingly necessary to facilitate this task and to ensure its reliability. In this paper, we propose an ontology maintenance model for a domain, whose originality is that it is language independent and based on a sequence of text-processing steps that extract highly related terms from a corpus. Initially, we deploy a document classification technique using GRAMEXCO to generate classes of text segments having a similar information type and to identify their shared lexicon, taken to be highly related to a single topic. This technique allows a first general and robust exploration of the corpus. Further, we apply the Latent Semantic Indexing method to extract from this shared lexicon the most strongly associated terms, which an expert must then consider in order to confirm their relevance and update the current ontology. Finally, we show how the complementarity between these two techniques, based on cognitive foundations, constitutes a powerful refinement process.

  3. Practical ontologies for information professionals

    CERN Document Server

    AUTHOR|(CDS)2071712

    2016-01-01

    Practical Ontologies for Information Professionals provides an introduction to ontologies and their development, an essential tool for fighting back against information overload. The development of robust and widely used ontologies is an increasingly important tool in the fight against information overload. The publishing and sharing of explicit explanations for a wide variety of conceptualizations, in a machine readable format, has the power to both improve information retrieval and identify new knowledge. This new book provides an accessible introduction to the following: * What is an ontology? Defining the concept and why it is increasingly important to the information professional * Ontologies and the semantic web * Existing ontologies, such as SKOS, OWL, FOAF, schema.org, and the DBpedia Ontology * Adopting and building ontologies, showing how to avoid repetition of work and how to build a simple ontology with Protege * Interrogating semantic web ontologies * The future of ontologies and the role of the ...

  4. Ontologies in biological data visualization.

    Science.gov (United States)

    Carpendale, Sheelagh; Chen, Min; Evanko, Daniel; Gehlenborg, Nils; Gorg, Carsten; Hunter, Larry; Rowland, Francis; Storey, Margaret-Anne; Strobelt, Hendrik

    2014-01-01

    In computer science, an ontology is essentially a graph-based knowledge representation in which each node corresponds to a concept and each edge specifies a relation between two concepts. Ontological development in biology can serve as a focus to discuss the challenges and possible research directions for ontologies in visualization. The principal challenges are the dynamic and evolving nature of ontologies, the ever-present issue of scale, the diversity and richness of the relationships in ontologies, and the need to better understand the relationship between ontologies and the data analysis tasks scientists wish to support. Research directions include visualizing ontologies; visualizing semantically or ontologically annotated texts, documents, and corpora; automated generation of visualizations using ontologies; and visualizing ontological context to support search. Although this discussion uses issues of ontologies in biological data visualization as a springboard, these topics are of general relevance to visualization.
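
    The graph view described in this abstract can be illustrated with a minimal Python sketch. It stores a toy ontology as labeled edges between concept nodes and follows one relation type to collect a concept's ancestors. The concept names, the `EDGES` list, and the helper functions are assumptions made for illustration only; they are not taken from the paper or from any real ontology release.

```python
from collections import defaultdict

# Toy ontology as (child, relation, parent) triples. All names are hypothetical.
EDGES = [
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "is_a", "cellular_component"),
    ("mitochondrion", "part_of", "cytoplasm"),
    ("cytoplasm", "is_a", "cellular_component"),
]


def build_graph(edges):
    """Index the triples by child node: node -> list of (relation, parent)."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, node, relation="is_a"):
    """Return every concept reachable from `node` by following `relation` edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for rel, parent in graph.get(current, []):
            if rel == relation and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)
```

    Calling `ancestors(GRAPH, "mitochondrion")` follows only `is_a` edges, while passing `relation="part_of"` follows only `part_of` edges. Keeping the relation label on each edge is what distinguishes this structure from a plain taxonomy tree.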

  5. Documenting the emergence of bio-ontologies: or, why researching bioinformatics requires HPSSB.

    Science.gov (United States)

    Leonelli, Sabina

    2010-01-01

    This paper reflects on the analytic challenges emerging from the study of bioinformatic tools recently created to store and disseminate biological data, such as databases, repositories, and bio-ontologies. I focus my discussion on the Gene Ontology, a term that defines three entities at once: a classification system facilitating the distribution and use of genomic data as evidence towards new insights; an expert community specialised in the curation of those data; and a scientific institution promoting the use of this tool among experimental biologists. These three dimensions of the Gene Ontology can be clearly distinguished analytically, but are tightly intertwined in practice. I suggest that this is true of all bioinformatic tools: they need to be understood simultaneously as epistemic, social, and institutional entities, since they shape the knowledge extracted from data and at the same time regulate the organisation, development, and communication of research. This viewpoint has one important implication for the methodologies used to study these tools; that is, the need to integrate historical, philosophical, and sociological approaches. I illustrate this claim through examples of misunderstandings that may result from a narrowly disciplinary study of the Gene Ontology, as I experienced them in my own research.

  6. Ontological foundations for evolutionary economics: A Darwinian social ontology

    NARCIS (Netherlands)

    J.W. Stoelhorst

    2008-01-01

    The purpose of this paper is to further the project of generalized Darwinism by developing a social ontology on the basis of a combined commitment to ontological continuity and ontological commonality. Three issues that are central to the development of a social ontology are addressed: (1) the speci

  7. Multi-species Ontologies of the Craniofacial Musculoskeletal System

    Science.gov (United States)

    Mejino, Jose L.V.; Detwiler, Landon T.; Cox, Timothy C.; Brinkley, James F.

    2017-01-01

    We created the Ontology of Craniofacial Development and Malformation (OCDM) [1] to provide a unifying framework for organizing and integrating craniofacial data ranging from genes to clinical phenotypes from multiple species. Within this framework we focused on spatio-structural representation of anatomical entities related to craniofacial development and malformation, such as craniosynostosis and midface hypoplasia. Animal models are used to support human studies and so we built multi-species ontologies that would allow for cross-species correlation of anatomical information. For this purpose we first developed and enhanced the craniofacial component of the human musculoskeletal system in the Foundational Model of Anatomy Ontology (FMA) [2], and then imported this component, which we call the Craniofacial Human Ontology (CHO), into the OCDM. The CHO was then used as a template to create the anatomy for the mouse, the Craniofacial Mouse Ontology (CMO), as well as for the zebrafish, the Craniofacial Zebrafish Ontology (CZO).

  8. Using Semantic Association to Extend and Infer Literature-Oriented Relativity Between Terms.

    Science.gov (United States)

    Cheng, Liang; Li, Jie; Hu, Yang; Jiang, Yue; Liu, Yongzhuang; Chu, Yanshuo; Wang, Zhenxing; Wang, Yadong

    2015-01-01

    Related terms often appear together in the literature. Methods have been presented for weighting the relativity of term pairs by the literature in which they co-occur and for inferring new relationships. Terms in the literature are also found in the directed acyclic graphs of ontologies, such as Gene Ontology and Disease Ontology. Therefore, semantic associations between terms may help establish relativities between terms in the literature. However, current methods do not use these associations. In this paper, an adjusted R-scaled score (ARSS) based on information content (ARSSIC) method is introduced to infer new relationships between terms. First, the set inclusion relationship between terms of an ontology was exploited to extend the relationships between these terms and the literature. Next, the ARSS method was presented to measure relativity between terms across ontologies according to these extended relationships. Then, the ARSSIC method, which uses ratios of the information shared by terms' ancestors, was designed to infer new relationships between terms across ontologies. The experimental results show that ARSS identified more pairs of statistically significant terms based on corresponding gene sets than other methods. The high average area under the receiver operating characteristic curve (0.9293) shows that ARSSIC achieved a high true positive rate and a low false positive rate. Data are available at http://mlg.hit.edu.cn/ARSSIC/.
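
    The abstract does not give the ARSS or ARSSIC formulas, so they are not reproduced here. As background only, the sketch below shows the generic idea of information content on an ontology DAG: a term's information content is the negative log of the share of annotations at or below it, and two terms can be compared through their most informative common ancestor (a Resnik-style measure, swapped in here as a stand-in). The toy DAG, the annotation counts, and all function names are hypothetical.

```python
import math

# Hypothetical toy DAG: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Hypothetical direct annotation counts (e.g. distinct genes annotated to each term).
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 2, "A1": 3, "A2": 1, "AB": 1}


def ancestors_inclusive(term):
    """Return the term together with all of its ancestors in the DAG."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def propagated_counts():
    """Propagate each term's direct count to itself and to every ancestor."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors_inclusive(term):
            totals[anc] += count
    return totals


def information_content(term, totals):
    """IC(t) = -log(p(t)), with p(t) the share of annotations at or below t."""
    return -math.log(totals[term] / totals["root"])


def resnik_similarity(t1, t2, totals):
    """Information content of the most informative common ancestor."""
    common = ancestors_inclusive(t1) & ancestors_inclusive(t2)
    return max(information_content(t, totals) for t in common)
```

    In this toy example, sibling terms `A1` and `A2` share the ancestor `A` and so receive a positive score, whereas `A1` and `B` share only the root and score zero. Any ratio-based variant, such as the one the abstract alludes to, would build on the same propagated counts.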

  9. Microposts Ontology Construction Via Concept Extraction

    Directory of Open Access Journals (Sweden)

    Beenu Yadav

    2012-08-01

    Full Text Available The social networking website Facebook offers its users a feature called “status updates” (or just “status”), which allows users to create Microposts directed to all their contacts, or a subset thereof. Readers can respond to Microposts, and can also click a “Like” button to show their appreciation for a certain Micropost. This work concerns adding semantic meaning, in the sense of unambiguous intended ideas, to such Microposts. We can make a start towards the semantic web by adding semantic annotation to web resources. Ontologies are used to specify the meaning of annotations. An ontology provides a vocabulary for representing and communicating knowledge about some topic and a set of semantic relationships that hold among the terms in that vocabulary. To increase the efficiency of ontology-based applications, there is a need to develop a mechanism that reduces the manual work in developing ontologies. In this paper, we propose Microposts’ ontology construction and present a method that extracts meaningful knowledge from microposts shared on social platforms. This process involves different steps for the analysis of such microposts (extraction of keywords and named entities, and their matching to ontological concepts).
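
    The analysis steps named in this abstract (keyword extraction followed by matching to ontological concepts) can be sketched in a deliberately simple form. The stop-word list, the label-to-concept table, and the exact-match lookup below are all assumptions for illustration; the abstract does not specify the paper's actual pipeline, and named-entity recognition is omitted.

```python
import re

# Hypothetical stop words and ontology labels; neither comes from the paper.
STOP_WORDS = {"a", "an", "the", "at", "in", "to", "with", "my", "and", "is", "of"}
ONTOLOGY_LABELS = {
    "concert": "Event",
    "paris": "Place",
    "coffee": "Beverage",
}


def extract_keywords(micropost):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", micropost.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def match_concepts(micropost):
    """Map each keyword that has an ontology label to its concept."""
    matches = {}
    for keyword in extract_keywords(micropost):
        concept = ONTOLOGY_LABELS.get(keyword)
        if concept is not None:
            matches[keyword] = concept
    return matches
```

    For example, `match_concepts("Enjoying coffee at a concert in Paris!")` links three keywords to concepts and leaves "enjoying" unmatched. A realistic system would replace the exact-match table with lemmatization, synonym handling, and an entity recognizer.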

  10. [ ] Toward an Ontology of Finitude

    Directory of Open Access Journals (Sweden)

    Julia Hölzl

    2011-09-01

    Full Text Available Hölzl palpates an ontology of fracture. Unlike original ontologies that are concerned with essence rather than being, the ontology proposed here does not believe in its originality. This project is concerned with becoming as such rather than with its Wesen. With the indefinite striving for remaining in itself. This ontology is a fissure, fissuring itself.

  11. Perspectives on ontology learning

    CERN Document Server

    Lehmann, J

    2014-01-01

    Perspectives on Ontology Learning brings together researchers and practitioners from different communities − natural language processing, machine learning, and the semantic web − in order to give an interdisciplinary overview of recent advances in ontology learning. Starting with a comprehensive introduction to the theoretical foundations of ontology learning methods, the edited volume presents the state-of-the-art in automated knowledge acquisition and maintenance. It outlines future challenges in this area with a special focus on technologies suitable for pushing the boundaries beyond the c

  12. The sexual and ontology

    Directory of Open Access Journals (Sweden)

    Zupančič Alenka

    2014-01-01

    Full Text Available This paper explores some of the crucial ontological implications of the psychoanalytic theory of sexuality in its Freudo-Lacanian orientation. As irreducible to different sexual practices and contents, the concept of sexuality obtains conceptual weight that makes it particularly relevant for philosophical ontological thinking. Starting from the hypothesis that something about sexuality is constitutively unconscious - that is to say, existing only in the form of the unconscious - the paper points at the singular short-circuit of the epistemological and ontological level which is at work in psychoanalytic theory, and which cannot be neglected in philosophical examination of the relation between knowledge and being.

  13. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on its approaches, the tools it used, and its results for mining the literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  14. Ontology Based Metadata Management for National Healthcare Data Dictionary

    Directory of Open Access Journals (Sweden)

    Yasemin Yüksek

    2012-02-01

    Full Text Available Ontology-based metadata relies on ontologies that give formal semantics to information at the content level. In this study, an ontology-based metadata management approach is proposed for the metadata modeling developed for the National Health Data Dictionary (NHDD). NHDD is used as a reference by all health institutions in Turkey and contributes greatly in terms of terminology. The proposed ontology-based metadata management approach was realized using a methodology for modeling metadata requirements. This methodology includes determining metadata beneficiaries, listing the metadata requirements for each beneficiary, identifying the source of metadata, categorizing metadata, and building a metamodel.

  15. Towards automated biomedical ontology harmonization.

    Science.gov (United States)

    Uribe, Gustavo A; Lopez, Diego M; Blobel, Bernd

    2014-01-01

    The use of biomedical ontologies is increasing, especially in the context of health systems interoperability. Ontologies are key to understanding the semantics of the information exchanged. However, given the diversity of biomedical ontologies, it is essential to develop tools that support harmonization processes amongst them. Several algorithms and tools have been proposed by computer scientists for partially supporting ontology harmonization. However, these tools face several problems, especially in the biomedical domain where ontologies are large and complex. In the harmonization process, matching is a basic task. This paper explains the different ontology harmonization processes, analyzes existing matching tools, and proposes a prototype of an ontology harmonization service. The results demonstrate that there are many open issues in the field of biomedical ontology harmonization, such as: overcoming structural discrepancies between ontologies; the lack of semantic algorithms to automate the process; the low matching efficiency of existing algorithms; and the use of domain and top level ontologies in the matching process.

  16. An Object-Oriented Metamodel for Bunge-Wand-Weber Ontology

    CERN Document Server

    Kiwelekar, Arvind W

    2010-01-01

    A UML based metamodel for Bunge-Wand-Weber (BWW) ontology is presented. BWW ontology is a generic framework for analysis and conceptualization of real world objects. It includes categories that can be applied to analyze and classify objects found in an information system. In the context of BWW ontology, the metamodel is a representation of the ontological categories and relationships among them. An objective behind developing an object-oriented metamodel has been to model BWW ontology in terms of widely used notions in software development. The main contributions of this paper are a classification for ontological categories, a description template, and representations through UML and typed based models.

  17. The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology.

    Science.gov (United States)

    Thacker, Robert W; Díaz, Maria Cristina; Kerner, Adeline; Vignes-Lebbe, Régine; Segerdell, Erik; Haendel, Melissa A; Mungall, Christopher J

    2014-01-01

    Porifera (sponges) are ancient basal metazoans that lack organs. They provide insight into key evolutionary transitions, such as the emergence of multicellularity and the nervous system. In addition, their ability to synthesize unusual compounds offers potential biotechnical applications. However, much of the knowledge of these organisms has not previously been codified in a machine-readable way using modern web standards. The Porifera Ontology is intended as a standardized coding system for sponge anatomical features currently used in systematics. The ontology is available from http://purl.obolibrary.org/obo/poro.owl, or from the project homepage http://porifera-ontology.googlecode.com/. The version referred to in this manuscript is permanently available from http://purl.obolibrary.org/obo/poro/releases/2014-03-06/. By standardizing character representations, we hope to facilitate more rapid description and identification of sponge taxa, to allow integration with other evolutionary database systems, and to perform character mapping across the major clades of sponges to better understand the evolution of morphological features. Future applications of the ontology will focus on creating (1) ontology-based species descriptions; (2) taxonomic keys that use the nested terms of the ontology to more quickly facilitate species identifications; and (3) methods to map anatomical characters onto molecular phylogenies of sponges. In addition to modern taxa, the ontology is being extended to include features of fossil taxa.

  18. Towards a core ontology for integrating ecological and environmental ontologies to enable improved data interoperability

    Science.gov (United States)

    Bowers, S.; Madin, J.; Jones, M.; Schildhauer, M.; Ludaescher, B.

    2007-12-01

    Research in the ecological and environmental sciences increasingly relies on the integration of traditionally small, focused studies to form larger datasets for synthetic analyses. However, a broad range of data types, structures, and semantic subtleties occur in ecological data, making data discovery and integration a difficult and time-consuming task. Our work focuses on capturing the subtleties of scientific data through semantic annotations, which involve linking ecological data to concepts and relationships in domain-specific ontologies, thereby enabling more advanced forms of data discovery and integration. A variety of ontologies related to ecological data are actively being developed, ranging from low-level and highly focused vocabularies to high-level models and classifications. However, as the number of ontologies and their included terms increase, organizing these into a coherent framework useful for data annotation becomes increasingly complex (we note that similar issues have been recognized within the molecular biology and bioinformatics communities). We describe a core ontology model for semantic annotation that provides a structured approach for integrating the growing number of ecology-relevant ontologies. The ontology defines the notion of "scientific observation" as a unifying concept for capturing the basic semantics of ecological data. Observations are distinguished at the level of the entity (e.g., location, time, thing, concept), and characteristics of an entity (e.g., height, name, color) are measured (named or classified) as data. The ontology permits observations to be related via context (such as spatial or temporal containment), further supporting the discovery and automated comparison and alignment (e.g., merging) of heterogeneous data. The core ontology also defines a set of extension points that can be used to either directly build new domain ontologies (as extension ontologies), or to provide a common basis to which existing

  19. A Method for Evaluating and Standardizing Ontologies

    Science.gov (United States)

    Seyed, Ali Patrice

    2012-01-01

    The Open Biomedical Ontology (OBO) Foundry initiative is a collaborative effort for developing interoperable, science-based ontologies. The Basic Formal Ontology (BFO) serves as the upper ontology for the domain-level ontologies of OBO. BFO is an upper ontology of types as conceived by defenders of realism. Among the ontologies developed for OBO…

  1. The design ontology

    DEFF Research Database (Denmark)

    Storga, Mario; Andreasen, Mogens Myrup; Marjanovic, Dorian

    2010-01-01

    The article presents the research of the nature, building and practical role of a Design Ontology as a potential framework for the more efficient product development (PD) data-, information- and knowledge- description, -explanation, -understanding and -reusing. In the methodology for development...... of the ontology two steps could be identified: empirical research and computer implementation. Empirical research has included domain documentation analysis (Genetic Design Model System developed by Mortensen 1999), identification of the key concepts and relations between them, and categorisation of the concepts...... and relations into taxonomies. As an epistemological foundation for the concepts formalisation, The Suggested Upper Merged Ontology (SUMO) proposed by IEEE, was reused. As the result of the previously described process, the ontology content has been categorised into six main subcategories divided between...

  2. Mechanisms in biomedical ontology

    National Research Council Canada - National Science Library

    Röhl, Johannes

    2012-01-01

    .... Taking some hints from an "ontology of devices" I suggest as a general approach for this task the introduction of functional kinds and functional parts by which the particular relations between a mechanism and its components can be captured.

  3. “Force,” ontology, and language

    Science.gov (United States)

    Brookes, David T.; Etkina, Eugenia

    2009-06-01

    We introduce a linguistic framework through which one can interpret systematically students’ understanding of and reasoning about force and motion. Some researchers have suggested that students have robust misconceptions or alternative frameworks grounded in everyday experience. Others have pointed out the inconsistency of students’ responses and presented a phenomenological explanation for what is observed, namely, knowledge in pieces. We wish to present a view that builds on and unifies aspects of this prior research. Our argument is that many students’ difficulties with force and motion are primarily due to a combination of linguistic and ontological difficulties. It is possible that students are primarily engaged in trying to define and categorize the meaning of the term “force” as spoken about by physicists. We found that this process of negotiation of meaning is remarkably similar to that engaged in by physicists in history. In this paper we will describe a study of the historical record that reveals an analogous process of meaning negotiation, spanning multiple centuries. Using methods from cognitive linguistics and systemic functional grammar, we will present an analysis of the force and motion literature, focusing on prior studies with interview data. We will then discuss the implications of our findings for physics instruction.

  4. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Directory of Open Access Journals (Sweden)

    José Cuenca

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  5. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  6. Genetically Based Location from Triploid Populations and Gene Ontology of a 3.3-Mb Genome Region Linked to Alternaria Brown Spot Resistance in Citrus Reveal Clusters of Resistance Genes

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids. PMID:24116149

  7. Manufacturing ontology through templates

    Directory of Open Access Journals (Sweden)

    Diciuc Vlad

    2017-01-01

    The manufacturing industry holds a large volume of high-value know-how, much of it held by key persons in the company. Passing on this know-how is the basis of manufacturing ontology. Among other methods, such as advanced filtering and algorithm-based decision making, one way of handling the manufacturing ontology is via templates. The current paper tackles this approach and highlights its advantages, concluding with some recommendations.

  8. Ontology alignment with OLA

    OpenAIRE

    Euzenat, Jérôme; Loup, David; Touzani, Mohamed; Valtchev, Petko

    2004-01-01

    Using ontologies is the standard way to achieve interoperability of heterogeneous systems within the Semantic web. However, as the ontologies underlying two systems are not necessarily compatible, they may in turn need to be aligned. Similarity-based approaches to alignment seem to be both powerful and flexible enough to match the expressive power of languages like OWL. We present an alignment tool that follows the similarity-based paradigm, called OLA. ...

  9. Marker2sequence, mine your QTL regions for candidate genes

    NARCIS (Netherlands)

    Chibon, P.Y.F.R.P.; Schoof, H.; Visser, R.G.F.; Finkers, H.J.

    2012-01-01

    Marker2sequence (M2S) aims at mining quantitative trait loci (QTLs) for candidate genes. For each gene, within the QTL region, M2S uses data integration technology to integrate putative gene function with associated gene ontology terms, proteins, pathways and literature. As a typical QTL region

  10. A method exploiting syntactic patterns and the UMLS semantics for aligning biomedical ontologies: the case of OBO disease ontologies.

    Science.gov (United States)

    Marquet, Gwenaëlle; Mosser, Jean; Burgun, Anita

    2007-12-01

    The OBO ontologies include more than 50 standard vocabularies that cover different domains, including genomics, chemistry, anatomy and phenotype. Ontology alignment is a means to build consistent biomedical ontologies compatible with standard vocabularies and dedicated to specific domains, such as cancer. An alignment is defined as a set of pairs of concepts, coming from two ontologies, related by a relation R, R not being restricted to the equivalence or subsumption relations. Alignment is performed in three major steps: first, the concepts that are equivalent in the ontologies are identified; second, the pairs of concepts that are related although not equivalent are searched for; third, the relations between the concepts are characterized. We have developed a method to align ontologies that exploits the compositionality of the terms in OBO ontologies, uses the UMLS to provide synonyms and relations, and defines syntactico-semantic patterns that characterize semantically the relations between concepts. We have applied it to four OBO phenotype ontologies: mouse pathology, human disease, mammalian phenotype, and PATO. We found 386 pairs of equivalent concepts and 20,461 pairs of concepts where one concept name is included in the other term. Among the 20,460 inclusions, we were able to provide a semantic categorization for 2682 relations. In 2552 cases, the relation was present and semantically defined in the UMLS Metathesaurus; in 131 cases the relation was characterized through semantic patterns. Our approach may help to find the semantic relations between concepts in ontologies.
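
    The first two alignment steps described in this abstract (finding equivalent concepts, then concepts whose name is included in another term) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: it relies on plain token matching rather than the UMLS synonyms and syntactico-semantic patterns the authors use, it does not attempt the third step (characterizing the relation), and the function names and concept names are hypothetical.

```python
from itertools import product


def normalize(name: str) -> tuple[str, ...]:
    """Lowercase a concept name and split it into word tokens."""
    return tuple(name.lower().replace("-", " ").split())


def contains_run(longer: tuple[str, ...], shorter: tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous token run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def align(names_a: list[str], names_b: list[str]):
    """Return (equivalent pairs, inclusion pairs) between two name lists."""
    equivalent, inclusions = [], []
    for a, b in product(names_a, names_b):
        ta, tb = normalize(a), normalize(b)
        if ta == tb:
            # Step 1: names that are identical after normalization.
            equivalent.append((a, b))
        elif contains_run(ta, tb) or contains_run(tb, ta):
            # Step 2: one name is included in the other term.
            inclusions.append((a, b))
    return equivalent, inclusions


# Toy, made-up concept names (not taken from the OBO ontologies).
ontology_a = ["Liver", "Liver carcinoma"]
ontology_b = ["liver", "carcinoma", "kidney"]
equivalent, inclusions = align(ontology_a, ontology_b)
print(equivalent)
print(inclusions)
```

    Matching on whole tokens rather than raw substrings is a design choice of this sketch; it avoids pairing, for example, a short name with an unrelated longer word that merely contains the same letters.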

  11. Applications of ontology design patterns in biomedical ontologies.

    Science.gov (United States)

    Mortensen, Jonathan M; Horridge, Matthew; Musen, Mark A; Noy, Natalya F

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited.

  12. Applications of Ontology Design Patterns in Biomedical Ontologies

    Science.gov (United States)

    Mortensen, Jonathan M.; Horridge, Matthew; Musen, Mark A.; Noy, Natalya F.

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited. PMID:23304337

  13. Determining Fitness-For-Use of Ontologies Through Change Management, Versioning and Publication Best Practices

    Science.gov (United States)

    West, P.; Zednik, S.; Fu, L.; Ma, X.; Fox, P. A.

    2015-12-01

    There is a large and growing number of domain ontologies available for researchers to leverage in their applications. When evaluating the use of an ontology it is important to consider not only whether the concepts and relationships defined in the ontology meet the requirements for the intended purpose, but also how the change management, versioning and publication practices followed by the ontology publishers affect the maturity, stability, and long-term fitness-for-use of the ontology. In this presentation we share our experiences and a list of best practices we have developed for determining fitness for use of existing ontologies, and the process we follow when developing our own ontologies and extensions to existing ontologies. Our experience covers domains such as solar terrestrial physics, geophysics and oceanography, and the use of general purpose ontologies such as those with representations of people, organizations, data catalogs, observations and measurements and provenance. We will cover how we determine ontology scope, manage ontology change, specify ontology version, and what best practices we follow for ontology publication and use. The implication of following these best practices is that the ontologies we use and develop are mature, stable, have a well-defined scope, and are published in accordance with linked data principles.

  14. Shiva++: An Enhanced Graph based Ontology Matcher

    Science.gov (United States)

    Mathur, Iti; Joshi, Nisheeth; Darbari, Hemant; Kumar, Ajai

    2014-04-01

    With the web growing and assimilating knowledge about different concepts and domains, it is becoming very difficult for simple database-driven applications to capture the data for a domain. Developers have therefore come up with ontology-based systems, which can store large amounts of information, apply reasoning and produce timely information, thus facilitating effective knowledge management. Although this approach has made our lives easier, it has also given rise to another problem: two different ontologies assimilating the same knowledge tend to use different terms for the same concepts. This creates confusion among knowledge engineers and workers, as they do not know which term is better than the other. We therefore need to merge ontologies covering the same domain so that engineers can develop better applications over them. This paper describes the development of one such matcher, which merges the concepts available in two ontologies at two levels, (1) the string level and (2) the semantic level, thus producing better merged ontologies. A graph matching technique works at the core of the system. We have also evaluated the system and compared its performance with that of its predecessor, which works only on string matching. The current approach produces better results.
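
    The idea of matching at a string level and then at a semantic level can be sketched as follows. This is not the Shiva++ algorithm, whose core is a graph matching technique not detailed in the abstract; it is a simplified illustration that assumes a character-similarity threshold (0.85, chosen arbitrarily) and a hypothetical hand-written synonym table standing in for a real lexical resource.

```python
from difflib import SequenceMatcher

# Hypothetical synonym groups standing in for a real lexical resource.
SYNONYMS = [{"car", "automobile"}, {"author", "writer"}]


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantically_equal(a: str, b: str) -> bool:
    """True if both terms fall in the same toy synonym group."""
    a, b = a.lower(), b.lower()
    return any(a in group and b in group for group in SYNONYMS)


def match(terms_a, terms_b, threshold=0.85):
    """Pair up terms: string level first, then semantic level as a fallback."""
    matches = []
    for a in terms_a:
        for b in terms_b:
            if string_similarity(a, b) >= threshold:
                matches.append((a, b, "string"))
            elif semantically_equal(a, b):
                matches.append((a, b, "semantic"))
    return matches


result = match(["Car", "Author", "Title"], ["automobile", "authors", "price"])
print(result)
```

    In this toy run, "Author" and "authors" are paired at the string level, while "Car" and "automobile" share almost no characters and are paired only through the synonym table, which is the kind of case a string-only matcher would miss.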

  15. OAE: The Ontology of Adverse Events.

    Science.gov (United States)

    He, Yongqun; Sarntivijai, Sirarat; Lin, Yu; Xiang, Zuoshuang; Guo, Abra; Zhang, Shelley; Jagannathan, Desikan; Toldo, Luca; Tao, Cui; Smith, Barry

    2014-01-01

    A medical intervention is a medical procedure or application intended to relieve or prevent illness or injury. Examples of medical interventions include vaccination and drug administration. After a medical intervention, adverse events (AEs) may occur which lie outside the intended consequences of the intervention. The representation and analysis of AEs are critical to the improvement of public health. The Ontology of Adverse Events (OAE), previously named Adverse Event Ontology (AEO), is a community-driven ontology developed to standardize and integrate data relating to AEs arising subsequent to medical interventions, as well as to support computer-assisted reasoning. OAE has over 3,000 terms with unique identifiers, including terms imported from existing ontologies and more than 1,800 OAE-specific terms. In OAE, the term 'adverse event' denotes a pathological bodily process in a patient that occurs after a medical intervention. Causal adverse events are defined by OAE as those events that are causal consequences of a medical intervention. OAE represents various adverse events based on patient anatomic regions and clinical outcomes, including symptoms, signs, and abnormal processes. OAE has been used in the analysis of several different sorts of vaccine and drug adverse event data. For example, using the data extracted from the Vaccine Adverse Event Reporting System (VAERS), OAE was used to analyse vaccine adverse events associated with the administrations of different types of influenza vaccines. OAE has also been used to represent and classify the vaccine adverse events cited in package inserts of FDA-licensed human vaccines in the USA. OAE is a biomedical ontology that logically defines and classifies various adverse events occurring after medical interventions. OAE has successfully been applied in several adverse event studies. The OAE ontological framework provides a platform for systematic representation and analysis of adverse events and of the factors (e

  16. The ins and outs of eukaryotic viruses: Knowledge base and ontology of a viral infection

    Science.gov (United States)

    Hulo, Chantal; Masson, Patrick; de Castro, Edouard; Auchincloss, Andrea H.; Foulger, Rebecca; Poux, Sylvain; Lomax, Jane; Bougueleret, Lydie; Xenarios, Ioannis

    2017-01-01

    Viruses are genetically diverse, infect a wide range of tissues and host cells and follow unique processes for replicating themselves. All these processes were investigated and indexed in the ViralZone knowledge base. To facilitate standardizing data, a simple ontology of viral life-cycle terms was developed to provide a common vocabulary for annotating data sets. New terminology was developed to address unique viral replication cycle processes, and existing terminology was modified and adapted. The virus life-cycle is classically described by schematic pictures. Using this ontology, it can be represented by a combination of successive terms: “entry”, “latency”, “transcription”, “replication” and “exit”. Each of these parts is broken down into discrete steps. For example, Zika virus “entry” is broken down into successive steps: “Attachment”, “Apoptotic mimicry”, “Viral endocytosis/macropinocytosis”, “Fusion with host endosomal membrane”, “Viral factory”. To demonstrate the utility of a standard ontology for virus biology, this work was completed by annotating virus data in the ViralZone, UniProtKB and Gene Ontology databases. PMID:28207819
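
    As a rough illustration of how a life cycle can be represented as a combination of successive terms, the stages and Zika virus entry steps listed in this abstract can be held in an ordered structure. The variable and function names below are hypothetical, and the sketch does not reflect how ViralZone actually stores its annotations.

```python
# Top-level life-cycle stages, in the order listed in the abstract.
STAGES = ["entry", "latency", "transcription", "replication", "exit"]

# Steps of the Zika virus "entry" stage, as listed in the abstract.
ZIKA_ENTRY_STEPS = [
    "Attachment",
    "Apoptotic mimicry",
    "Viral endocytosis/macropinocytosis",
    "Fusion with host endosomal membrane",
    "Viral factory",
]


def flatten(stage_steps: dict[str, list[str]]) -> list[str]:
    """List 'stage: step' labels in life-cycle order, skipping unannotated stages."""
    return [
        f"{stage}: {step}"
        for stage in STAGES
        for step in stage_steps.get(stage, [])
    ]


zika = {"entry": ZIKA_ENTRY_STEPS}  # other stages left unannotated here
labels = flatten(zika)
print(labels)
```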

  17. Unintended consequences of existential quantifications in biomedical ontologies

    Directory of Open Access Journals (Sweden)

    Boeker Martin

    2011-11-01

    Background: The Open Biomedical Ontologies (OBO) Foundry is a collection of freely available ontologically structured controlled vocabularies in the biomedical domain. Most of them are disseminated via both the OBO Flatfile Format and the semantic web format Web Ontology Language (OWL), which draws upon formal logic. Based on the interpretations underlying OWL description logics (OWL-DL) semantics, we scrutinize the OWL-DL releases of OBO ontologies to assess whether their logical axioms correspond to the meaning intended by their authors. Results: We analyzed ontologies and ontology cross products available via the OBO Foundry site http://www.obofoundry.org for existential restrictions (someValuesFrom), from which we examined a random sample of 2,836 clauses. According to a rating done by four experts, 23% of all existential restrictions in OBO Foundry candidate ontologies are suspicious (Cohen's κ = 0.78). We found that a smaller proportion of existential restrictions in OBO Foundry cross products are suspicious, but in this case an accurate quantitative judgment is not possible due to a low inter-rater agreement (κ = 0.07). We identified several typical modeling problems, for which satisfactory ontology design patterns based on OWL-DL were proposed. We further describe several usability issues with OBO ontologies, including the lack of ontological commitment for several common terms, and the proliferation of domain-specific relations. Conclusions: The current OWL releases of OBO Foundry (and Foundry candidate) ontologies contain numerous assertions which do not properly describe the underlying biological reality, or are ambiguous and difficult to interpret. The solution is a better anchoring in upper ontologies and a restriction to relatively few, well defined relation types with given domain and range constraints.
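
    For readers unfamiliar with the construct under scrutiny, the standard description-logic reading of an existential restriction (someValuesFrom) is given below. The symbols A, B and r are illustrative placeholders for two classes and a relation; they are not taken from the paper.

```latex
% Standard description-logic semantics of an OWL someValuesFrom restriction.
% A and B are classes, r is a relation (object property); all three are placeholders.
A \sqsubseteq \exists r.B
\quad\Longleftrightarrow\quad
\forall x \,\bigl( A(x) \rightarrow \exists y \,( r(x,y) \wedge B(y) ) \bigr)
```

    Read this way, the axiom commits every single instance of A to being related by r to at least one instance of B. An assertion that is only typically or sometimes true in the domain therefore says more than its author may have intended, which is the general kind of mismatch the abstract describes.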

  18. KNOWLEDGE MANAGEMENT IN HIGHER EDUCATION – AN ONTOLOGICAL APPROACH IN COLLABORATIVE ENVIRONMENTS

    OpenAIRE

    Vasile BODEA; Bodea, Constanta-Nicoleta

    2010-01-01

    The paper presents an ontology-based knowledge management system developed for a Romanian university. The university used a classic Management Information System (MIS), which was the starting point for developing the knowledge management system. The developed knowledge management system has a general ontology, containing terms that are valid for a public institution, and a specific ontology for two process categories, the didactic and research processes. The ontology is implemented using Protege. Th...

  19. An overview of health-care images: a proposal for building an ontology

    OpenAIRE

    Bentes Pinto, Virgínia; de Holanda Campos, Henvy; Oliveira Ferreira, Jefferson Leite

    2011-01-01

    Health-care images are of great importance for confirming whether or not a disease is present, which allows greater precision in the diagnosis and treatment of pathologies. They are rich sources of information and therefore require informational organization. It is in this context that this article is situated, presenting the results of a study whose objective is to plan and build an ontology of images in the field of ...

  20. Overview of Ontology Servers Research

    Directory of Open Access Journals (Sweden)

    Robert M. Colomb

    2007-06-01

    Ontologies are increasingly becoming an essential tool for solving problems in many research areas. An ontology is a complex information object that can contain millions of concepts in complex relationships. When we want to manage complex information objects, we generally turn to information systems technology. An information system intended to manage ontologies is called an ontology server. Ontology server technology is, at the time of writing, quite immature. This paper therefore reviews and compares the main ontology servers that have been reported in the literature. As a result, we point out several research questions related to server technology.

  1. Process attributes in bio-ontologies

    Directory of Open Access Journals (Sweden)

    Andrade André Q

    2012-08-01

    Background: Biomedical processes can provide essential information about the (mal-)functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has been a contentious issue in bio-ontologies recently, and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency. Results: We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity. Conclusions: We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.

  2. Toward an Ontology of Simulated Social Interaction

    DEFF Research Database (Denmark)

    2016-01-01

    The paper develops a general conceptual framework for the ontological classification of human-robot interaction. After arguing against fictionalist interpretations of human-robot interactions, I present five notions of simulation or partial realization, formally defined in terms of relationships...

  3. Quantum Physics in a different ontology

    CERN Document Server

    de Silva, Nalin

    2010-01-01

    It is shown that neither the wave picture nor the ordinary particle picture offers a satisfactory explanation of the double-slit experiment. Physicists who have been successful in formulating theories in the Newtonian Paradigm, with its corresponding ontology, find it difficult to interpret Quantum Physics, which deals with particles that are not perceptible to the senses. A different interpretation of Quantum Physics, based on a different ontology, is presented in what follows. According to the new interpretation, Quantum particles have different properties from those of Classical Newtonian particles. The interference patterns are explained in terms of particles, each of which passes through both slits.

  4. A RESTful way to Manage Ontologies

    Science.gov (United States)

    Lowry, R. K.; Lawrence, B. N.

    2009-04-01

    In 2005 BODC implemented the first version of a vocabulary server developed as a contribution to the NERC DataGrid project. Vocabularies were managed within an RDBMS environment and accessed through a SOAP Web Service API. This was designed as a database query interface with operations targeted at designated database fields and results returned as strings. At the end of 2007 a new version of the server was released capable of serving thesauri and ontologies as well as vocabularies. The SOAP API functionality was enhanced and the output format changed to XML. In addition, a pseudo-RESTful query interface was developed directly addressing terms and lists by URLs. This is in full operational use by projects such as SeaDataNet and will run for the foreseeable future. However, operational experience has ex