WorldWideScience

Sample records for gene ontology analysis

  1. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-09-05

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  2. Gene Ontology

    Directory of Open Access Journals (Sweden)

    Gaston K. Mazandu

    2012-01-01

    The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.
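
    As a rough illustration of what a topology-based similarity between GO terms can look like, the sketch below scores two terms by the overlap of their ancestor sets in a toy DAG. This is a generic Jaccard-style measure and not the metric introduced in the abstract above; the function names and the `parents` mapping are hypothetical.

```python
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the term itself plus every term reachable by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def ancestor_jaccard(a: str, b: str, parents: Dict[str, Iterable[str]]) -> float:
    """Share of ancestors (including the terms themselves) common to both terms."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Toy DAG (hypothetical term names): C has two parents, D has one.
toy_parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A"]}
similarity_c_d = ancestor_jaccard("C", "D", toy_parents)
```

    Two terms that share most of their ancestry score close to 1, while terms that meet only at the root score close to 0.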

  3. An Ontology of Gene

    OpenAIRE

    Masuya, Hiroshi; Mizoguchi, Riichiro

    2012-01-01

    The concept of a gene was established in the era of classical genetics and is now essential for life science for elucidating the molecular basis of the coding of genetic information necessary to realize the body of an organism and its biological functions. However, an ontology fully representing multiple aspects of a gene is still not available. In this study, we dissected the biological and ontological definitions of bearers of genetic information, including genes and alleles. Based on this ...

  4. Gene Ontology Consortium: going forward.

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
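
    GO enrichment analysis, as mentioned in the abstract above, is often framed as an over-representation test. The sketch below uses a one-sided hypergeometric test, which is one common formulation; it is an assumption that this matches any particular tool, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes in total, 5 annotated to the term,
# and a study set of 5 genes that all carry the annotation.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=5, hits=5)
```

    In practice such p-values are computed for many terms at once, so a multiple-testing correction is usually applied afterwards.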

  5. Gene Ontology Consortium: going forward

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. PMID:25428369

  6. Globaltest and GOEAST: two different approaches for Gene Ontology analysis

    NARCIS (Netherlands)

    Hulsegge, B.; Kommadath, A.; Smits, M.A.

    2009-01-01

    Background: Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST. Globaltest is a method for testing whether sets of

  7. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    Science.gov (United States)

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and improve the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.
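
    The incremental feature selection (IFS) step named above can be sketched as follows: features are assumed to be already ranked (for example by mRMR), and prefixes of growing length are scored until the best one is found. The `score` callback stands in for a cross-validated classifier and is hypothetical, as are the function and variable names; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ... top-n ranked features and keep the best prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        current = score(ranked[:k])
        if current > best_score:  # strict '>' keeps the shortest prefix on ties
            best_k, best_score = k, current
    return list(ranked[:best_k]), best_score


def toy_score(features: Sequence[str]) -> float:
    """Stand-in for classifier performance: reward two 'useful' features, penalize size."""
    useful = {"a", "b"}
    return len(useful & set(features)) - 0.1 * len(features)


selected, selected_score = incremental_feature_selection(["a", "c", "b", "d"], toy_score)
```

    With a real classifier, `score` would typically return a cross-validated metric for a model trained on the given feature subset.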

  8. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict cancer-related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy, usually occurring in childhood. Early detection of RB could reduce morbidity and improve the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB-related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB-related studies, while 5,500 non-RB genes were randomly selected from Ensembl genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict other cancer-related genes as well.

  9. Analysis of tumor suppressor genes based on gene ontology and the KEGG pathway.

    Science.gov (United States)

    Yang, Jing; Chen, Lei; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby promote the discovery of effective cancer treatments.

  10. Improvements to cardiovascular gene ontology.

    Science.gov (United States)

    Lovering, Ruth C; Dimmer, Emily C; Talmud, Philippa J

    2009-07-01

    Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. Although one might initially wonder what relevance a 'controlled vocabulary' might have for cardiovascular science, such a resource is proving highly useful for researchers investigating complex cardiovascular disease phenotypes as well as those interpreting results from high-throughput methodologies. GO enables the current functional knowledge of individual genes to be used to annotate genomic or proteomic datasets. In this way, the GO data provides a very effective way of linking biological knowledge with the analysis of the large datasets of post-genomics research. Consequently, users of high-throughput methodologies such as expression arrays or proteomics will be the main beneficiaries of such annotation sets. However, as GO annotations increase in quality and quantity, groups using small-scale approaches will gradually begin to benefit too. For example, genome wide association scans for coronary heart disease are identifying novel genes, with previously unknown connections to cardiovascular processes, and the comprehensive annotation of these novel genes might provide clues to their cardiovascular link. At least 4000 genes, to date, have been implicated in cardiovascular processes and an initiative is underway to focus on annotating these genes for the benefit of the cardiovascular community. In this article we review the current uses of Gene Ontology annotation to highlight why Gene Ontology should be of interest to all those involved in cardiovascular research.

  11. A multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors for functional gene analysis.

    Science.gov (United States)

    Weber, Kristoffer; Bartsch, Udo; Stocking, Carol; Fehse, Boris

    2008-04-01

    Functional gene analysis requires the possibility of overexpression, as well as downregulation of one, or ideally several, potentially interacting genes. Lentiviral vectors are well suited for this purpose as they ensure stable expression of complementary DNAs (cDNAs), as well as short-hairpin RNAs (shRNAs), and can efficiently transduce a wide spectrum of cell targets when packaged within the coat proteins of other viruses. Here we introduce a multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors designed according to the "building blocks" principle. Using a wide spectrum of different fluorescent markers, including drug-selectable enhanced green fluorescent protein (eGFP)- and dTomato-blasticidin-S resistance fusion proteins, LeGO vectors allow simultaneous analysis of multiple genes and shRNAs of interest within single, easily identifiable cells. Furthermore, each functional module is flanked by unique cloning sites, ensuring flexibility and individual optimization. The efficacy of these vectors for analyzing multiple genes in a single cell was demonstrated in several different cell types, including hematopoietic, endothelial, and neural stem and progenitor cells, as well as hepatocytes. LeGO vectors thus represent a valuable tool for investigating gene networks using conditional ectopic expression and knock-down approaches simultaneously.

  12. Ontology based molecular signatures for immune cell types via gene expression analysis

    Science.gov (United States)

    2013-01-01

    Background: New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results: We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity, from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identified genes uniquely expressed in these B cells compared to other B cell types. Conclusions: This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649

  13. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL.

    Science.gov (United States)

    Jupp, Simon; Stevens, Robert; Hoehndorf, Robert

    2012-04-24

    Ontologies such as the Gene Ontology (GO) and their use in annotations make cross-species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of

  14. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network.

    Science.gov (United States)

    Karadeniz, İlknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan

    2015-01-01

    Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified from sentence-level processing. Twenty-two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur under experimental conditions which can be ontologically represented. Our
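
    The sentence-level co-occurrence idea described above can be sketched in a few lines: a host gene and a pathogen gene are paired whenever both names appear in the same sentence. This is a bare-bones illustration with dictionary lookup only; it does not reproduce SciMiner or the machine learning methods in the abstract, and the gene names, function name, and tokenization rule are all hypothetical placeholders.

```python
import re
from itertools import product
from typing import Set, Tuple


def sentence_cooccurrences(
    text: str, host_genes: Set[str], pathogen_genes: Set[str]
) -> Set[Tuple[str, str]]:
    """Pair every host gene with every pathogen gene named in the same sentence."""
    pairs: Set[Tuple[str, str]] = set()
    # Naive sentence split: whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9_-]+", sentence))
        hosts = tokens & host_genes
        pathogens = tokens & pathogen_genes
        pairs.update(product(sorted(hosts), sorted(pathogens)))
    return pairs


# Placeholder text and gene symbols, not real findings.
toy_text = "HOST1 binds BRU1 in vitro. HOST2 was measured. BRU2 was deleted."
toy_pairs = sentence_cooccurrences(toy_text, {"HOST1", "HOST2"}, {"BRU1", "BRU2"})
```

    Abstract-level co-occurrence would apply the same pairing to the whole text instead of to each sentence, which finds more candidate pairs at the cost of more false positives.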

  15. The Ontology of the Gene Ontology

    Science.gov (United States)

    Smith, Barry; Williams, Jennifer; Schulze-Kremer, Steffen

    2003-01-01

    The rapidly increasing wealth of genomic data has driven the development of tools to assist in the task of representing and processing information about genes, their products and their functions. One of the most important of these tools is the Gene Ontology (GO), which is being developed in tandem with work on a variety of bioinformatics databases. An examination of the structure of GO, however, reveals a number of problems, which we believe can be resolved by taking account of certain organizing principles drawn from philosophical ontology. We shall explore the results of applying such principles to GO with a view to improving GO’s consistency and coherence and thus its future applicability in the automated processing of biological data. PMID:14728245

  16. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Science.gov (United States)

    Vashisht, Shikha; Bagler, Ganesh

    2012-01-01

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  17. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Directory of Open Access Journals (Sweden)

    Shikha Vashisht

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  18. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    Science.gov (United States)

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  19. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways with these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes were provided in this study, which were predicted to be essential genes by our prediction model. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
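
    The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the confusion counts used are made up for illustration and are not taken from the study, and the function name is hypothetical.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal is empty, a common convention
    for the otherwise undefined 0/0 case.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)    # every prediction correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)   # every prediction wrong
chance = matthews_corrcoef(tp=1, tn=1, fp=1, fn=1)     # no better than guessing
```

    Unlike plain accuracy, the coefficient stays informative when the two classes are very unequal in size, which is typical for essential versus non-essential gene sets.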

  20. Markov Chain Ontology Analysis (MCOA)

    Science.gov (United States)

    2012-01-01

    Background: Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results: In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. Conclusion: A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches
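
    The eigenvector values mentioned above correspond to the stationary distribution of an ergodic Markov chain. As a minimal sketch of that generic idea, the code below finds the stationary distribution of a small row-stochastic matrix by power iteration. It does not reproduce MCOA's adjusted transition matrix or its generative model; the function name, tolerance, and toy matrix are assumptions.

```python
from typing import List


def stationary_distribution(
    transition: List[List[float]], tol: float = 1e-12, max_iter: int = 10_000
) -> List[float]:
    """Iterate pi <- pi * P from a uniform start until pi stops changing.

    `transition[i][j]` is the probability of moving from state i to state j,
    so every row must sum to 1.
    """
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Two-state toy chain; state 0 is "stickier", so it should carry more weight.
toy_chain = [[0.9, 0.1], [0.5, 0.5]]
toy_pi = stationary_distribution(toy_chain)
```

    In an ontology setting the states would be classes and data instances, and a state's stationary probability would serve as its importance score.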

  1. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

    Science.gov (United States)

    Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J

    2016-02-01

    Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Python on Linux under a free software licence (GNU General Public Licence). Contact: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. Identification of oral cancer related candidate genes by integrating protein-protein interactions, gene ontology, pathway analysis and immunohistochemistry.

    Science.gov (United States)

    Kumar, Ravindra; Samal, Sabindra K; Routray, Samapika; Dash, Rupesh; Dixit, Anshuman

    2017-05-30

    In recent years, bioinformatics methods have been reported with a high degree of success for candidate gene identification. In this milieu, we have used an integrated bioinformatics approach assimilating information from gene ontologies (GO), protein-protein interaction (PPI) and network analysis to predict candidate genes related to oral squamous cell carcinoma (OSCC). A total of 40973 PPIs were considered for 4704 cancer-related genes to construct a human cancer gene network (HCGN). The importance of each node in HCGN was measured by ten different centrality measures. We have shown that the top ranking genes are related to a significantly higher number of diseases as compared to other genes in HCGN. A total of 39 candidate oral cancer target genes were predicted by combining top ranked genes and the genes corresponding to significantly enriched oral cancer related GO terms. Initial verification using literature and available experimental data indicated that 29 genes were related to OSCC. A detailed pathway analysis led us to propose a role for the selected candidate genes in invasion and metastasis in OSCC. We further validated our predictions using immunohistochemistry (IHC) and found that the gene FLNA was upregulated while the genes ARRB1 and HTT were downregulated in the OSCC tissue samples.
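
    The abstract above ranks network nodes by centrality but does not name the ten measures used. As a minimal sketch of the general idea, the code below computes degree centrality, one of the simplest such measures, on an undirected interaction list. The gene labels G1 to G4 and the edges are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical interactions between placeholder genes.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]

scores = degree_centrality(edges)
# Rank nodes from most to least connected.
ranked = sorted(scores, key=lambda node: -scores[node])
print(ranked)
```

    In practice several centrality measures are usually combined, since each one captures a different notion of importance.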

  3. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

    Science.gov (United States)

    Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

    2017-03-20

    The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With its well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand the relationships and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores, including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .

  4. How the gene ontology evolves.

    Science.gov (United States)

    Leonelli, Sabina; Diehl, Alexander D; Christie, Karen R; Harris, Midori A; Lomax, Jane

    2011-08-05

    Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource. Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms. This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.

  5. Large-scale Gene Ontology analysis of plant transcriptome-derived sequences retrieved by AFLP technology

    Directory of Open Access Journals (Sweden)

    Ramina Angelo

    2008-07-01

    Full Text Available Abstract Background After 10 years of use of AFLP (Amplified Fragment Length Polymorphism) technology for DNA fingerprinting and mRNA profiling, large repertories of genome- and transcriptome-derived sequences are available in public databases for model, crop and tree species. AFLP marker systems have been and are being extensively exploited for genome scanning and gene mapping, as well as cDNA-AFLP for transcriptome profiling and differentially expressed gene cloning. The evaluation, annotation and classification of genomic markers and expressed transcripts would be of great utility for both functional genomics and systems biology research in plants. This may be achieved by means of the Gene Ontology (GO), consisting of three structured vocabularies (i.e. ontologies) describing genes, transcripts and proteins of any organism in terms of their associated cellular component, biological process and molecular function in a species-independent manner. In this paper, the functional annotation of about 8,000 AFLP-derived ESTs retrieved in the NCBI databases was carried out by using GO terminology. Results Descriptive statistics on the type, size and nature of gene sequences obtained by means of AFLP technology were calculated. The gene products associated with mRNA transcripts were then classified according to the three main GO vocabularies. A comparison of the functional content of cDNA-AFLP records was also performed by splitting the sequence dataset into monocots and dicots and by comparing them to all annotated ESTs of Arabidopsis and rice, respectively. On the whole, the statistical parameters adopted for the in silico AFLP-derived transcriptome-anchored sequence analysis proved to be critical for obtaining reliable GO results. Such an exhaustive annotation may offer a suitable platform for functional genomics, particularly useful in non-model species.
Conclusion Reliable GO annotations of AFLP-derived sequences can be gathered through the optimization

  6. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) when tested on a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were
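
    The recall and precision figures quoted in the abstract above follow the standard retrieval definitions. A minimal sketch of how such figures are computed from a set of retrieved items and a manually curated relevant set is given below. The article identifiers are hypothetical and unrelated to the study's test set.

```python
def precision_recall(retrieved, relevant):
    """Precision = TP / |retrieved|, recall = TP / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical article IDs: 5 are truly relevant, the indexer returns 4.
relevant = {"a1", "a2", "a3", "a4", "a5"}
retrieved = {"a1", "a2", "a3", "a9"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.6
```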

  7. Identification of the key regulating genes of diminished ovarian reserve (DOR) by network and gene ontology analysis.

    Science.gov (United States)

    Pashaiasl, Maryam; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2016-09-01

    Diminished ovarian reserve (DOR) is one of the reasons for infertility and affects not only older but also young women. Ovarian reserve assessment can be used as a new prognostic tool for infertility treatment decision making. Here, up- and down-regulated gene expression profiles of granulosa cells were analysed to generate a putative interaction map of the involved genes. In addition, gene ontology (GO) analysis was used to gain insight into the biological processes and molecular functions of the proteins involved in DOR. Eleven up-regulated genes and nine down-regulated genes were identified and assessed by constructing interaction networks based on their biological processes. PTGS2, CTGF, LHCGR, CITED, SOCS2, STAR and FSTL3 were the key nodes in the up-regulated networks, while the IGF2, AMH, GREM, and FOXC1 proteins were key in the down-regulated networks. MIRN101-1, MIRN153-1 and MIRN194-1 inhibited the expression of SOCS2, while CSH1 and BMP2 positively regulated IGF1 and IGF2. Ossification, ovarian follicle development, vasculogenesis, sequence-specific DNA binding transcription factor activity, and Golgi apparatus are the major differential groups between up-regulated and down-regulated genes in DOR. Meta-analysis of publicly available transcriptomic data highlighted the high coexpression of CTGF, connective tissue growth factor, with the other key regulators of DOR. CTGF is involved in organ senescence and focal adhesion pathway according to GO analysis. These findings provide a comprehensive system biology based insight into the aetiology of DOR through network and gene ontology analyses.

  8. Extending the Interpretation of Gene Profiling Microarray Experiments to Pathway Analysis Through the Use of Gene Ontology Terms

    Science.gov (United States)

    Chatziioannou, Aristotelis; Moulos, Panagiotis

    Microarray technology allows the survey of gene expression at a global level by measuring mRNA abundance. However, the complexity of a microarray experiment calls for computationally powerful tools suited to probing the biological problem studied. Here we propose a suite for flexible, data-driven interpretation of microarray experiments that is adaptable to a wide range of needs of the biological end-user. The suite is implemented in MATLAB and makes use of two modules that perform all steps of typical microarray data analysis, from data standardization and normalization to statistical selection and pathway analysis utilizing Gene Ontology term annotations for the species genomes interrogated. Owing to its modular structure, it is scalable, enabling the incorporation of, or its seamless assembly with, other existing tools.

  9. An improved method for functional similarity analysis of genes based on Gene Ontology.

    Science.gov (United States)

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by means of considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures on the experiments of functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .

  10. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  11. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer.

    Science.gov (United States)

    Lando, Malin; Holden, Marit; Bergersen, Linn C; Svendsrud, Debbie H; Stokke, Trond; Sundfør, Kolbein; Glad, Ingrid K; Kristensen, Gunnar B; Lyng, Heidi

    2009-11-01

    Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities of both early and advanced stage cervical cancers.

  12. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Full Text Available Abstract Background It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions Pathway
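
    The abstract above assigns p-values to GO groups based on an overabundance of significant genes, without stating the exact statistic that EVA uses. A common stand-in for this kind of question is the one-sided hypergeometric test, sketched below under that assumption. The function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes in the GO category,
    n: significant genes, k: significant genes that fall in the category.
    Assumes 0 <= K <= N, 0 <= n <= N and k >= 0.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 20 genes overall, 5 in the category,
# 4 significant genes, 2 of which belong to the category.
p_value = hypergeom_upper_tail(N=20, K=5, n=4, k=2)
print(round(p_value, 4))  # 0.2487
```

    A small p-value would indicate that the category contains more significant genes than expected by chance; with many categories tested, a multiple-testing correction is normally applied as well.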

  13. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution.

  15. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D.

    2017-01-01

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new ‘hierarchical view’ of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. PMID:27899595

  16. Visualization and analysis of microarray and gene ontology data with treemaps

    Directory of Open Access Journals (Sweden)

    Babaria Ketan

    2004-06-01

    Full Text Available Abstract Background The increasing complexity of genomic data presents several challenges for biologists. Limited computer monitor views of data complexity and the dynamic nature of data in the midst of discovery increase the challenge of integrating experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access. Results Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that show attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and these queried files can be exported for secondary analyses. Links to model system web pages from Treemap 4.0 give users access to details about specific genes without leaving the query platform. Conclusions Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.

  17. The Gene Ontology (GO) project in 2006

    National Research Council Canada - National Science Library

    2006-01-01

    The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net...

  18. The Gene Ontology project in 2008

    National Research Council Canada - National Science Library

    The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org...

  19. Correlating Expression Data with Gene Function Using Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    LIU,Qi; DENG,Yong; WANG,Chuan; SHI,Tie-Liu; LI,Yi-Xue

    2006-01-01

    Clustering is perhaps one of the most widely used tools for microarray data analysis. Proposed roles for genes of unknown function are inferred from clusters of genes similarly expressed across many biological conditions. However, whether function annotation by similarity metrics is reliable, and to what extent similarity in gene expression patterns is useful for annotating gene functions, has not been evaluated. This paper presents a comprehensive study of the correlation between the similarity of expression data and the similarity of gene functions using Gene Ontology. It was found that although the similarity in expression patterns and the similarity in gene functions are significantly dependent on each other, this association is rather weak. In addition, among the three categories of Gene Ontology, the similarity of expression data is more useful for cellular component annotation than for biological process and molecular function. The results presented are of interest to the gene function prediction research area.

  20. Orymold: ontology based gene expression data integration and analysis tool applied to rice

    Directory of Open Access Journals (Sweden)

    Segura Jordi

    2009-05-01

    Full Text Available Abstract Background Integration and exploration of data obtained from genome wide monitoring technologies has become a major challenge for many bioinformaticists and biologists due to its heterogeneity and high dimensionality. A widely accepted approach to solve these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human and machine readable interfaces. Results We designed and implemented a software tool that allows investigators to create their own semantic model of an organism and to use it to dynamically integrate expression data obtained from DNA microarrays and other probe based technologies. The software provides tools to use the semantic model to postulate and validate hypotheses on the spatial and temporal expression and function of genes. In order to illustrate the software's use and features, we used it to build a semantic model of rice (Oryza sativa) and integrated experimental data into it. Conclusion In this paper we describe the development and features of a flexible software application for dynamic gene expression data annotation, integration, and exploration called Orymold. Orymold is freely available for non-commercial users from http://www.oryzon.com/media/orymold.html

  1. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    OpenAIRE

    Zhen Li; Bi-Qing Li; Min Jiang; Lei Chen; Jian Zhang; Lin Liu; Tao Huang

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance...

  2. The language of gene ontology: a Zipf’s law analysis

    Directory of Open Access Journals (Sweden)

    Kalankesh Leila

    2012-06-01

    Full Text Available Abstract Background Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf’s law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Results Annotations from the Gene Ontology Annotation project were found to follow Zipf’s law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component. On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Conclusions Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
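
    The abstract above does not say how the power law exponents were estimated. As a minimal sketch, assuming a plain least-squares fit of log(frequency) against log(rank), the exponent of a term-frequency distribution could be computed as follows. The function name `zipf_exponent` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def zipf_exponent(term_counts):
    """Estimate the Zipf exponent as the negated least-squares slope
    of log(frequency) versus log(rank). Needs at least two terms."""
    freqs = sorted(term_counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Hypothetical toy corpus: term frequencies 120, 60, 40, 30 (120 / rank),
# which is an ideal Zipf distribution.
annotations = ["GO:A"] * 120 + ["GO:B"] * 60 + ["GO:C"] * 40 + ["GO:D"] * 30
counts = Counter(annotations)
exponent = zipf_exponent(counts)
print(round(exponent, 3))
```

    A log-log regression is only the simplest estimator; maximum-likelihood fits are often preferred for real corpora, so this sketch should be read as an illustration of the idea rather than of the authors' procedure.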

  3. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
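
    To make the notion of precomputed term information content concrete, here is a minimal sketch of the common annotation-based definition, IC(t) = -log p(t), where p(t) is the fraction of annotated proteins that carry t directly or through a descendant. This is not DaGO-Fun's code; the toy ontology, the protein identifiers, and the function names are hypothetical.

```python
import math

def propagate(annotations, parents):
    """Extend each protein's term set with all ancestors of its direct terms."""
    full = {}
    for protein, terms in annotations.items():
        seen = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term not in seen:
                seen.add(term)
                stack.extend(parents.get(term, ()))
        full[protein] = seen
    return full

def information_content(annotations, parents):
    """Return {term: -log(fraction of proteins annotated to the term)}."""
    full = propagate(annotations, parents)
    total = len(full)
    counts = {}
    for terms in full.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log(c / total) for term, c in counts.items()}

# Hypothetical toy ontology (child -> parents) and annotations.
parents = {"binding": ["root"], "catalysis": ["root"], "kinase": ["catalysis"]}
annotations = {
    "P1": {"kinase"},
    "P2": {"binding"},
    "P3": {"catalysis"},
    "P4": {"binding", "kinase"},
}
ic = information_content(annotations, parents)
```

    In this toy case the root has IC 0 because every protein is annotated to it, while the more specific term "kinase" receives a higher IC than its parent "catalysis".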

  4. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them.
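
    The fixed-gene-count window described above can be illustrated with a short sketch. It only counts how many genes in each run of consecutive genes share a GO term; the statistical test the authors used is not described in the abstract and is not reproduced here. The gene names, terms, and the `min_genes` threshold are hypothetical.

```python
from collections import Counter

def window_term_counts(genes, window):
    """Yield (start_index, Counter of GO terms) for each run of
    `window` consecutive genes, regardless of genomic distance."""
    for start in range(len(genes) - window + 1):
        counts = Counter()
        for _, terms in genes[start:start + window]:
            counts.update(terms)  # each gene contributes a term at most once
        yield start, counts

def candidate_clusters(genes, window, min_genes):
    """List (start, term, n) where at least `min_genes` genes in the window share a term."""
    hits = []
    for start, counts in window_term_counts(genes, window):
        for term, n in counts.items():
            if n >= min_genes:
                hits.append((start, term, n))
    return hits

# Hypothetical genes in chromosomal order, each with a set of GO terms.
genes = [
    ("g1", {"GO:X"}),
    ("g2", {"GO:X", "GO:Y"}),
    ("g3", {"GO:X"}),
    ("g4", {"GO:Z"}),
    ("g5", {"GO:Y"}),
]
hits = candidate_clusters(genes, window=3, min_genes=3)
```

    Counting genes rather than base pairs is what makes the window insensitive to varying gene lengths and densities, which is the point the abstract makes about the method.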

  5. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes.

    Science.gov (United States)

    Xing, Zhihao; Chu, Chen; Chen, Lei; Kong, Xiangyin

    2016-11-01

    Oncogenes are a type of genes that have the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide a new insight for the investigation of oncogenes. 
    This article is part of a Special Issue entitled "System Genetics" Guest Editor

  6. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    Energy Technology Data Exchange (ETDEWEB)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G. [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]; Law, R. David, E-mail: dlaw@lakeheadu.ca [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]

    2012-10-15

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  7. A task-based approach for Gene Ontology evaluation.

    Science.gov (United States)

    Clarke, Erik L; Loguercio, Salvatore; Good, Benjamin M; Su, Andrew I

    2013-04-15

    The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test have been improving in their ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.
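
    The abstract above evaluates annotations through their effect on gene set enrichment analysis but does not name a specific test. As a minimal sketch, assuming the widely used one-sided hypergeometric over-representation test, the p-value for one GO term could be computed as follows. The function name and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 genes in the background, 5 annotated to the term,
# a study list of 4 genes of which 2 carry the term.
p = hypergeom_pvalue(population=20, annotated=5, sample=4, hits=2)
print(round(p, 4))
```

    Because the p-value depends directly on how many genes are annotated to each term, any change in the annotation set or the ontology structure changes the enrichment result, which is what makes enrichment usable as an evaluation task.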

  8. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data.

    Science.gov (United States)

    Koç, Ibrahim; Caetano-Anollés, Gustavo

    2017-01-01

    The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.

  9. Practical Applications of the Gene Ontology Resource

    Science.gov (United States)

    Huntley, Rachael P.; Dimmer, Emily C.; Apweiler, Rolf

    The Gene Ontology (GO) is a controlled vocabulary that represents knowledge about the functional attributes of gene products in a structured manner and can be used in both computational and human analyses. This vocabulary has been used by diverse curation groups to associate functional information to individual gene products in the form of annotations. GO has proven an invaluable resource for evaluating and interpreting the biological significance of large data sets, enabling researchers to create hypotheses to direct their future research. This chapter provides an overview of the Gene Ontology, how it can be used, and tips on getting the most out of GO analyses.

  10. The Neural/Immune Gene Ontology: clipping the Gene Ontology for neurological and immunological systems

    Directory of Open Access Journals (Sweden)

    Rubin Eitan

    2010-09-01

    Full Text Available Abstract Background The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher level terms remain. This practice is designed to improve the summarizing of experimental results by grouping high level terms and the statistical power of GO term enrichment analysis. Here, we propose a new approach to editing the gene ontology, clipping, which is the editing of GO according to biological relevance. Creation of a GO subset by clipping is achieved by removing terms (from all hierarchical levels) if they are not functionally relevant to a given domain of interest. Terms that are located in levels higher to relevant terms are kept, thus, biologically irrelevant terms are only removed if they are not parental to terms that are relevant. Results Using this approach, we have created the Neural-Immune Gene Ontology (NIGO) subset of GO directed for neurological and immunological systems. We tested the performance of NIGO in extracting knowledge from microarray experiments by conducting functional analysis and comparing the results to those obtained using the full GO and a generic GO slim. NIGO not only improved the statistical scores given to relevant terms, but was also able to retrieve functionally relevant terms that did not pass statistical cutoffs when using the full GO or the slim subset. Conclusions Our results validate the pipeline used to generate NIGO, suggesting it is indeed enriched with terms that are specific to the neural/immune domains. The results suggest that NIGO can enhance the analysis of microarray experiments involving neural and/or immune related systems. They also directly demonstrate the potential such a domain-specific GO has in generating meaningful hypotheses.
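
    The clipping rule described above (keep relevant terms and every term parental to them, drop the rest) can be sketched as a graph traversal. This is an illustration of the rule as stated in the abstract, not the NIGO pipeline itself; the toy ontology and the function name `clip_ontology` are hypothetical.

```python
def clip_ontology(parents, relevant):
    """Return the sub-ontology (child -> parents) containing the relevant
    terms plus all of their ancestors; every other term is removed."""
    keep = set()
    stack = list(relevant)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(parents.get(term, ()))
    # All parents of a kept term are kept too, so no edge is left dangling.
    return {term: list(parents.get(term, ())) for term in keep}

# Hypothetical toy ontology given as child -> list of parents.
parents = {
    "root": [],
    "signaling": ["root"],
    "synaptic transmission": ["signaling"],
    "metabolism": ["root"],
    "lipid metabolism": ["metabolism"],
}
clipped = clip_ontology(parents, relevant={"synaptic transmission"})
```

    Here "metabolism" and "lipid metabolism" are removed because they are neither relevant nor parental to a relevant term, while "signaling" and "root" survive as ancestors of "synaptic transmission".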

  11. BiNChE: A web tool and library for chemical enrichment analysis based on the ChEBI ontology

    OpenAIRE

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-01-01

    Background Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology,...

  12. Fast Gene Ontology based clustering for microarray experiments

    OpenAIRE

    Ovaska Kristian; Laakso Marko; Hautaniemi Sampsa

    2008-01-01

    Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fa...

  13. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in their mRNA expression levels in L-lysine-producing C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport--NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dithiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885--were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, which we determined using both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results will be helpful to improve our understanding of C. glutamicum for industrial amino acid production.

  14. Gene function prediction based on the Gene Ontology hierarchical structure.

    Science.gov (United States)

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7%; recall: 48.9%). The experimental results demonstrate that when the size of the training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
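
    The reported F-value is consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. The short check below assumes that definition; the function name `f_measure` is illustrative only.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract above.
f1 = f_measure(0.527, 0.489)
print(f"{f1:.1%}")
```

    With precision 52.7% and recall 48.9% the result rounds to 50.7%, matching the figure quoted in the abstract.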

  15. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al. to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation we identify excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.

  16. Brief isoflurane anaesthesia affects differential gene expression, gene ontology and gene networks in rat brain.

    Science.gov (United States)

    Lowes, Damon A; Galley, Helen F; Moura, Alessandro P S; Webster, Nigel R

    2017-01-15

    Much is still unknown about the mechanisms of effects of even brief anaesthesia on the brain and previous studies have simply compared differential expression profiles with and without anaesthesia. We hypothesised that network analysis, in addition to the traditional differential gene expression and ontology analysis, would enable identification of the effects of anaesthesia on interactions between genes. Rats (n=10 per group) were randomised to anaesthesia with isoflurane in oxygen or oxygen only for 15 min, and 6 h later brains were removed. Differential gene expression and gene ontology analysis of microarray data were performed. Standard clustering techniques and principal component analysis with Bayesian rules were used along with social network analysis methods, to quantitatively model and describe the gene networks. Anaesthesia had marked effects on genes in the brain with differential regulation of 416 probe sets by at least 2 fold. Gene ontology analysis showed 23 genes were functionally related to the anaesthesia and of these, 12 were involved with neurotransmitter release, transport and secretion. Gene network analysis revealed much greater connectivity in genes from brains from anaesthetised rats compared to controls. Other importance measures were also altered after anaesthesia; median [range] closeness centrality (shortest path) was lower in anaesthetized animals (0.07 [0-0.30]) than controls (0.39 [0.30-0.53], p [...] genes after anaesthesia and suggests future targets for investigation.

  17. [Key effect genes responding to nerve injury identified by gene ontology and computer pattern recognition].

    Science.gov (United States)

    Pan, Qian; Peng, Jin; Zhou, Xue; Yang, Hao; Zhang, Wei

    2012-07-01

    To screen out important genes from the large gene microarray datasets obtained after nerve injury, we combined the gene ontology (GO) method with computer pattern recognition technology to find key genes responding to nerve injury, and then verified one of the screened-out genes. Data mining and gene ontology analysis of the gene chip dataset GSE26350 were carried out in MATLAB. Cd44 was selected from the screened-out key gene molecular spectrum by comparing the genes' GO terms and their positions on the principal component score map. Functional interference was used to disrupt the normal binding of Cd44 to one of its ligands, chondroitin sulfate C (CSC), in order to observe neurite extension. Gene ontology analysis showed that the first genes on the score map (marked by red *) were mainly distributed in molecular function GO terms such as molecular transducer activity, receptor activity and protein binding. Cd44 is one of six effector protein genes and attracted our attention because of its functional diversity. After different reagents were added to the medium to interfere with the normal binding of CSC and Cd44, varying degrees of relief of CSC's inhibition of neurite extension were observed. CSC can inhibit neurite extension by binding Cd44 on the neuron membrane. This verifies that important genes in given physiological processes can be identified by gene ontology analysis of gene chip data.

  18. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community. PMID:24093723

  19. Measuring the evolution of ontology complexity: the gene ontology case study.

    Science.gov (United States)

    Dameron, Olivier; Bettembourg, Charles; Le Meur, Nolwenn

    2013-01-01

    Ontologies support automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of an ontology evolution on its structural complexity. As a case study we used the sixty monthly releases between January 2008 and December 2012 of the Gene Ontology and its three independent branches, i.e. biological processes (BP), cellular components (CC) and molecular functions (MF). For each case, we measured complexity by computing metrics related to the size, the node connectivity and the hierarchical structure. The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, superior to that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF, with a marked increase for the three branches in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late 2012 increase of connectivity resulted in an increase of the average depth and average height and a decrease of the proportion of leaves, indicating that a major enrichment effort of the intermediate-level hierarchy occurred. The variation of the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values as well as of evolution for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased, CC was refined with the addition of leaves providing a finer level of annotations but decreasing slightly its complexity, and MF complexity remained stable.
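
    The size, connectivity and hierarchy metrics named above can be illustrated on a small directed acyclic graph. The abstract does not give the exact definitions used, so the sketch below makes its own assumptions: connectivity is taken as relations per class, a leaf is a class without children, and depth is the shortest distance to a root. The toy ontology and the function name are hypothetical.

```python
from collections import deque

def complexity_metrics(parents):
    """Compute simple structural metrics for an ontology given as child -> parents."""
    classes = set(parents)
    for ps in parents.values():
        classes.update(ps)
    n_relations = sum(len(ps) for ps in parents.values())
    has_child = {p for ps in parents.values() for p in ps}
    leaves = classes - has_child

    # Invert the edges so the graph can be walked from the roots downwards.
    children = {c: [] for c in classes}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    # Breadth-first search assigns each class its shortest distance to a root.
    roots = [c for c in classes if not parents.get(c)]
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    return {
        "classes": len(classes),
        "relations": n_relations,
        "relations_per_class": n_relations / len(classes),
        "leaf_proportion": len(leaves) / len(classes),
        "average_depth": sum(depth.values()) / len(depth),
    }

# Hypothetical toy ontology: "c" has two parents, "d" is the only leaf.
parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"], "d": ["c"]}
metrics = complexity_metrics(parents)
```

    Running the same function on successive releases would show how adding intermediate classes raises average depth and lowers the leaf proportion even when the raw class count grows only modestly.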

  20. Lentiviral gene ontology (LeGO) vectors equipped with novel drug-selectable fluorescent proteins: new building blocks for cell marking and multi-gene analysis.

    Science.gov (United States)

    Weber, K; Mock, U; Petrowitz, B; Bartsch, U; Fehse, B

    2010-04-01

    Vector-encoded fluorescent proteins (FPs) facilitate unambiguous identification or sorting of gene-modified cells by fluorescence-activated cell sorting (FACS). Exploiting this feature, we have recently developed lentiviral gene ontology (LeGO) vectors (www.LentiGO-Vectors.de) for multi-gene analysis in different target cells. In this study, we extend the LeGO principle by introducing 10 different drug-selectable FPs created by fusing one of the five selection markers (protecting against blasticidin, hygromycin, neomycin, puromycin and zeocin) and one of the five FP genes (Cerulean, eGFP, Venus, dTomato and mCherry). All tested fusion proteins allowed both fluorescence-mediated detection and drug-mediated selection of LeGO-transduced cells. Newly generated codon-optimized hygromycin- and neomycin-resistance genes showed improved expression as compared with their ancestors. New LeGO constructs were produced at titers >10^6 per ml (for non-concentrated supernatants). We show efficient combinatorial marking and selection of various cells, including mesenchymal stem cells, simultaneously transduced with different LeGO constructs. Inclusion of the cytomegalovirus early enhancer/chicken beta-actin promoter into LeGO vectors facilitated robust transgene expression in and selection of neural stem cells and their differentiated progeny. We suppose that the new drug-selectable markers combining advantages of FACS and drug selection are well suited for numerous applications and vector systems. Their inclusion into LeGO vectors opens new possibilities for (stem) cell tracking and functional multi-gene analysis.

  1. Classification analysis of microarray data based on ontological engineering

    Institute of Scientific and Technical Information of China (English)

    LI Guo-qi; SHENG Huan-ye

    2007-01-01

    Background knowledge is important for data mining, especially in complicated situations. Ontological engineering is the successor of knowledge engineering. The sharable knowledge bases built on ontology can be used to provide background knowledge to direct the process of data mining. This paper gives a general introduction to the method and presents a practical analysis example using SVM (support vector machine) as the classifier. Gene Ontology and the accompanying annotations compose a large knowledge base, on which much research has been carried out. A microarray dataset is the output of a DNA chip. With the help of Gene Ontology we present a more elaborate analysis of microarray data than former researchers. The method can also be used in other fields with similar scenarios.

  2. Representing Kidney Development Using the Gene Ontology

    Science.gov (United States)

    Alam-Faruque, Yasmin; Hill, David P.; Dimmer, Emily C.; Harris, Midori A.; Foulger, Rebecca E.; Tweedie, Susan; Attrill, Helen; Howe, Douglas G.; Thomas, Stephen Randall; Davidson, Duncan; Woolf, Adrian S.; Blake, Judith A.; Mungall, Christopher J.; O’Donovan, Claire; Apweiler, Rolf; Huntley, Rachael P.

    2014-01-01

    Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be utilized in the interpretation of data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development and thereby help towards alleviating renal disease. PMID:24941002

  3. Expansion of the Gene Ontology knowledgebase and resources

    Science.gov (United States)

    2017-01-01

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. PMID:27899567

  4. Expansion of the Gene Ontology knowledgebase and resources.

    Science.gov (United States)

    2017-01-04

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Hierarchical Analysis of the Omega Ontology

    Energy Technology Data Exchange (ETDEWEB)

    Joslyn, Cliff A.; Paulson, Patrick R.

    2009-12-01

    Initial delivery for mathematical analysis of the Omega Ontology. We provide an analysis of the hierarchical structure of a version of the Omega Ontology currently in use within the US Government. After providing an initial statistical analysis of the distribution of all link types in the ontology, we then provide a detailed order theoretical analysis of each of the four main hierarchical links present. This order theoretical analysis includes the distribution of components and their properties, their parent/child and multiple inheritance structure, and the distribution of their vertical ranks.

  6. POEAS: Automated Plant Phenomic Analysis Using Plant Ontology.

    Science.gov (United States)

    Shameer, Khader; Naika, Mahantesha Bn; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Biological enrichment analysis using gene ontology (GO) provides a global overview of the functional role of genes or proteins identified from large-scale genomic or proteomic experiments. Phenomic enrichment analysis of gene lists can provide an important layer of information as well as cellular components, molecular functions, and biological processes associated with gene lists. Plant phenomic enrichment analysis will be useful for performing new experiments to better understand plant systems and for the interpretation of gene or proteins identified from high-throughput experiments. Plant ontology (PO) is a compendium of terms to define the diverse phenotypic characteristics of plant species, including plant anatomy, morphology, and development stages. Adoption of this highly useful ontology is limited, when compared to GO, because of the lack of user-friendly tools that enable the use of PO for statistical enrichment analysis. To address this challenge, we introduce Plant Ontology Enrichment Analysis Server (POEAS) in the public domain. POEAS uses a simple list of genes as input data and performs enrichment analysis using Ontologizer 2.0 to provide results in two levels, enrichment results and visualization utilities, to generate ontological graphs that are of publication quality. POEAS also offers interactive options to identify user-defined background population sets, various multiple-testing correction methods, different enrichment calculation methods, and resampling tests to improve statistical significance. The availability of such a tool to perform phenomic enrichment analyses using plant genes as a complementary resource will permit the adoption of PO-based phenomic analysis as part of analytical workflows. POEAS can be accessed using the URL http://caps.ncbs.res.in/poeas.

  7. OAHG: an integrated resource for annotating human genes with multi-level ontologies

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-01-01

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which could help further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. In the performance evaluation, OAHG shows consistency with existing gene interactions and with the structure of the ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e–16). PMID:27703231

  8. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.

    Science.gov (United States)

    Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G

    2016-05-01

    Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by >2.5% in F1 score for molecular function hierarchy. Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF. Contact: ahmad.pgh@dal.ca or beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
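
    The final step described in this abstract, taking the cosine of the angle between two definition vectors, can be sketched as follows. This is only an illustration of the cosine measure itself: how simDEF builds and optimizes its definition vectors is not shown, and the vectors in the usage line are made-up numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention assumed here: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical definition vectors for two proteins.
similarity = cosine_similarity([0.2, 0.0, 0.7], [0.1, 0.3, 0.5])
```

    Because the cosine depends only on direction, two vectors that differ by a constant scale factor score 1, and vectors with no shared non-zero dimensions score 0.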

  9. Ontological Modeling for Integrated Spacecraft Analysis

    Science.gov (United States)

    Wicks, Erica

    2011-01-01

    Current spacecraft work as a cooperative group of a number of subsystems. Each of these requires modeling software for development, testing, and prediction. It is the goal of my team to create an overarching software architecture called the Integrated Spacecraft Analysis (ISCA) to aid in deploying the discrete subsystems' models. Such a plan has been attempted in the past, and has failed due to the excessive scope of the project. Our goal in this version of ISCA is to use new resources to reduce the scope of the project, including using ontological models to help link the internal interfaces of subsystems' models with the ISCA architecture. I have created an ontology of functions specific to the modeling system of the navigation system of a spacecraft. The resulting ontology not only links, at an architectural level, language-specific instantiations of the modeling system's code, but also is web-viewable and can act as a documentation standard. This ontology is proof of the concept that ontological modeling can aid in the integration necessary for ISCA to work, and can act as the prototype for future ISCA ontologies.

  10. Ontology Maintenance using Textual Analysis

    Directory of Open Access Journals (Sweden)

    Yassine Gargouri

    2003-10-01

    Ontologies continuously face the problem of evolution. Because of the complexity of the changes to be made, a maintenance process, at least a semi-automatic one, is increasingly necessary to facilitate this task and ensure its reliability. In this paper, we propose an ontology maintenance model for a domain, whose originality is that it is language independent and based on a sequence of text processing steps that extract highly related terms from a corpus. Initially, we apply a document classification technique using GRAMEXCO to generate classes of text segments having a similar information type and identify their shared lexicon, taken to be highly related to a single topic. This technique allows a first general and robust exploration of the corpus. Further, we apply the Latent Semantic Indexing method to extract from this shared lexicon the most strongly associated terms, which an expert must consider carefully in order to confirm their relevance and thus update the current ontology. Finally, we show how the complementarity between these two techniques, based on cognitive foundations, constitutes a powerful refinement process.

  12. Cross-Ontology multi-level association rule mining in the Gene Ontology.

    Directory of Open Access Journals (Sweden)

    Prashanti Manda

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.

  13. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network.

    Science.gov (United States)

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C; Mohanty, Bidyut K; Gao, Nan; Tang, Jijun; Lawson, Andrew B; Hannun, Yusuf A; Zheng, W Jim

    2014-10-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

    Science.gov (United States)

    Lewin, Alex; Grieve, Ian C

    2006-10-03

    Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
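
    The per-term test that this abstract takes as its starting point, a one-sided Fisher's exact test for over-representation, is equivalent to an upper-tail hypergeometric probability. The sketch below illustrates that baseline only, not the grouping method or the POSOC package; the function name, parameter names and gene counts are assumptions made for this example.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact p-value for over-representation.

    n_total:     genes on the array (the background)
    n_annotated: background genes annotated with the GO term
    n_selected:  genes declared differentially expressed
    n_overlap:   selected genes that carry the GO term

    Returns P(X >= n_overlap) for X hypergeometrically distributed.
    """
    upper = min(n_annotated, n_selected)
    denominator = comb(n_total, n_selected)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    numerator = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 genes, 3 annotated, 3 selected, all 3 overlap.
p = enrichment_p_value(10, 3, 3, 3)
```

    In practice one such p-value is computed per GO term, and because terms share genes through the hierarchy the tests are strongly dependent, which is the problem the abstract sets out to address.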

  15. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Grieve Ian C

    2006-10-01

    Background: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. Results: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Conclusion: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.

  16. Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Tratz, Stephen C.; Gregory, Michelle L.

    2006-06-08

    With the rising influence of the Gene Ontology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associated with them. So far, these approaches have solely relied on the knowledge encoded in the Gene Ontology and the gene annotations associated with the Gene Ontology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  17. Semantic particularity measure for functional characterization of gene sets using gene ontology.

    Science.gov (United States)

    Bettembourg, Charles; Diot, Christian; Dameron, Olivier

    2014-01-01

    Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.
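
    The first informativeness option mentioned in this abstract, information content from term frequency in a corpus, is commonly written as IC(t) = -log p(t). The sketch below illustrates that common definition only; it does not implement the particularity measure, and the counts, term names and the choice to normalize by the root's cumulative count are assumptions made for this example.

```python
import math


def information_content(cumulative_counts, root):
    """IC(t) = -log p(t), with p(t) = count(t) / count(root).

    cumulative_counts maps each term to the number of annotations made to
    the term or any of its descendants, so the root has the largest count
    and an information content of zero.
    """
    total = cumulative_counts[root]
    return {
        term: -math.log(count / total)
        for term, count in cumulative_counts.items()
        if count > 0  # terms never used in the corpus have undefined IC
    }


# Hypothetical annotation counts for three terms.
ic = information_content({"root": 100, "a": 50, "b": 1}, root="root")
```

    Rarely used terms receive high values and widely used terms receive low values, which is why frequency-based information content is often used to weight GO terms.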

  18. Detecting Inconsistencies in the Gene Ontology Using Ontology Databases with Not-gadgets

    Science.gov (United States)

    Lependu, Paea; Dou, Dejing; Howe, Doug

    We present ontology databases with not-gadgets, a method for detecting inconsistencies in an ontology with large numbers of annotated instances by using triggers and exclusion dependencies in a unique way. What makes this work relevant is the use of the database itself, rather than an external reasoner, to detect logical inconsistencies given large numbers of annotated instances. What distinguishes this work is the use of event-driven triggers together with the introduction of explicit negations. We applied this approach toward the serotonin example, an open problem in biomedical informatics which aims to use annotations to help identify inconsistencies in the Gene Ontology. We discovered 75 inconsistencies that have important implications in biology, which include: (1) methods for refining transfer rules used for inferring electronic annotations, and (2) highlighting possible biological differences across species worth investigating.

  19. A Process for Engineer Domain Ontology: An Experience in Developing Business Analysis Ontology

    Directory of Open Access Journals (Sweden)

    Irena ATANASOVA

    2011-01-01

    In recent years, several works have aimed to improve technological aspects of ontologies, such as representation languages and inference mechanisms. This paper presents a discussion of the process and product of an experience in developing an ontology for the public sector, whose organization requires strong knowledge management. This process is applied to engineer and develop an ontology for the business analysis domain.

  20. The Representation of Heart Development in the Gene Ontology

    Science.gov (United States)

    Khodiyar, Varsha K.; Hill, David P.; Howe, Doug; Berardini, Tanya Z.; Tweedie, Susan; Talmud, Philippa J.; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C.

    2012-01-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. PMID:21419760

  1. The representation of heart development in the gene ontology.

    Science.gov (United States)

    Khodiyar, Varsha K; Hill, David P; Howe, Doug; Berardini, Tanya Z; Tweedie, Susan; Talmud, Philippa J; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C

    2011-06-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

  2. Integrating Gene Ontology and Blast to predict gene functions

    Institute of Scientific and Technical Information of China (English)

    WANG Cheng-gang; MO Zhi-hong

    2007-01-01

    A GoBlast system was built to predict gene function by integrating Blast search and Gene Ontology (GO) annotations. The operating system was Debian Linux 3.1, with Apache as the web server and a MySQL database as the data storage system. FASTA files with GO annotations were taken as the sequence source for Blast alignment and were formatted by the wu-formatdb program. The GoBlast system includes three Bioperl modules in Perl: a data input module, a data process module and a data output module. A GoBlast query starts with an amino acid or nucleotide sequence. It ends with an output in an HTML page, presenting high-scoring gene products that have high homology to the queried sequence and listing associated GO terms beside the respective gene products. A simple click on a GO term leads to the detailed explanation of the specific gene function. This enables gene function prediction by Blast. GoBlast can be a very useful tool for functional genome research and is available for free at http://bioq.org/goblast.

  3. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science.

    Science.gov (United States)

    Klie, Sebastian; Nikoloski, Zoran

    2012-01-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  4. The choice between MapMan and Gene Ontology for automated gene function prediction in plant science

    Directory of Open Access Journals (Sweden)

    Sebastian Klie

    2012-06-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of coexpression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  5. Terminological Ontologies for Risk and Vulnerability Analysis

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2014-01-01

    Risk and vulnerability analyses are an important preliminary stage in civil contingency planning. The Danish Emergency Management Agency has developed a generic model and a set of tools that may be used in the preparedness planning, i.e. for identifying and describing society’s critical functions......, for formulating threat scenarios and for assessing consequences. Terminological ontologies, which are systems of domain specific concepts comprising concept relations and characteristics, are useful, both when describing the central concepts of risk and vulnerability analysis (meta concepts), and for further...... structuring and enriching the taxonomies of society’s critical functions and threats, which form an important part of the model. Creating terminological ontologies is a time consuming work, and therefore there is a need for automatic tools for extraction of terms, concept relations and characteristics...

  6. Visualization of mappings between the gene ontology and cluster trees

    Science.gov (United States)

    Jusufi, Ilir; Kerren, Andreas; Aleksakhin, Vladyslav; Schreiber, Falk

    2012-01-01

    Ontologies and hierarchical clustering are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms, giving insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, often a combined view is desired: visualizing a large data set in the context of an ontology under consideration of a clustering of the data. This paper proposes a new visualization method for this task.
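
    The enrichment step mentioned above, identifying statistically overrepresented ontology terms, is often computed with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the Python sketch below is one common choice rather than their method. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in the sample by chance.

    population: number of genes in the background set
    annotated:  number of background genes annotated with the term
    sample:     number of genes in the study set (for example, one cluster)
    hits:       number of study-set genes annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # 3 of 10 background genes carry the term, and all 3 genes in the cluster carry it.
    print(enrichment_p_value(population=10, annotated=3, sample=3, hits=3))
```

    When many terms are tested against the same data set, the resulting p-values would normally be corrected for multiple testing, which this sketch leaves out.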

  7. The Vision and Challenges of the Gene Ontology.

    Science.gov (United States)

    Lewis, Suzanna E

    2017-01-01

    The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s, surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done, to make the Gene Ontology (GO) Consortium realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.

  8. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  9. Automatic, context-specific generation of Gene Ontology slims

    Directory of Open Access Journals (Sweden)

    Sehgal Muhammad

    2010-10-01

    Background: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual. Results: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches, and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power. Conclusions: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
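
    The idea of collapsing terms upward into a slim can be illustrated with a short Python sketch. It assumes the slim is already given as a set of terms and only shows the mapping step. It does not reproduce the data-driven slim generation algorithm that the abstract describes. The function name map_to_slim and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def map_to_slim(term: str, parents: Dict[str, List[str]], slim: Set[str]) -> Set[str]:
    """Walk upward from term and return the first slim term met on each path."""
    found: Set[str] = set()
    seen: Set[str] = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in slim:
            found.add(current)  # stop climbing this path once a slim term is reached
            continue
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


if __name__ == "__main__":
    # Child -> parents map; D has two parents, as is allowed in a DAG.
    graph = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
    print(sorted(map_to_slim("D", graph, slim={"A", "B"})))
```

    Because a term can have several parents, one detailed term may map to more than one slim term. Here D maps to B through one path and to A through the other.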

  10. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

    Science.gov (United States)

    2013-01-01

    Background: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions: The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl. PMID:23895341

  11. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

    Science.gov (United States)

    Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane

    2013-07-29

    The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

  12. Ontological Enrichment of the Genes-to-Systems Breast Cancer Database

    Science.gov (United States)

    Viti, Federica; Mosca, Ettore; Merelli, Ivan; Calabria, Andrea; Alfieri, Roberta; Milanesi, Luciano

    Breast cancer research needs the development of specific and suitable tools to appropriately manage biomolecular knowledge. The presented work deals with the integrative storage of breast cancer related biological data, in order to promote a systems biology approach to this network disease. To increase data standardization and resource integration, annotations maintained in the Genes-to-Systems Breast Cancer (G2SBC) database are associated with ontological terms, which provide a hierarchical structure to organize data, enabling more effective queries, statistical analysis and semantic web searching. The exploited ontologies, which cover all levels of the molecular environment, from genes to systems, are among the best known and most widely used bioinformatics resources. In the G2SBC database, ontology terms both provide a semantic layer to improve data storage, accessibility and analysis and represent a user-friendly instrument to identify relations among biological components.

  13. Quality assurance of the gene ontology using abstraction networks.

    Science.gov (United States)

    Ochs, Christopher; Perl, Yehoshua; Halper, Michael; Geller, James; Lomax, Jane

    2016-06-01

    The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics, based on the identification of groups of anomalous terms which exhibit certain taxonomy-defined characteristics, are introduced. Such groups are expected to have higher error rates when compared to other terms. Thus, by focusing QA efforts on anomalous terms one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort will be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. From the results of this QA effort, it is observed that different kinds of inconsistencies in the modeling of GO can be exposed with the use of the proposed heuristics. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.

  14. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of the classes of the GO are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of

  15. Guidelines for the functional annotation of microRNAs using the Gene Ontology.

    Science.gov (United States)

    Huntley, Rachael P; Sitnikov, Dmitry; Orlic-Milacic, Marija; Balakrishnan, Rama; D'Eustachio, Peter; Gillespie, Marc E; Howe, Doug; Kalea, Anastasia Z; Maegdefessel, Lars; Osumi-Sutherland, David; Petri, Victoria; Smith, Jennifer R; Van Auken, Kimberly; Wood, Valerie; Zampetaki, Anna; Mayr, Manuel; Lovering, Ruth C

    2016-05-01

    MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). © 2016 Huntley et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  16. A robust data-driven approach for gene ontology annotation

    OpenAIRE

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For th...

  17. Ontology-Based Analysis of Microarray Data.

    Science.gov (United States)

    Giuseppe, Agapito; Milano, Marianna

    2016-01-01

    The importance of semantic-based methods and algorithms for the analysis and management of biological data is growing for two main reasons. From a biological side, the knowledge contained in ontologies is increasingly accurate and complete; from a computational side, recent algorithms are putting such knowledge to valuable use. Here we focus on semantic-based management and analysis of protein interaction networks, referring to all the approaches to the analysis of protein-protein interaction data that use knowledge encoded in biological ontologies. Semantic approaches for studying high-throughput data have been largely used in the past to mine genomic and expression data. Recently, the emergence of network approaches for investigating molecular machineries has stimulated, in parallel, the introduction of semantic-based techniques for the analysis and management of network data. The application of these computational approaches to the study of microarray data can broaden their application scenario and simultaneously help the understanding of disease development and progress.

  18. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. The analysis of these

  19. Representing Ontogeny Through Ontology: A Developmental Biologist’s Guide to The Gene Ontology

    Science.gov (United States)

    Hill, David P.; Berardini, Tanya Z.; Howe, Douglas G.; Van Auken, Kimberly M.

    2010-01-01

    Developmental biology, like many other areas of biology, has undergone a dramatic shift in the perspective from which developmental processes are viewed. Instead of focusing on the actions of a handful of genes or functional RNAs, we now consider the interactions of large functional gene networks and study how these complex systems orchestrate the unfolding of an organism, from gametes to adult. Developmental biologists are beginning to realize that understanding ontogeny on this scale requires the utilization of computational methods to capture, store and represent the knowledge we have about the underlying processes. Here we review the use of the Gene Ontology (GO) to study developmental biology. We describe the organization and structure of the GO and illustrate some of the ways we use it to capture the current understanding of many common developmental processes. We also discuss ways in which gene product annotations using the GO have been used to ask and answer developmental questions in a variety of model developmental systems. We provide suggestions as to how the GO might be used in more powerful ways to address questions about development. Our goal is to provide developmental biologists with enough background about the GO that they can begin to think about how they might use the ontology efficiently and in the most powerful ways possible. PMID:19921742

  20. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    OpenAIRE

    Tsatsoulis Costas; Amthauer Heather A

    2010-01-01

    Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces ce...

  1. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Many valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations is reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations demonstrated the effectiveness of our pipeline and prioritization of predicted annotations, which selected as most likely many predicted annotations that were later confirmed.

  2. Functional discrimination of gene expression patterns in terms of the gene ontology.

    Science.gov (United States)

    Badea, Liviu

    2003-01-01

    The ever-growing amount of experimental data in molecular biology and genetics requires its automated analysis, by employing sophisticated knowledge discovery tools. We use an Inductive Logic Programming (ILP) learner to induce functional discrimination rules between genes studied using microarrays and found to be differentially expressed in three recently discovered subtypes of adenocarcinoma of the lung. The discrimination rules involve functional annotations from the Proteome HumanPSD database in terms of the Gene Ontology, whose hierarchical structure is essential for this task. While most of the lower levels of gene expression data (pre)processing have been automated, our work can be seen as a step toward automating the higher level functional analysis of the data. We view our application not just as a prototypical example of applying more sophisticated machine learning techniques to the functional analysis of genes, but also as an incentive for developing increasingly more sophisticated functional annotations and ontologies, that can be automatically processed by such learning algorithms.

  3. The Functional Genetics of Handedness and Language Lateralization: Insights from Gene Ontology, Pathway and Disease Association Analyses.

    Science.gov (United States)

    Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian

    2017-01-01

    Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.

  4. Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL

    Directory of Open Access Journals (Sweden)

    Aranguren Mikel

    2007-02-01

    The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.

  5. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Background: There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results: We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions: We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  6. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning.

    Science.gov (United States)

    Amthauer, Heather A; Tsatsoulis, Costas

    2010-05-28

    There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  7. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.

    Science.gov (United States)

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-02-21

    Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.
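
    The over-representation test described in the abstract above can be illustrated with a one-sided hypergeometric tail probability. This is a minimal sketch only: the abstract does not state which statistic BiNChE uses, and the function name and example counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of entities in the background set
    annotated:  background entities annotated with the ontology class
    sample:     size of the entity set under study
    hits:       entities in the study set annotated with the class
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 background molecules, 50 in the class,
# 20 molecules in the study set, 5 of them in the class.
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=20, hits=5)
```

    A small p-value suggests the class is over-represented in the study set; in practice the p-values for all tested classes would also need a multiple-testing correction.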

  8. Analysis of the OWL ontologies: A survey

    OpenAIRE

    García-Peñalvo, Francisco José; García, Juan; Therón-Sánchez, Roberto

    2011-01-01

    Web Ontology Language (OWL) is one of the latest recommendations of the World Wide Web Consortium (W3C) for developing ontologies. The use of OWL ontologies should entail the possibility of evaluating their quality and accuracy. A great diversity of tools and metrics has been proposed to achieve this goal. OntoQA and ONTOMETRIC represent the most important tools for evaluating ontologies, which usually rely on measurements. This work analyzes all these tools ...

  9. Identification of canine platelet proteins separated by differential detergent fractionation for nonelectrophoretic proteomics analyzed by Gene Ontology and pathways analysis

    Directory of Open Access Journals (Sweden)

    Trichler SA

    2014-01-01

    , identification of potential treatment targets and biomarkers, and sets a new standard for the resting platelet proteome. Keywords: proteome, differential detergent fractionation, dog, functional analysis, protein

  10. Protein-Protein Interaction Network and Gene Ontology

    Science.gov (United States)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    Evolution of computer technologies makes it possible to access a large amount and various kinds of biological data via the internet, such as DNA sequences, proteomics data and information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. Roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations can be a combination of protein-protein interaction (PPI) data and Gene Ontology (GO), which could help enhance the analysis of PPI networks. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, where GO and PPI graphs are visualized side-by-side, and supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common ancestors finding.
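
    The "common ancestors finding" operation mentioned above can be sketched on a toy directed acyclic graph. The term names and the child-to-parents dictionary are hypothetical stand-ins, not real GO identifiers, and this is not the system's own implementation.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` (excluding itself) in a DAG
    given as a mapping child -> list of parent terms."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_ancestors(a, b, parents):
    """Terms that lie on an upward path from both `a` and `b`.
    Each term counts as its own ancestor here, so a term can be
    a common ancestor of itself and one of its descendants."""
    return (ancestors(a, parents) | {a}) & (ancestors(b, parents) | {b})


# Toy DAG: C has two parents, which a tree could not represent.
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["A"],
}
shared = common_ancestors("C", "D", TOY_PARENTS)
```

    Because a GO term may have several parents, the traversal keeps a visited set so that shared ancestors are not expanded twice.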

  11. GOseek: a gene ontology search engine using enhanced keywords.

    Science.gov (United States)

    Taha, Kamal

    2013-01-01

    We propose in this paper a biological search engine called GOseek, which overcomes the limitation of current gene similarity tools. Given a set of genes, GOseek returns the most significant genes that are semantically related to the given genes. These returned genes are usually annotated to one of the Lowest Common Ancestors (LCA) of the Gene Ontology (GO) terms annotating the given genes. Most genes have several annotation GO terms. Therefore, there may be more than one LCA for the GO terms annotating the given genes. The LCA annotating the genes that are most semantically related to the given gene is the one that receives the most aggregate semantic contribution from the GO terms annotating the given genes. To identify this LCA, GOseek quantifies the contribution of the GO terms annotating the given genes to the semantics of their LCAs. That is, it encodes the semantic contribution into a numeric format. GOseek uses microarray experiment data to rank result genes based on their significance. We evaluated GOseek experimentally and compared it with a comparable gene prediction tool. Results showed marked improvement over the tool.

  12. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks.

  13. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Science.gov (United States)

    Overall, Rupert W; Paszkowski-Rogacz, Maciej; Kempermann, Gerd

    2012-01-01

    Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. We have created a resource with three components 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already successful 'top-down' approach of the Gene Ontology.

  14. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Directory of Open Access Journals (Sweden)

    Rupert W Overall

    Full Text Available. BACKGROUND: Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. METHODOLOGY/PRINCIPAL FINDINGS: We have created a resource with three components: 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. CONCLUSIONS/SIGNIFICANCE: The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already

  15. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, even with extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  16. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Full Text Available. Background: Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which cannot estimate the individual discriminative abilities of the three aspects of gene ontology. Results: In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time & space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for

  17. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available. The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on Model-View-Control architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  18. Choose wisely: Network, ontology and annotation resources for the analysis of Staphylococcus aureus omics data.

    Science.gov (United States)

    Broadbent, J A; Sampson, D L; Broszczak, D A; Upton, Z; Huygens, F

    2015-05-01

    Staphylococcus aureus (S. aureus) is a prominent human and livestock pathogen investigated widely using omic technologies. Critically, due to availability, low visibility or scattered resources, robust network and statistical contextualisation of the resulting data is generally under-represented. Here, we present novel meta-analyses of freely-accessible molecular network and gene ontology annotation information resources for S. aureus omics data interpretation. Furthermore, through the application of the gene ontology annotation resources we demonstrate their value and ability (or lack thereof) to summarise and statistically interpret the emergent properties of gene expression and protein abundance changes using publicly available data. This analysis provides simple metrics for network selection and demonstrates the availability and impact that gene ontology annotation selection can have on the contextualisation of bacterial omics data. Copyright © 2015 Elsevier GmbH. All rights reserved.

  19. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided and the corresponding entries in heterogeneous biological databases in semantic terms can be expediently searched.

  20. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...... can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence derived protein features such as predicted post translational modifications (PTMs), protein sorting signals and physical/chemical properties...

  1. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
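
    For orientation, the classical support and confidence measures that association rule mining builds on can be sketched as follows. This is not the MOSupport or MOConfidence definition (the abstract does not give those); the annotation sets and term labels are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions (here: per-gene annotation sets)
    that contain every term in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Classical confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical annotations: each set holds the terms assigned to one gene,
# prefixed by the sub-ontology they come from.
annotations = [
    {"BP:x", "MF:y"},
    {"BP:x", "MF:y"},
    {"BP:x"},
    {"MF:z"},
]
rule_support = support(annotations, {"BP:x", "MF:y"})
rule_confidence = confidence(annotations, {"BP:x"}, {"MF:y"})
```

    A rule whose antecedent and consequent come from different sub-ontologies, as in this toy case, is the kind of cross-ontology relationship the abstract describes.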

  2. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the developments of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  3. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Full Text Available Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.

  4. Correlating information contents of gene ontology terms to infer semantic similarity of gene products.

    Science.gov (United States)

    Gan, Mingxin

    2014-01-01

    Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
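
    The vector representation described above can be sketched as follows, assuming each gene product is encoded over a fixed term vocabulary with the information content (IC) of a term where it is annotated and zero elsewhere. The IC values and term names are hypothetical, and the weighted (min/max) form of the Jaccard index is an assumption, since the abstract does not say which variant is used.

```python
import math


def ic_vector(annotated_terms, ic, vocabulary):
    """Encode a gene product as a vector of IC values over `vocabulary`."""
    return [ic[t] if t in annotated_terms else 0.0 for t in vocabulary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


def weighted_jaccard(u, v):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    denom = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / denom if denom else 0.0


# Hypothetical vocabulary and IC values.
vocabulary = ["t1", "t2", "t3"]
ic = {"t1": 1.0, "t2": 2.0, "t3": 3.0}
gene_a = ic_vector({"t1", "t2"}, ic, vocabulary)  # [1.0, 2.0, 0.0]
gene_b = ic_vector({"t2", "t3"}, ic, vocabulary)  # [0.0, 2.0, 3.0]
```

    All three measures operate on the same pair of vectors, so they can be compared directly on one set of annotations.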

  5. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

    Science.gov (United States)

    Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

    2016-03-11

    Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

  6. Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach.

    Science.gov (United States)

    Patel, Sejal; Park, Min Tae M; Chakravarty, M Mallar; Knight, Jo

    2016-01-01

    Imaging genetics is an emerging field in which associations between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses the pressing need in genetic research to move beyond candidate gene studies, while not being overburdened with a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease.
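
    The idea of a stratified false discovery rate can be sketched by applying the Benjamini-Hochberg adjustment separately inside each stratum. This is an illustration under that assumption, not the authors' pipeline; the p-values and stratum labels are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stratified_qvalues(pvalues, strata):
    """Adjust p-values within each stratum independently."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    q = [0.0] * len(pvalues)
    for idxs in groups.values():
        adjusted = bh_qvalues([pvalues[i] for i in idxs])
        for idx, value in zip(idxs, adjusted):
            q[idx] = value
    return q


# Hypothetical SNP p-values; "prior" marks SNPs in GO-prioritized genes.
pvalues = [0.01, 0.04, 0.03, 0.5]
strata = ["prior", "prior", "other", "other"]
q_stratified = stratified_qvalues(pvalues, strata)
q_pooled = bh_qvalues(pvalues)
```

    In this toy case the smallest p-value in the prioritized stratum receives a smaller adjusted value than under pooled correction, which is the source of the potential power gain.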

  7. GOParGenPy: a high throughput method to generate gene ontology data matrices.

    Science.gov (United States)

    Kumar, Ajay Anand; Holm, Liisa; Toronen, Petri

    2013-08-08

    Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
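
    The kind of matrix described here can be sketched in a few lines of Python. This is not GOParGenPy's implementation or interface: the function names, the toy term labels and the input layout (a dict of parent links and a dict of direct annotations) are assumptions made purely for illustration.

```python
def ancestors(term, parents, cache):
    """Return all ancestors of `term` in the DAG, excluding the term itself."""
    if term not in cache:
        found = set()
        for parent in parents.get(term, ()):
            found.add(parent)
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def membership_matrix(annotations, parents):
    """Build a full binary gene-by-term matrix including parental classes.

    annotations: dict gene -> iterable of directly annotated terms
    parents:     dict term -> iterable of parent terms
    """
    cache = {}
    expanded = {}
    for gene, terms in annotations.items():
        full = set()
        for term in terms:
            full.add(term)
            full |= ancestors(term, parents, cache)
        expanded[gene] = full
    genes = sorted(expanded)
    columns = sorted(set().union(*expanded.values())) if expanded else []
    matrix = [[1 if t in expanded[g] else 0 for t in columns] for g in genes]
    return genes, columns, matrix


# Toy ontology: leaf1 -> mid -> root, and a gene annotated only to root.
toy_parents = {"leaf1": ["mid"], "mid": ["root"]}
toy_annotations = {"geneA": ["leaf1"], "geneB": ["root"]}
genes, columns, matrix = membership_matrix(toy_annotations, toy_parents)
```

    A sparse variant would store only the (gene, term) pairs with value 1 instead of the full list of rows.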

  8. Gene-based and semantic structure of the Gene Ontology as a complex network

    Science.gov (United States)

    Coronnello, Claudia; Tumminello, Michele; Miccichè, Salvatore

    2016-09-01

    The last decade has seen the advent and consolidation of ontology-based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The Gene Ontology (GO) is constantly evolving over time. The information accumulated over time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. Here we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium. Moreover, the GO is a natural example of a bipartite network of terms and genes. Here we are interested in studying the properties of the projected network of terms, i.e. a gene-based weighted network of GO terms, in which a link between any two terms is set if at least one gene is annotated in both terms. One aim of the present paper is to compare the structural properties of the semantic and the gene-based network. The relative importance of terms is very similar in the two networks, but the community structure changes. We show that in some cases GO terms that appear to be distinct from a semantic point of view are instead connected, and appear in the same community when considering their gene content. The identification of such gene-based communities of terms might therefore be the basis of a simple protocol aiming at improving the semantic structure of GO. Information about terms that share large gene content might also be important from a biomedical point of view, as it might reveal how genes over-expressed in a certain term also affect other biological processes, molecular functions and cellular components not directly linked according to GO semantics.
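
    The projection of the bipartite term-gene network onto terms can be sketched as follows. The abstract states only that two terms are linked when they share at least one gene; taking the link weight to be the number of shared genes is an assumption of this sketch, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_terms(term_to_genes):
    """Project a term-gene bipartite network onto terms.

    Returns a dict mapping a sorted term pair -> number of shared genes
    (the weighting is an assumption, not taken from the paper).
    """
    gene_to_terms = defaultdict(set)
    for term, genes in term_to_genes.items():
        for gene in genes:
            gene_to_terms[gene].add(term)
    weights = defaultdict(int)
    # Every gene contributes one unit to each pair of terms it belongs to.
    for terms in gene_to_terms.values():
        for a, b in combinations(sorted(terms), 2):
            weights[(a, b)] += 1
    return dict(weights)


toy_network = project_terms({
    "t1": {"g1", "g2"},
    "t2": {"g2", "g3"},
    "t3": {"g4"},
    "t4": {"g1", "g2"},
})
```

    Here t3 shares no gene with any other term and so stays isolated, while t1 and t4 are linked with weight 2 regardless of how far apart they sit in the semantic hierarchy.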

  9. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Background: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  10. Ontological Discovery Environment: a system for integrating gene-phenotype associations.

    Science.gov (United States)

    Baker, Erich J; Jay, Jeremy J; Philip, Vivek M; Zhang, Yun; Li, Zuopan; Kirova, Roumyana; Langston, Michael A; Chesler, Elissa J

    2009-12-01

    The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental

  11. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.
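
    For readers unfamiliar with the reported metric, the Matthews correlation coefficient (MCC) can be computed from a binary confusion matrix as in the Python sketch below. The formula is the standard one; the function name is hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

    The coefficient ranges from -1 to 1, so a value quoted as a percentage, as in the abstract, presumably corresponds to the coefficient multiplied by 100.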

  12. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  13. Evaluation of clustering algorithms for gene expression data using gene ontology annotations

    Institute of Scientific and Technical Information of China (English)

    MA Ning; ZHANG Zheng-guo

    2012-01-01

    Background: Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods: An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results: The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions: The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.

  14. Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

    Science.gov (United States)

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  15. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining.

    Science.gov (United States)

    Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2012-12-20

    Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since

  16. Bi-directional semantic similarity for gene ontology to optimize biological and clinical analyses.

    Science.gov (United States)

    Bien, Sang Jay; Park, Chan Hee; Shim, Hae Jin; Yang, Woongcheol; Kim, Jihun; Kim, Ju Han

    2012-01-01

    Semantic similarity analysis facilitates automated semantic explanations of biological and clinical data annotated by biomedical ontologies. Gene ontology (GO) has become one of the most important biomedical ontologies with a set of controlled vocabularies, providing rich semantic annotations for genes and molecular phenotypes for diseases. Current methods for measuring GO semantic similarities are limited to considering only the ancestor terms while neglecting the descendants. One can find many GO term pairs whose ancestors are identical but whose descendants are very different and vice versa. Moreover, the lower parts of GO trees are full of terms with more specific semantics. This study proposed a method of measuring semantic similarities between GO terms using the entire GO tree structure, including both the upper (ancestral) and the lower (descendant) parts. Comprehensive comparison studies were performed with well-known information content-based and graph structure-based semantic similarity measures with protein sequence similarities, gene expression-profile correlations, protein-protein interactions, and biological pathway analyses. The proposed bidirectional measure of semantic similarity outperformed other graph-based and information content-based methods.

  17. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

    Science.gov (United States)

    Köhler, Sebastian; Doelken, Sandra C; Ruef, Barbara J; Bauer, Sebastian; Washington, Nicole; Westerfield, Monte; Gkoutos, George; Schofield, Paul; Smedley, Damian; Lewis, Suzanna E; Robinson, Peter N; Mungall, Christopher J

    2013-01-01

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  18. The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation.

    Science.gov (United States)

    Malone, James; Brown, Andy; Lister, Allyson L; Ison, Jon; Hull, Duncan; Parkinson, Helen; Stevens, Robert

    2014-01-01

    Biomedical ontologists to date have concentrated on ontological descriptions of biomedical entities such as gene products and their attributes, phenotypes and so on. Recently, effort has diversified to descriptions of the laboratory investigations by which these entities were produced. However, much biological insight is gained from the analysis of the data produced from these investigations, and there is a lack of adequate descriptions of the wide range of software that are central to bioinformatics. We need to describe how data are analyzed for discovery, audit trails, provenance and reproducibility. The Software Ontology (SWO) is a description of software used to store, manage and analyze data. Input to the SWO has come from beyond the life sciences, but its main focus is the life sciences. We used agile techniques to gather input for the SWO and keep engagement with our users. The result is an ontology that meets the needs of a broad range of users by describing software, its information processing tasks, data inputs and outputs, data formats, versions and so on. Recently, the SWO has incorporated EDAM, a vocabulary for describing data and related concepts in bioinformatics. The SWO is currently being used to describe software used in multiple biomedical applications. The SWO is another element of the biomedical ontology landscape that is necessary for the description of biomedical entities and how they were discovered. An ontology of software used to analyze data produced by investigations in the life sciences can be made in such a way that it covers the important features requested and prioritized by its users. The SWO thus fits into the landscape of biomedical ontologies and is produced using techniques designed to keep it in line with users' needs. The Software Ontology is available under an Apache 2.0 license at http://theswo.sourceforge.net/; the Software Ontology blog can be read at http://softwareontology.wordpress.com.

  19. Identifying redundant and missing relations in the gene ontology.

    Science.gov (United States)

    Mougin, Fleur

    2015-01-01

    Significant efforts have been undertaken for providing the Gene Ontology (GO) in a computable format as well as for enriching it with logical definitions. Automated approaches can thus be applied to GO for assisting its maintenance and for checking its internal coherence. However, inconsistencies may still remain within GO. In this frame, the objective of this work was to audit GO relationships. First, reasoning over relationships was exploited for detecting redundant relations existing between GO concepts. Missing necessary and sufficient conditions were then identified based on the compositional structure of the preferred names of GO concepts. More than one thousand redundant relations and 500 missing necessary and sufficient conditions were found. The proposed approach was thus successful for detecting inconsistencies within GO relations. The application of lexical approaches as well as the exploitation of synonyms and textual definitions could be useful for identifying additional necessary and sufficient conditions. Multiple necessary and sufficient conditions for a given GO concept may be indicative of inconsistencies.
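
    One simple notion of a redundant relation can be sketched in Python: a direct link from a term to an ancestor is redundant when the same ancestor is still reachable through a longer path. This sketch assumes a single transitive relation type (such as is_a) and is not the reasoning procedure used in the paper; the function names and toy labels are hypothetical.

```python
def reachable(start, parents, skip_edge):
    """Terms reachable from `start` via parent links, ignoring `skip_edge`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if (node, parent) == skip_edge or parent in seen:
                continue
            seen.add(parent)
            stack.append(parent)
    return seen


def redundant_edges(parents):
    """Return direct (child, parent) links implied by a longer path."""
    redundant = []
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            others = reachable(child, parents, skip_edge=(child, parent))
            if parent in others:
                redundant.append((child, parent))
    return sorted(redundant)


# Toy hierarchy: c -> b -> a, plus a direct shortcut c -> a.
toy_redundant = redundant_edges({"c": ["b", "a"], "b": ["a"], "d": ["a"]})
```

    In the toy hierarchy only the shortcut from c to a is flagged, because it adds nothing beyond what the chain through b already implies.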

  20. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for the systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description in GeneRIFs could be annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all GeneRIF records for human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. The high consistency of the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) between GeneRIFs and GOA shows that our annotation resource is very reliable.

  1. GO-Mapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms.

    NARCIS (Netherlands)

    M. Smid (Marcel); L.C.J. Dorssers (Lambert)

    2004-01-01

    MOTIVATION: Retrieval of information on biological processes from large-scale expression data is still a time-consuming task. An automated analysis utilizing all expression information would greatly increase our understanding of the samples under study. RESULTS: We

  3. Codon bias and gene ontology in holometabolous and hemimetabolous insects.

    Science.gov (United States)

    Carlini, David B; Makowski, Matthew

    2015-12-01

    The relationship between preferred codon use (PCU), developmental mode, and gene ontology (GO) was investigated in a sample of nine insect species with sequenced genomes. These species were selected to represent two distinct modes of insect development, holometabolism and hemimetabolism, with an aim toward determining whether the differences in developmental timing concomitant with developmental mode would be mirrored by differences in PCU in their developmental genes. We hypothesized that the developmental genes of holometabolous insects should be under greater selective pressure for efficient translation, manifest as increased PCU, than those of hemimetabolous insects because holometabolism requires abundant protein expression over shorter time intervals than hemimetabolism, where proteins are required more uniformly in time. Preferred codon sets were defined for each species, from which the frequency of PCU for each gene was obtained. Although there were substantial differences in the genomic base composition of holometabolous and hemimetabolous insects, both groups exhibited a general preference for GC-ending codons, with the former group having higher PCU averaged across all genes. For each species, the biological process GO term for each gene was assigned that of its Drosophila homolog(s), and PCU was calculated for each GO term category. The top two GO term categories for PCU enrichment in the holometabolous insects were anatomical structure development and cell differentiation. The increased PCU in the developmental genes of holometabolous insects may reflect a general strategy to maximize the protein production of genes expressed in bursts over short time periods, e.g., heat shock proteins. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 686-698, 2015. © 2015 Wiley Periodicals, Inc.
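
    A per-gene PCU frequency can be sketched in Python as the fraction of codons in a coding sequence that belong to a species' preferred set. The abstract does not give the exact definition used in the study, so this fraction, the function name and the toy preferred set are assumptions for illustration only.

```python
def preferred_codon_usage(cds, preferred):
    """Fraction of complete codons in `cds` found in the `preferred` set.

    cds:       coding sequence as a string, read in frame from position 0
    preferred: set of preferred codons (upper case)
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    hits = sum(1 for codon in codons if codon in preferred)
    return hits / len(codons)


# Toy example with a single GC-ending preferred codon.
toy_pcu = preferred_codon_usage("ATGGCCGCTGCC", {"GCC"})
```

    Averaging such per-gene values over the genes assigned to a GO term category would give a per-category figure that could be compared between species.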

  4. A new gene ontology-based measure for the functional similarity of gene products

    Institute of Scientific and Technical Information of China (English)

    QI Guo-long; QIAN Shi-yu; FANG Ji-qian

    2013-01-01

    Background: Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods: We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure used with the gene co-expression and the protein-protein interactions. Results: The results showed a considerable association between the semantic similarity and the expression correlation and between the semantic similarity and the protein-protein interactions, and our measure performed the best overall. Conclusion: These results revealed the potential value of our newly proposed semantic similarity measure in studying the functional relevance of gene products.

  5. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    Science.gov (United States)

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  6. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

    Science.gov (United States)

    Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin M

    2016-09-09

    Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms. We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88 %) based on random sampling of annotations. In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.

  7. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  8. Ontology-based specification, identification and analysis of perioperative risks.

    Science.gov (United States)

    Uciteli, Alexandr; Neumann, Juliane; Tahar, Kais; Saleh, Kutaiba; Stucke, Stephan; Faulbrück-Röhr, Sebastian; Kaeding, André; Specht, Martin; Schmidt, Tobias; Neumuth, Thomas; Besting, Andreas; Stegemann, Dominik; Portheine, Frank; Herre, Heinrich

    2017-09-06

    Medical personnel in hospitals often work under great physical and mental strain. In medical decision-making, errors can never be completely ruled out. Several studies have shown that between 50 and 60% of adverse events could have been avoided through better organization, more attention or more effective security procedures. Critical situations especially arise during interdisciplinary collaboration and the use of complex medical technology, for example during surgical interventions and in perioperative settings (the period of time before, during and after surgical intervention). In this paper, we present an ontology and an ontology-based software system, which can identify risks across medical processes and supports the avoidance of errors, in particular in the perioperative setting. We developed a practicable definition of the risk notion, which is easily understandable by the medical staff and is usable for the software tools. Based on this definition, we developed a Risk Identification Ontology (RIO) and used it for the specification and the identification of perioperative risks. An agent system was developed, which gathers risk-relevant data during the whole perioperative treatment process from various sources and provides it for risk identification and analysis in a centralized fashion. The results of such an analysis are provided to the medical personnel in the form of context-sensitive hints and alerts. For the identification of the ontologically specified risks, we developed an ontology-based software module, called Ontology-based Risk Detector (OntoRiDe). About 20 risks relating to cochlear implantation (CI) have already been implemented. Comprehensive testing has indicated the correctness of the data acquisition, risk identification and analysis components, as well as the web-based visualization of results.

  9. Onto-CC: a web server for identifying Gene Ontology conceptual clusters

    Science.gov (United States)

    Romero-Zaliz, R.; del Val, C.; Cobb, J. P.; Zwir, I.

    2008-01-01

    The Gene Ontology (GO) vocabulary has been extensively explored to analyze the functions of coexpressed genes. However, despite its extended use in Biology and Medical Sciences, there are still high levels of uncertainty about which ontology (i.e. Biological Process, Cellular Component or Molecular Function) should be used, and at which level of specificity. Moreover, the GO database can contain incomplete information resulting from human annotations, or information highly influenced by the available knowledge about a specific branch in an ontology. In spite of these drawbacks, there is a trend to ignore these problems and even use GO terms to conduct searches of gene expression profiles (i.e. expression + GO) instead of more cautious approaches that just consider them as an independent source of validation (i.e. expression versus GO). Consequently, the uncertainty is propagated and the analysis of the required gene grouping hypotheses becomes biased. We proposed a web tool, Onto-CC, as an automatic method specially suited for independent explanation/validation of gene grouping hypotheses (e.g. coexpressed genes) based on GO clusters (i.e. expression versus GO). The Onto-CC approach reduces the uncertainty of the queries by identifying optimal conceptual clusters that combine terms from different ontologies simultaneously, as well as terms defined at different levels of specificity in the GO hierarchy. To do so, we implemented the EMO-CC methodology to find clusters in structural databases [GO Directed Acyclic Graph (DAG) tree], inspired by Conceptual Clustering algorithms. This approach allows the management of optimal cluster sets as potential parallel hypotheses, guided by multiobjective/multimodal optimization techniques. Therefore, we can generate alternative and, still, optimal explanations of queries that can provide new insights for a given problem.
Onto-CC has been successfully used to test different medical and biological hypotheses including the explanation and prediction of

  10. Combining Hierarchical and Associative Gene Ontology Relations with Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Riensche, Roderick M.; Beagley, Nathaniel; Baddeley, Bob L.; Tratz, Stephen C.; Gregory, Michelle L.

    2007-03-01

    Gene and gene product similarity is a fundamental diagnostic measure in analyzing biological data and constructing predictive models for functional genomics. With the rising influence of the Gene Ontology, two complementary approaches have emerged where the similarity between two genes or gene products is obtained by comparing Gene Ontology (GO) annotations associated with the genes or gene products. One approach captures GO-based similarity in terms of hierarchical relations within each gene subontology. The other approach identifies GO-based similarity in terms of associative relations across the three gene subontologies. We propose a novel methodology where the two approaches can be merged with ensuing benefits in coverage and accuracy, and demonstrate that further improvements can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  11. Constructing Ontology for Knowledge Sharing of Materials Failure Analysis

    Directory of Open Access Journals (Sweden)

    Peng Shi

    2014-01-01

    Full Text Available Materials failure indicates the fault with materials or components during their performance. To avoid the reoccurrence of similar failures, materials failure analysis is executed to investigate the reasons for the failure and to propose improved strategies. The whole procedure needs sufficient domain knowledge and also produces valuable new knowledge. However, the information about the materials failure analysis is usually retained by the domain expert, and its sharing is technically difficult. This phenomenon may seriously reduce the efficiency and decrease the veracity of the failure analysis. To solve this problem, this paper adopts ontology, a novel technology from the Semantic Web, as a tool for knowledge representation and sharing and describes the construction of the ontology to obtain information concerning the failure analysis, application area, materials, and failure cases. The ontology represented information is machine-understandable and can be easily shared through the Internet. At the same time, failure case intelligent retrieval, advanced statistics, and even automatic reasoning can be accomplished based on ontology represented knowledge. Obviously this can promote the knowledge sharing of materials service safety and improve the efficiency of failure analysis. The case of a nuclear power plant area is presented to show the details and benefits of this method.

  12. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, there are often hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples from the interpretation of the enriched categories can be highly subjective. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
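
    As an illustration of the kind of significance test that ORA relies on, the sketch below computes a one-sided hypergeometric p-value for a single GO term. This is a common choice for ORA, but the abstract does not state which test goSTAG uses, so the function name `ora_p_value` and its parameters are assumptions made for this example.

```python
from math import comb


def ora_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     size of the gene list being tested
    hits:       genes in the list annotated with the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 annotated, list of 5 genes, all 5 annotated.
p_all_hits = ora_p_value(10, 5, 5, 5)
```

    In practice one such p-value is computed per GO term and the results are corrected for multiple testing before terms are called enriched.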

  13. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background: The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results: In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community.
Conclusion: Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  14. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    Science.gov (United States)

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in maize whole genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095

  15. Changes in winter depression phenotype correlate with white blood cell gene expression profiles: a combined metagene and gene ontology approach.

    Science.gov (United States)

    Bosker, Fokko J; Terpstra, Peter; Gladkevich, Anatoliy V; Janneke Dijck-Brouwer, D A; te Meerman, Gerard; Nolen, Willem A; Schoevers, Robert A; Meesters, Ybe

    2015-04-03

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points: in winter prior to and following bright light therapy, and in summer. RNA was isolated, converted into cRNA, amplified and hybridized on Illumina® gene expression arrays. The raw optical array data were quantile normalized and thereafter analyzed using a metagene approach, based on previously published Affymetrix gene array data. The raw data were also subjected to a secondary analysis focusing on circadian genes and genes involved in serotonergic neurotransmission. Differences between the conditions were analyzed, using analysis of variance on the principal components of the metagene score matrix. After correction for multiple testing no statistically significant differences were found. Another approach uses the correlation between metagene factor weights and the actual expression values, averaged over conditions. When comparing the correlations of winter vs. summer and bright light therapy vs. summer, significant changes for several metagenes were found. Subsequent gene ontology analyses (DAVID and GeneTrail) of 5 major metagenes suggest an interaction between brain and white blood cells. The hypothesis-driven analysis with a smaller group of genes failed to demonstrate any significant effects. The results from the combined metagene and gene ontology analyses support the idea of communication between brain and white blood cells. Future studies will need a much larger sample size to obtain information at the level of single genes. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

    Science.gov (United States)

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.

  17. Formal modeling of Gene Ontology annotation predictions based on factor graphs

    Science.gov (United States)

    Spetale, Flavio; Murillo, Javier; Tapia, Elizabeth; Arce, Débora; Ponce, Sergio; Bulacio, Pilar

    2016-04-01

    Gene Ontology (GO) is a hierarchical vocabulary for gene product annotation. Its synergy with machine learning classification methods has been widely used for the prediction of protein functions. Current classification methods rely on heuristic solutions to check the consistency with some aspects of the underlying GO structure. In this work we formalize the GO is-a relationship through predicate logic. Moreover, an ontology model based on Forney Factor Graph (FFG) is shown on a general fragment of Cellular Component GO.
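
    The is-a constraint mentioned above can be read as an implication: if a gene product is annotated with a term, it is also annotated with every is-a ancestor of that term. The sketch below shows a simple consistency check of that kind on a toy DAG. It is the sort of heuristic check the abstract contrasts with its formal model, not the factor-graph approach itself, and the term labels (`root`, `A`, `B`, `C`) are placeholders rather than real GO identifiers.

```python
def ancestors(term, parents):
    """Return all is-a ancestors of a term; parents maps child -> set of parents."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inconsistent_terms(predicted, parents):
    """Terms predicted positive although at least one ancestor is not."""
    return {t for t in predicted if not ancestors(t, parents) <= predicted}


# Toy DAG: C has two parents (A and B), which both descend from root.
toy_parents = {"A": {"root"}, "B": {"root"}, "C": {"A", "B"}}
violations = inconsistent_terms({"root", "A", "C"}, toy_parents)
```

    Here `C` is flagged because its parent `B` was not predicted, even though `A` and `root` were.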

  18. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  19. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  20. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help to explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The prodigious availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized the cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. Also, the presence of drugs for these candidates was detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which affirms that they may serve as potential drug targets for cervical cancer.

  1. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
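
    To make the mutual-information feature concrete, the sketch below computes the standard mutual information between two binary events (a sequence matches the motif; a sequence carries the GO term) from a 2x2 table of sequence counts. The abstract does not give the exact formula the authors used, so this textbook definition and the function name `mutual_information` are assumptions made for illustration.

```python
from math import log2


def mutual_information(n11, n10, n01, n00):
    """Mutual information in bits from 2x2 co-occurrence counts.

    n11: sequences with the motif and the GO term
    n10: sequences with the motif but without the term
    n01: sequences with the term but without the motif
    n00: sequences with neither
    """
    n = n11 + n10 + n01 + n00
    # Each entry: (joint count, motif marginal, term marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, motif_total, term_total in cells:
        if joint:  # empty cells contribute nothing
            mi += (joint / n) * log2(joint * n / (motif_total * term_total))
    return mi


independent = mutual_information(25, 25, 25, 25)
perfect = mutual_information(50, 0, 0, 50)
```

    A value near zero means the motif tells us nothing about the term, while larger values indicate a stronger association that a classifier could use as a feature.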

  2. Efficient Management of Biomedical Ontology Versions

    Science.gov (United States)

    Kirsten, Toralf; Hartung, Michael; Groß, Anika; Rahm, Erhard

    Ontologies have become very popular in life sciences and other domains. They mostly undergo continuous changes and new ontology versions are frequently released. However, current analysis studies do not consider the ontology changes reflected in different versions but typically limit themselves to a specific ontology version which may quickly become obsolete. To allow applications easy access to different ontology versions we propose a central and uniform management of the versions of different biomedical ontologies. The proposed database approach takes concept and structural changes of succeeding ontology versions into account thereby supporting different kinds of change analysis. Furthermore, it is very space-efficient by avoiding redundant storage of ontology components which remain unchanged in different versions. We evaluate the storage requirements and query performance of the proposed approach for the Gene Ontology.

  3. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  4. Sentiment analysis and ontology engineering an environment of computational intelligence

    CERN Document Server

    Chen, Shyi-Ming

    2016-01-01

    This edited volume provides the reader with a fully updated, in-depth treatise on the emerging principles, conceptual underpinnings, algorithms and practice of Computational Intelligence in the realization of concepts and implementation of models of sentiment analysis and ontology-oriented engineering. The volume involves studies devoted to key issues of sentiment analysis, sentiment models, and ontology engineering. The book is structured into three main parts. The first part offers a comprehensive and prudently structured exposure to the fundamentals of sentiment analysis and natural language processing. The second part consists of studies devoted to the concepts, methodologies, and algorithmic developments elaborating on fuzzy linguistic aggregation to emotion analysis, carrying out interpretability of computational sentiment models, emotion classification, sentiment-oriented information retrieval, and a methodology of adaptive dynamics in knowledge acquisition. The third part includes a plethora of applica...

  5. DNA Microarray and Gene Ontology Enrichment Analysis Reveals That a Mutation in opsX Affects Virulence and Chemotaxis in Xanthomonas oryzae pv. oryzae

    Directory of Open Access Journals (Sweden)

    Hong-Il Kim

    2016-06-01

    Full Text Available Xanthomonas oryzae pv. oryzae (Xoo) causes bacterial leaf blight (BLB) in rice (Oryza sativa L.). In this study, we investigated the effect of a mutation in opsX (XOO1056), which encodes a saccharide biosynthesis regulatory protein, on the virulence and bacterial chemotaxis of Xoo. We performed DNA microarray analysis, which showed that 63 of 2,678 genes, including genes related to bacterial motility (flagellar and chemotaxis proteins), were significantly downregulated (<−2 log₂ fold change) by the mutation in opsX. Indeed, motility assays showed that the mutant strain was nonmotile on semisolid agar swarm plates. In addition, a mutant strain (opsX::Tn5) showed decreased virulence against the susceptible rice cultivar, IR24. Quantitative real-time RT-PCR was performed to confirm the expression levels of these genes, including those related to flagella and chemotaxis, in the opsX mutant. Our findings revealed that mutation of opsX affects both virulence and bacterial motility. These results will help to improve our understanding of Xoo and provide insight into Xoo-rice interactions.

  6. Identification of disease-causing genes using microarray data mining and Gene Ontology.

    Science.gov (United States)

    Mohammadi, Azadeh; Saraee, Mohammad H; Salehi, Mansoor

    2011-01-26

    One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene
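
    As a rough illustration of the filtering step, the sketch below ranks genes by a two-class Fisher score, (mean_a - mean_b)^2 / (var_a + var_b). This assumes that "the Fisher method" refers to the usual Fisher criterion for feature ranking; the abstract does not spell out the formula, and the gene names and expression values here are invented for the example. The SVMRFE and redundancy-reduction stages are not shown.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Two-class Fisher criterion for one gene's expression values."""
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    if within == 0:
        # No spread within either class: perfectly separable or identical.
        return float("inf") if between else 0.0
    return between / within


def rank_genes(expression):
    """Sort gene names by descending Fisher score.

    expression maps gene name -> (values in class A, values in class B).
    """
    return sorted(expression, key=lambda g: fisher_score(*expression[g]), reverse=True)


# Hypothetical expression values for two made-up genes.
toy_expression = {
    "gene_2": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]),
    "gene_1": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
}
ranking = rank_genes(toy_expression)
```

    A filter of this kind scores each gene independently, which is why a separate redundancy-reduction stage such as the one described above is needed to drop highly correlated genes.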

  7. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Directory of Open Access Journals (Sweden)

    Saraee Mohammad H

    2011-01-01

    Full Text Available Abstract Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions The proposed method addresses the weakness of conventional

  8. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Science.gov (United States)

    2011-01-01

    Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions The proposed method addresses the weakness of conventional methods by adding a redundancy
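
    The filtering stage described in this abstract can be illustrated with a short sketch. The abstract does not give the formula, so the code below assumes the common two-class Fisher criterion (squared difference of class means divided by the sum of class variances). The function names and the toy data are hypothetical, and the SVMRFE, redundancy reduction and Gene Ontology stages are not shown.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """Two-class Fisher criterion for one gene (assumed form, not taken from the paper)."""
    diff_sq = (mean(values_a) - mean(values_b)) ** 2
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        # No within-class variation: perfectly separating if the means differ.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / spread


def rank_genes(expression, labels):
    """Return gene names sorted by decreasing Fisher score.

    expression: dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = fisher_score(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data invented for illustration: 3 genes measured in 6 samples.
toy_expression = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [0.5, 1.5, 1.0, 1.2, 0.8, 1.6],
}
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(toy_expression, toy_labels)
print(ranking)
```

    In a pipeline of the kind the abstract outlines, the top-ranked genes from such a filter would then be passed to an embedded method; how many genes to keep at this stage is a tuning choice the abstract does not specify.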

  9. FYPO: the fission yeast phenotype ontology.

    Science.gov (United States)

    Harris, Midori A; Lock, Antonia; Bähler, Jürg; Oliver, Stephen G; Wood, Valerie

    2013-07-01

    To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species. FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).

  10. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data.

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation-related functional groups in AD pathology, as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.

  11. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L.

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer’s disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation-related functional groups in AD pathology, as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology. PMID:28199395

  12. Initial implementation of a comparative data analysis ontology.

    Science.gov (United States)

    Prosdocimi, Francisco; Chisham, Brandon; Pontelli, Enrico; Thompson, Julie D; Stoltzfus, Arlin

    2009-07-03

    Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: "Operational Taxonomic Units" (OTUs), representing the entities to be compared; "character-state data" representing the observations compared among OTUs; "phylogenetic tree", representing the historical path of evolution among the entities; and "transitions", the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.

  13. An empirical analysis of ontology reuse in BioPortal.

    Science.gov (United States)

    Ochs, Christopher; Perl, Yehoshua; Geller, James; Arabandi, Sivaram; Tudorache, Tania; Musen, Mark A

    2017-07-01

    Biomedical ontologies often reuse content (i.e., classes and properties) from other ontologies. Content reuse enables a consistent representation of a domain and reusing content can save an ontology author significant time and effort. Prior studies have investigated the existence of reused terms among the ontologies in the NCBO BioPortal, but as of yet there has not been a study investigating how the ontologies in BioPortal utilize reused content in the modeling of their own content. In this study we investigate how 355 ontologies hosted in the NCBO BioPortal reuse content from other ontologies for the purposes of creating new ontology content. We identified 197 ontologies that reuse content. Among these ontologies, 108 utilize reused classes in the modeling of their own classes and 116 utilize reused properties in class restrictions. Current utilization of reuse and quality issues related to reuse are discussed. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    Science.gov (United States)

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is infection with Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. In-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery.

  15. Multimodal probabilistic generative models for time-course gene expression data and Gene Ontology (GO) tags.

    Science.gov (United States)

    Gabbur, Prasad; Hoying, James; Barnard, Kobus

    2015-10-01

    We propose four probabilistic generative models for simultaneously modeling gene expression levels and Gene Ontology (GO) tags. Unlike previous approaches for using GO tags, the joint modeling framework allows the two sources of information to complement and reinforce each other. We fit our models to three time-course datasets collected to study biological processes, specifically blood vessel growth (angiogenesis) and mitotic cell cycles. The proposed models result in a joint clustering of genes and GO annotations. Different models group genes based on GO tags and their behavior over the entire time-course, within biological stages, or even individual time points. We show how such models can be used for biological stage boundary estimation de novo. We also evaluate our models on biological stage prediction accuracy of held out samples. Our results suggest that the models usually perform better when GO tag information is included. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  17. Cellular functions of genetically imprinted genes in human and mouse as annotated in the gene ontology.

    Science.gov (United States)

    Hamed, Mohamed; Ismael, Siba; Paulsen, Martina; Helms, Volkhard

    2012-01-01

    By analyzing the cellular functions of genetically imprinted genes as annotated in the Gene Ontology for human and mouse, we found that imprinted genes are often involved in developmental, transport and regulatory processes. In the human, paternally expressed genes are enriched in GO terms related to the development of organs and of anatomical structures. In the mouse, maternally expressed genes regulate cation transport as well as G-protein signaling processes. Furthermore, we investigated whether imprinted genes are regulated by common transcription factors. We identified 25 TF families that showed an enrichment of binding sites in the set of imprinted genes in human and 40 TF families in mouse. In general, maternally and paternally expressed genes are not regulated by different transcription factors. The genes Nnat, Klf14, Blcap, Gnas and Ube3a contribute most to the enrichment of TF families. In the mouse, genes that are maternally expressed in placenta are enriched for AP1 binding sites. In the human, we found that these genes possessed binding sites for both AP1 and SP1.

  18. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomics approaches. Results We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, this method can generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than the generally used cross-validation type of evaluation. By manually assessing a random sample of 100 predictions conducted in a historical roll-back evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340

  19. Initial Implementation of a comparative Data Analysis Ontology

    Directory of Open Access Journals (Sweden)

    Francisco Prosdocimi

    2009-01-01

    Full Text Available Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: “Operational Taxonomic Units” (OTUs), representing the entities to be compared; “character-state data” representing the observations compared among OTUs; “phylogenetic tree”, representing the historical path of evolution among the entities; and “transitions”, the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.

  20. Initial Implementation of a Comparative Data Analysis Ontology

    Directory of Open Access Journals (Sweden)

    Francisco Prosdocimi

    2009-07-01

    Full Text Available Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: “Operational Taxonomic Units” (OTUs), representing the entities to be compared; “character-state data” representing the observations compared among OTUs; “phylogenetic tree”, representing the historical path of evolution among the entities; and “transitions”, the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.

  1. GO-2D: identifying 2-dimensional cellular-localized functional modules in Gene Ontology

    Directory of Open Access Journals (Sweden)

    Yang Da

    2007-01-01

    Full Text Available Abstract Background Rapid progress in high-throughput biotechnologies (e.g. microarrays) and exponential accumulation of gene functional knowledge make it promising for systematic understanding of complex human diseases at the functional module level. Based on Gene Ontology, a large number of automatic tools have been developed for the functional analysis and biological interpretation of high-throughput microarray data. Results Unlike existing tools such as Onto-Express and FatiGO, we develop a tool named GO-2D for identifying 2-dimensional functional modules based on combined GO categories. For example, it refines biological process categories by sorting their genes into different cellular component categories, and then extracts those combined categories enriched with the interesting genes (e.g., the differentially expressed genes) for identifying the cellular-localized functional modules. Applications of GO-2D to the analyses of two human cancer datasets show that very specific disease-relevant processes can be identified by using cellular location information. Conclusion For studying complex human diseases, GO-2D can extract functionally compact and detailed modules such as the cellular-localized ones, characterizing disease-relevant modules in terms of both biological processes and cellular locations. The application results clearly demonstrate that the 2-dimensional approach, complementary to the current 1-dimensional approach, is powerful for finding modules highly relevant to diseases.
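
    The idea of combined categories can be sketched in a few lines. The abstract does not say which statistical test GO-2D uses, so the example below assumes a hypergeometric upper-tail test, a common choice for GO enrichment. All function names, category names and gene identifiers are hypothetical toy values, not taken from the tool.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


def combined_categories(process_genes, component_genes):
    """Intersect every biological process gene set with every cellular component gene set."""
    combos = {}
    for process, p_genes in process_genes.items():
        for component, c_genes in component_genes.items():
            shared = p_genes & c_genes
            if shared:  # keep only non-empty process/component combinations
                combos[(process, component)] = shared
    return combos


def enrichment(combos, interesting, background):
    """Return an enrichment p-value for each combined category (assumed test, see lead-in)."""
    draws = len(interesting & background)
    pvalues = {}
    for key, genes in combos.items():
        in_category = genes & background
        hits = len(in_category & interesting)
        pvalues[key] = hypergeom_upper_tail(len(background), len(in_category), draws, hits)
    return pvalues


# Toy example invented for illustration.
background = {f"g{i}" for i in range(1, 11)}
process_genes = {"apoptosis": {"g1", "g2", "g3", "g4", "g5", "g6"}}
component_genes = {
    "mitochondrion": {"g1", "g2", "g3", "g7"},
    "nucleus": {"g4", "g5", "g6", "g8"},
}
interesting = {"g1", "g2", "g3", "g9"}

pvalues = enrichment(combined_categories(process_genes, component_genes), interesting, background)
for key, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(key, round(p, 4))
```

    In this toy case the process as a whole is split into a mitochondrial part that contains three of the four interesting genes and a nuclear part that contains none, which is the kind of localization-specific signal the abstract describes. A real analysis would also need multiple-testing correction, which is omitted here.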

  2. SoFoCles: feature filtering for microarray classification based on gene ontology.

    Science.gov (United States)

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  3. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and would play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions respectively of proteins were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in
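
    The evaluation described in this abstract (precision, recall and F-score at a rank threshold) can be illustrated as follows. This is a minimal sketch of the standard top-k definitions, not the authors' evaluation code; the function name and the toy ranked list are hypothetical.

```python
def precision_recall_f(ranked_terms, true_terms, k):
    """Precision, recall and F-score for the top-k entries of a ranked prediction list."""
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy ranked predictions for one protein (identifiers used as placeholders only).
ranked = ["GO:0005634", "GO:0005737", "GO:0005886", "GO:0005739"]
known = {"GO:0005634", "GO:0005739"}

for k in (1, 2, 4):
    print(k, precision_recall_f(ranked, known, k))
```

    Raising the threshold k trades precision for recall: in the toy list, the top-1 cut is fully precise but recovers only half of the known terms, while the top-4 cut recovers both at half the precision.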

  4. A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

    Science.gov (United States)

    Huntley, Rachael P; Harris, Midori A; Alam-Faruque, Yasmin; Blake, Judith A; Carbon, Seth; Dietze, Heiko; Dimmer, Emily C; Foulger, Rebecca E; Hill, David P; Khodiyar, Varsha K; Lock, Antonia; Lomax, Jane; Lovering, Ruth C; Mutowo-Meullenet, Prudence; Sawford, Tony; Van Auken, Kimberly; Wood, Valerie; Mungall, Christopher J

    2014-05-21

    The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

  5. ProKinO: an ontology for integrative analysis of protein kinases in cancer.

    Directory of Open Access Journals (Sweden)

    Gurinder Gosal

    Full Text Available BACKGROUND: Protein kinases are a large and diverse family of enzymes that are genomically altered in many human cancers. Targeted cancer genome sequencing efforts have unveiled the mutational profiles of protein kinase genes from many different cancer types. While mutational data on protein kinases is currently catalogued in various databases, integration of mutation data with other forms of data on protein kinases such as sequence, structure, function and pathway is necessary to identify and characterize key cancer causing mutations. Integrative analysis of protein kinase data, however, is a challenge because of the disparate nature of protein kinase data sources and data formats. RESULTS: Here, we describe ProKinO, a protein kinase-specific ontology, which provides a controlled vocabulary of terms, their hierarchy, and relationships unifying sequence, structure, function, mutation and pathway information on protein kinases. The conceptual representation of such diverse forms of information in one place not only allows rapid discovery of significant information related to a specific protein kinase, but also enables large-scale integrative analysis of protein kinase data in ways not possible through other kinase-specific resources. We have performed several integrative analyses of ProKinO data and, as an example, found that a large number of somatic mutations (∼288 distinct mutations) associated with the haematopoietic neoplasm cancer type map to only 8 kinases in the human kinome. This is in contrast to glioma, where the mutations are spread over 82 distinct kinases. We also provide examples of how ontology-based data analysis can be used to generate testable hypotheses regarding cancer mutations. CONCLUSION: We present an integrated framework for large-scale integrative analysis of protein kinase data. Navigation and analysis of ontology data can be performed using the ontology browser available at: http://vulcan.cs.uga.edu/prokino.

  6. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.

    Science.gov (United States)

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; Wang, Yadong; Rhee, Seung Y; Chen, Jin

    2015-02-14

    Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM .

  7. Interactome and Gene Ontology provide congruent yet subtly different views of a eukaryotic cell

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2009-07-01

    Full Text Available Background: The characterization of the global functional structure of a cell is a major goal in bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction network offer alternative views of that structure. Results: This study presents a comparison of the global structures of the Gene Ontology and the interactome of Saccharomyces cerevisiae. Sensitive, unsupervised clustering methods applied to a large fraction of the proteome established a GO-interactome correlation value of +0.47 for a general dataset that contains both high- and low-confidence interactions and +0.58 for a smaller, high-confidence dataset. Conclusion: The structures of the yeast cell deduced from GO and the interactome are substantially congruent. However, some significant differences were also detected, which may contribute to a better understanding of cell function and to a refinement of the current ontologies.

  8. The effects of shared information on semantic calculations in the gene ontology.

    Science.gov (United States)

    Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

    2017-01-01

    The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
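    As a rough illustration of how a shared-information (SI) value is substituted into the three term similarity measures named above, the following minimal Python sketch uses the textbook definitions of Resnik, Lin, and Jiang-Conrath. The toy ontology, the annotation probabilities, and all function names are hypothetical. SI is taken here as the information content of the most informative common ancestor, which is only one of several possible SI approaches, and nothing below reflects the GGTK API.

```python
import math

# Hypothetical toy DAG: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical annotation probabilities p(t): the fraction of genes
# annotated to a term or to any of its descendants.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "dna_binding": 0.125,
    "rna_binding": 0.25,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content of a term: -log2 p(term)."""
    return -math.log2(PROB[term])


def shared_information(a, b):
    """SI as the IC of the most informative common ancestor (an assumption)."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a, b):
    """Resnik similarity is the shared information itself."""
    return shared_information(a, b)


def lin(a, b):
    """Lin similarity: 2 * SI / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    return 2 * shared_information(a, b) / denom if denom else 1.0


def jiang_conrath_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * SI."""
    return ic(a) + ic(b) - 2 * shared_information(a, b)
```

    Swapping in a different `shared_information` function changes all three measures at once, which is the sense in which alternative SI approaches multiply the number of gene-to-gene measures.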

  9. Muscle Research and Gene Ontology: New standards for improved data integration

    Directory of Open Access Journals (Sweden)

    Nori Alessandra

    2009-01-01

    Full Text Available Background: The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community. Results: The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature. Conclusion: The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with, these new terms. Please visit the Muscle Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology.

  10. GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology.

    Science.gov (United States)

    Ramsak, Živa; Baebler, Špela; Rotter, Ana; Korbar, Matej; Mozetic, Igor; Usadel, Björn; Gruden, Kristina

    2014-01-01

    GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the restriction on the curation functions, which require prior registration to ensure traceability of the implemented changes.

  11. GOPubMed: research on information retrieval and analysis based on gene ontology and MeSH

    Institute of Scientific and Technical Information of China (English)

    张士靖; 杜建

    2009-01-01

    GoPubMed is an intelligent search engine, built on PubMed, that visualizes and post-processes search results. This paper analyzes its performance from three aspects: working principle, key techniques, and extended functions. The study shows that GoPubMed classifies PubMed retrieval results using semantic classification tools based on the Semantic Web, namely GO (Gene Ontology) and MeSH, which helps users quickly find the most relevant literature. It thereby achieves a seamless combination of the Semantic Web and biomedical information retrieval, and it supports visual statistical analysis of the retrieval results from multiple angles.

  12. From "glycosyltransferase" to "congenital muscular dystrophy": integrating knowledge from NCBI Entrez Gene and the Gene Ontology.

    Science.gov (United States)

    Sahoo, Satya S; Zeng, Kelly; Bodenreider, Olivier; Sheth, Amit

    2007-01-01

    Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.

  13. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  14. Extending gene ontology in the context of extracellular RNA and vesicle communication

    NARCIS (Netherlands)

    Cheung, Kei-Hoi; Keerthikumar, Shivakumar; Roncaglia, Paola; Subramanian, Sai Lakshmi; Roth, Matthew E; Samuel, Monisha; Anand, Sushma; Gangoda, Lahiru; Gould, Stephen; Alexander, Roger; Galas, David; Gerstein, Mark B; Hill, Andrew F; Kitchen, Robert R; Lötvall, Jan; Patel, Tushar; Procaccini, Dena C; Quesenberry, Peter; Rozowsky, Joel; Raffai, Robert L; Shypitsyna, Aleksandra; Su, Andrew I; Théry, Clotilde; Vickers, Kasey; Wauben, Marca H M; Mathivanan, Suresh; Milosavljevic, Aleksandar; Laurent, Louise C

    2016-01-01

    BACKGROUND: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA

  15. Ontology-Based Approach to Social Data Sentiment Analysis: Detection of Adolescent Depression Signals.

    Science.gov (United States)

    Jung, Hyesil; Park, Hyeoun-Ae; Song, Tae-Min

    2017-07-24

    Social networking services (SNSs) contain abundant information about the feelings, thoughts, interests, and patterns of behavior of adolescents that can be obtained by analyzing SNS postings. An ontology that expresses the shared concepts and their relationships in a specific field could be used as a semantic framework for social media data analytics. The aim of this study was to refine an adolescent depression ontology and terminology as a framework for analyzing social media data and to evaluate description logics between classes and the applicability of this ontology to sentiment analysis. The domain and scope of the ontology were defined using competency questions. The concepts constituting the ontology and terminology were collected from clinical practice guidelines, the literature, and social media postings on adolescent depression. Class concepts, their hierarchy, and the relationships among class concepts were defined. An internal structure of the ontology was designed using the entity-attribute-value (EAV) triplet data model, and superclasses of the ontology were aligned with the upper ontology. Description logics between classes were evaluated by mapping concepts extracted from the answers to frequently asked questions (FAQs) onto the ontology concepts derived from description logic queries. The applicability of the ontology was validated by examining the representability of 1358 sentiment phrases using the ontology EAV model and conducting sentiment analyses of social media data using ontology class concepts. We developed an adolescent depression ontology that comprised 443 classes and 60 relationships among the classes; the terminology comprised 1682 synonyms of the 443 classes. In the description logics test, no error in relationships between classes was found, and about 89% (55/62) of the concepts cited in the answers to FAQs mapped onto the ontology class. Regarding applicability, the EAV triplet models of the ontology class represented about 91

  16. Impact of ontology evolution on functional analyses.

    Science.gov (United States)

    Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard

    2012-10-15

    Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.

  17. A relation based measure of semantic similarity for Gene Ontology annotations

    Directory of Open Access Journals (Sweden)

    Gaudin Benoit

    2008-11-01

    Full Text Available Background: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and group gene products into functional groups. There are various ways to measure semantic similarity, either using the topological structure of the ontology, the instances (gene products associated with terms) or a mixture of both. We focus on an instance level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and the semantics of relations between terms, to provide constraints on our instance level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. Results: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance based methods of semantic similarity of terms, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provides a set of principles that any improvement on our method should seek to satisfy. Conclusion: We derive a measure of semantic
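    The aggregation approaches mentioned in this abstract (min, max, and average over term pairs) can be sketched in a few lines of Python. This is not the SSA algorithm; it only illustrates the baseline the authors contrast their method with. The function names, the term identifiers, and the similarity table are all hypothetical.

```python
from statistics import mean

# Hypothetical term-to-term similarity scores for unordered term pairs.
TOY_SIM = {
    frozenset(("t1", "t2")): 0.25,
    frozenset(("t1", "t3")): 0.75,
    frozenset(("t2", "t3")): 0.5,
}


def toy_term_sim(a, b):
    """Look up a toy similarity; identical terms score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM[frozenset((a, b))]


def aggregate_similarity(terms_a, terms_b, term_sim, how="max"):
    """Extend term similarity to two annotation sets by aggregating
    the scores of all cross pairs with min, max, or average."""
    scores = [term_sim(a, b) for a in terms_a for b in terms_b]
    if not scores:
        raise ValueError("both annotation sets must be non-empty")
    reducers = {"min": min, "max": max, "average": mean}
    return reducers[how](scores)
```

    For a gene product annotated with `t1` and another annotated with `t2` and `t3`, the three reducers return different values from the same pair scores, which shows how strongly the choice of aggregation shapes the annotation-level result.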

  18. Evaluating the significance of protein functional similarity based on gene ontology.

    Science.gov (United States)

    Konopka, Bogumil M; Golda, Tomasz; Kotulska, Malgorzata

    2014-11-01

    Gene ontology is among the most successful ontologies in the biomedical domain. It is used to describe, unambiguously, protein molecular functions, cellular localizations, and processes in which proteins participate. The hierarchical structure of gene ontology allows quantifying protein functional similarity by application of algorithms that calculate semantic similarities. The scores, however, are meaningless without a given context. Here, we propose how to evaluate the significance of protein function semantic similarity scores by comparing them to reference distributions calculated for randomly chosen proteins. In the study, thresholds for significant functional semantic similarity, in four representative annotation corpuses, were estimated. We also show that the score significance is influenced by the number and specificity of gene ontology terms that are annotated to compared proteins. While proteins with a greater number of terms tend to yield higher similarity scores, proteins with more specific terms produce lower scores. The estimated significance thresholds were validated using protein sequence-function and structure-function relationships. Taking into account the term number and term specificity improves the distinction between significant and insignificant semantic similarity comparisons.
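    The idea of judging a similarity score against a reference distribution from randomly chosen proteins can be sketched as follows. This is a minimal sketch, not the authors' procedure: the Jaccard overlap of term sets stands in for a real semantic similarity measure, and the protein names, annotation sets, and function names are hypothetical.

```python
import math
import random

# Hypothetical annotation corpus: protein -> set of GO-like term labels.
ANNOTATIONS = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b"},
    "P3": {"c", "d"},
    "P4": {"e"},
    "P5": {"a", "e", "f"},
}


def jaccard(p, q):
    """Stand-in similarity: overlap of the two proteins' term sets."""
    a, b = ANNOTATIONS[p], ANNOTATIONS[q]
    return len(a & b) / len(a | b)


def reference_distribution(proteins, similarity, n_pairs, seed=0):
    """Score n_pairs randomly drawn pairs of distinct proteins."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        p, q = rng.sample(proteins, 2)
        scores.append(similarity(p, q))
    return scores


def empirical_p_value(observed, reference_scores):
    """Share of reference scores at least as high as the observed one,
    with a +1 correction so the estimate is never exactly zero."""
    hits = sum(1 for s in reference_scores if s >= observed)
    return (hits + 1) / (len(reference_scores) + 1)


def significance_threshold(reference_scores, alpha=0.05):
    """Approximate (1 - alpha) quantile of the reference scores."""
    ordered = sorted(reference_scores)
    index = min(len(ordered) - 1, math.ceil((1 - alpha) * len(ordered)))
    return ordered[index]
```

    Because the abstract reports that term number and term specificity shift the scores, a more faithful version would presumably draw the random pairs from proteins with comparable annotation counts; that stratification is left out here.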

  19. A bibliometric and visual analysis of global geo-ontology research

    Science.gov (United States)

    Li, Lin; Liu, Yu; Zhu, Haihong; Ying, Shen; Luo, Qinyao; Luo, Heng; Kuai, Xi; Xia, Hui; Shen, Hang

    2017-02-01

    In this paper, the results of a bibliometric and visual analysis of geo-ontology research articles collected from the Web of Science (WOS) database between 1999 and 2014 are presented. The numbers of national institutions and published papers are visualized and a global research heat map is drawn, illustrating an overview of global geo-ontology research. In addition, we present a chord diagram of countries and perform a visual cluster analysis of a knowledge co-citation network of references, disclosing potential academic communities and identifying key points, main research areas, and future research trends. The International Journal of Geographical Information Science, Progress in Human Geography, and Computers & Geosciences are the most active journals. The USA makes the largest contributions to geo-ontology research by virtue of its highest numbers of independent and collaborative papers, and its dominance was also confirmed in the country chord diagram. The majority of institutions are in the USA, Western Europe, and Eastern Asia. Wuhan University, University of Munster, and the Chinese Academy of Sciences are notable geo-ontology institutions. Keywords such as "Semantic Web," "GIS," and "space" have attracted a great deal of attention. "Semantic granularity in ontology-driven geographic information systems," "Ontologies in support of activities in geographical space" and "A translation approach to portable ontology specifications" have the highest cited centrality. Geographical space, computer-human interaction, and ontology cognition are the three main research areas of geo-ontology. The semantic mismatch between the producers and users of ontology data as well as error propagation in interdisciplinary and cross-linguistic data reuse needs to be solved. In addition, the development of geo-ontology modeling primitives based on OWL (Web Ontology Language) and finding methods to automatically rework data in the Semantic Web are needed. Furthermore, the topological

  20. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 [...] genes in a biological condition increases beyond 50 and

  1. Dictionary and Gene Ontology Based Similarity for Named Entity Relationship Protein-protein Interaction Prediction from Biotext Corpus

    Directory of Open Access Journals (Sweden)

    Smt K. Prabavathy

    2014-12-01

    Full Text Available Protein-protein interactions (PPIs) play a key role in many biological systems. They are involved in complex formation and in many pathways that carry out biological processes. Accurate identification of the set of interacting proteins can shed new light on the functional roles of proteins in the complex environment of the cell. The ability to construct biologically meaningful gene networks and to identify the exact relationships within them is critical for present-day systems biology. Earlier research studied the power of gene modules to shed light on the functioning of complex biological systems. Most modules in these networks showed little link to meaningful biological function, because these methods do not accurately calculate the semantic relationship between entities. To overcome these problems and improve PPI results on the biotext corpus, this research proposes a new method. The proposed method directly incorporates Gene Ontology (GO) annotation in the construction of gene modules and uses dictionary-based text to extract biotext information. The Dictionary-Based Text and Gene Ontology (DBTGO) approach integrates various gene-gene pairwise similarity values and protein-protein interaction relationships obtained from gene expression in order to obtain better biotext information retrieval results. The results were analyzed on data from the Biotext Project at UC Berkeley. Testing indicates that the DBTGO algorithm improves PPI relationship identification compared with all previously suggested methods in terms of precision, recall, F-measure, and Normalized Discounted Cumulative Gain (NDCG). The proposed DBTGO algorithm can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level.

  2. Extending TOPS: Ontology-driven Anomaly Detection and Analysis System

    Science.gov (United States)

    Votava, P.; Nemani, R. R.; Michaelis, A.

    2010-12-01

    Terrestrial Observation and Prediction System (TOPS) is a flexible modeling software system that integrates ecosystem models with frequent satellite and surface weather observations to produce ecosystem nowcasts (assessments of current conditions) and forecasts useful in natural resources management, public health and disaster management. We have been extending the Terrestrial Observation and Prediction System (TOPS) to include a capability for automated anomaly detection and analysis of both on-line (streaming) and off-line data. In order to best capture the knowledge about data hierarchies, Earth science models and implied dependencies between anomalies and occurrences of observable events such as urbanization, deforestation, or fires, we have developed an ontology to serve as a knowledge base. We can query the knowledge base and answer questions about dataset compatibilities, similarities and dependencies so that we can, for example, automatically analyze similar datasets in order to verify a given anomaly occurrence in multiple data sources. We are further extending the system to go beyond anomaly detection towards reasoning about possible causes of anomalies that are also encoded in the knowledge base as either learned or implied knowledge. This enables us to scale up the analysis by eliminating a large number of anomalies early on during the processing by either failure to verify them from other sources, or matching them directly with other observable events without having to perform an extensive and time-consuming exploration and analysis. The knowledge is captured using OWL ontology language, where connections are defined in a schema that is later extended by including specific instances of datasets and models. The information is stored using Sesame server and is accessible through both Java API and web services using SeRQL and SPARQL query languages. Inference is provided using OWLIM component integrated with Sesame.

  3. Applying the functional abnormality ontology pattern to anatomical functions

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2010-03-01

    Full Text Available Background: Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions. Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies that are available for various species. Such an explicit relation between anatomical structures and their functions would be useful for accurately defining the classes of both the anatomy and the phenotype ontologies. Results: We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions. Conclusions: Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can automatically extract a skeleton for an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities. Availability: The source code and results of our analysis are available at http://bioonto.de.

  4. How to learn about gene function: text-mining or ontologies?

    Science.gov (United States)

    Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I

    2015-03-01

    As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies--e.g., next-generation sequencing--are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene-lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene-lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene-lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or--perhaps more adventurously--on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic

  5. Systematically characterizing and prioritizing chemosensitivity related gene based on Gene Ontology and protein interaction network

    Directory of Open Access Journals (Sweden)

    Chen Xin

    2012-10-01

    Background: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs' GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable

  6. Non-lexical approaches to identifying associative relations in the gene ontology.

    Science.gov (United States)

    Bodenreider, Olivier; Aubry, Marc; Burgun, Anita

    2005-01-01

    The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.
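
    The co-occurrence approach mentioned in the abstract above can be illustrated with a minimal sketch. It only counts how often two terms from different hierarchies annotate the same gene product; the statistical test applied by the authors is not described here, so none is attempted. The function name, the input layout and the toy annotations are assumptions made for illustration and are not taken from any annotation database.

```python
from collections import Counter
from itertools import combinations


def cross_hierarchy_cooccurrence(annotations):
    """Count term pairs from different hierarchies that share a gene product.

    annotations: dict mapping a gene product id to a set of
    (hierarchy, term) tuples. This layout is a hypothetical simplification.
    """
    counts = Counter()
    for terms in annotations.values():
        # Sorting gives every pair a stable orientation before counting.
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # keep only pairs that cross hierarchies
                counts[(a, b)] += 1
    return counts


# Toy annotations (hypothetical, for illustration only).
toy = {
    "gp1": {("MF", "kinase activity"), ("BP", "phosphorylation")},
    "gp2": {("MF", "kinase activity"), ("BP", "phosphorylation"), ("CC", "cytoplasm")},
    "gp3": {("MF", "kinase activity"), ("CC", "cytoplasm")},
}
counts = cross_hierarchy_cooccurrence(toy)
for pair, n in counts.most_common():
    print(pair, n)
```

    Raw counts like these would normally be compared against the counts expected by chance before a pair is proposed as an associative relation.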

  7. Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

    Science.gov (United States)

    Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A

    2016-04-01

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.

  8. Aligning ontologies and integrating textual evidence for pathway analysis of microarray data

    Energy Technology Data Exchange (ETDEWEB)

    Gopalan, Banu; Posse, Christian; Sanfilippo, Antonio P.; Stenzel-Poore, Mary; Stevens, S.L.; Castano, Jose; Beagley, Nathaniel; Riensche, Roderick M.; Baddeley, Bob; Simon, R.P.; Pustejovsky, James

    2006-10-08

    Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single gene studies to genome-level analysis, monitoring the expression levels of several thousands of genes in parallel. The massive amounts of data obtained from microarrays need to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontologies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major disabler in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neuroprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative responses in the recovery process following ischemia and the molecular mechanisms that lead to these processes still need to be uncovered. Improved annotation of genomic and proteomic data, including annotation of pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident with the way these databases analyze functional groupings of genes from the Gene Ontology.

  9. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Science.gov (United States)

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators of protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy that combines both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross-validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the feature combining information from both the ascending and the descending parts of the three ontologies performs best. Our method is appealing for effective prediction of protein-protein interaction.

  10. Unifying themes in microbial associations with animal and plant hosts described using the gene ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Collmer, Candace W; Gwinn-Giglio, Michelle; Lindeberg, Magdalen; Meng, Shaowu; Chibucos, Marcus C; Tseng, Tsai-Tien; Lomax, Jane; Biehl, Bryan; Ireland, Amelia; Bird, David; Dean, Ralph A; Glasner, Jeremy D; Perna, Nicole; Setubal, Joao C; Collmer, Alan; Tyler, Brett M

    2010-12-01

    Microbes form intimate relationships with hosts (symbioses) that range from mutualism to parasitism. Common microbial mechanisms involved in a successful host association include adhesion, entry of the microbe or its effector proteins into the host cell, mitigation of host defenses, and nutrient acquisition. Genes associated with these microbial mechanisms are known for a broad range of symbioses, revealing both divergent and convergent strategies. Effective comparisons among these symbioses, however, are hampered by inconsistent descriptive terms in the literature for functionally similar genes. Bioinformatic approaches that use homology-based tools are limited to identifying functionally similar genes based on similarities in their sequences. An effective solution to these limitations is provided by the Gene Ontology (GO), which provides a standardized language to describe gene products from all organisms. The GO comprises three ontologies that enable one to describe the molecular function(s) of gene products, the biological processes to which they contribute, and their cellular locations. Beginning in 2004, the Plant-Associated Microbe Gene Ontology (PAMGO) interest group collaborated with the GO consortium to extend the GO to accommodate terms for describing gene products associated with microbe-host interactions. Currently, over 900 terms that describe biological processes common to diverse plant- and animal-associated microbes are incorporated into the GO database. Here we review some unifying themes common to diverse host-microbe associations and illustrate how the new GO terms facilitate a standardized description of the gene products involved. We also highlight areas where new terms need to be developed, an ongoing process that should involve the whole community.

  11. Development and Evaluation of an Obesity Ontology for Social Big Data Analysis.

    Science.gov (United States)

    Kim, Ae Ran; Park, Hyeoun-Ae; Song, Tae-Min

    2017-07-01

    The aim of this study was to develop and evaluate an obesity ontology as a framework for collecting and analyzing unstructured obesity-related social media posts. The obesity ontology was developed according to the 'Ontology Development 101'. The coverage rate of the developed ontology was examined by mapping concepts and terms of the ontology with concepts and terms extracted from obesity-related Twitter postings. The structure and representative ability of the ontology were evaluated by nurse experts. We applied the ontology to the density analysis of keywords related to obesity types and management strategies and to the sentiment analysis of obesity and diet using social big data. The developed obesity ontology was represented by 8 superclasses and 124 subordinate classes. The superclasses comprised 'risk factors,' 'types,' 'symptoms,' 'complications,' 'assessment,' 'diagnosis,' 'management strategies,' and 'settings.' The coverage rate of the ontology was 100% for the concepts and 87.8% for the terms. The evaluation scores for representative ability were higher than 4.0 out of 5.0 for all of the evaluation items. The density analysis of keywords revealed that the top-two posted types of obesity were abdomen and thigh, and the top-three posted management strategies were diet, exercise, and dietary supplements or drug therapy. Positive expressions of obesity-related postings have increased annually in the sentiment analysis. It was found that the developed obesity ontology was useful to identify the most frequently used terms on obesity and opinions and emotions toward obesity posted by the general population on social media.

  12. Using Ontologies for Enterprise Architecture Integration and Analysis

    Directory of Open Access Journals (Sweden)

    Gonçalo Antunes

    2014-03-01

    Enterprise architecture facilitates the alignment between different domains, such as business, applications and information technology. These domains must be described with description languages that best address the concerns of their stakeholders. However, current model-based enterprise architecture techniques are unable to integrate multiple description languages, either due to the lack of suitable extension mechanisms or because they lack the means to maintain the coherence, consistency and traceability between the representations of the multiple domains of the enterprise. On the other hand, enterprise architecture models are often designed and used for communication and not for automated analysis of their contents. Model analysis is a valuable tool for assessing the qualities of a model, such as conformance and completeness, and also for supporting decision making. This paper addresses these two issues found in model-based enterprise architecture: (1) the integration of domain description languages, and (2) the automated analysis of models. This proposal uses ontology engineering techniques to specify and integrate the different domains, and reasoning and querying as a means to analyse the models. The utility of the proposal is shown through an evaluation scenario that involves the analysis of an enterprise architecture model that spans multiple domains.

  13. The Orthology Ontology: development and applications.

    Science.gov (United States)

    Fernández-Breis, Jesualdo Tomás; Chiba, Hirokazu; Legaz-García, María Del Carmen; Uchiyama, Ikuo

    2016-06-04

    Computational comparative analysis of multiple genomes provides valuable opportunities to biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate the research toward the shareability of the content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and major entities of the ontology that is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology enables the possibility to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization for the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth . The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.

  14. Combining sequence and Gene Ontology for protein module detection in the Weighted Network.

    Science.gov (United States)

    Yu, Yang; Liu, Jie; Feng, Nuan; Song, Bo; Zheng, Zeyu

    2017-01-07

    Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect the protein modules in the weighted PPI networks. The proposed approach mainly consists of three parts: the feature extraction, the weighted graph construction and the protein complex detection. Firstly, the topology-sequence information is utilized to present the feature of protein complex. Secondly, six types of the weighted graph are constructed by combining PPI network and Gene Ontology information. Lastly, a protein complex algorithm is applied to the weighted graph, which locates the clusters based on three conditions, including density, network diameter and the included angle cosine. Experiments have been conducted on two protein complex benchmark sets for yeast and the results show that the approach is more effective than five typical algorithms in terms of f-measure and precision. The combination of protein interaction network with sequence and gene ontology data is helpful to improve the performance and provides an optional method for protein module detection. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Ontology-Driven Co-clustering of Gene Expression Data

    Science.gov (United States)

    Cordero, Francesca; Pensa, Ruggero G.; Visconti, Alessia; Ienco, Dino; Botta, Marco

    The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. For this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters.

  16. Genetic resources for advanced biofuel production described with the Gene Ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Purwantini, Endang; Lomax, Jane; Setubal, João C; Mukhopadhyay, Biswarup; Tyler, Brett M

    2014-01-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial ENergy processes Gene Ontology (MENGO) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  17. Genetic Resources for Advanced Biofuel Production Described with the Gene Ontology

    Directory of Open Access Journals (Sweden)

    Trudy Torto-Alalibo

    2014-10-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial Energy Gene Ontology (MENGO: http://www.mengo.biochem.vt.edu) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  18. Supporting the analysis of ontology evolution processes through the combination of static and dynamic scaling functions in OQuaRE.

    Science.gov (United States)

    Duque-Ramos, Astrid; Quesada-Martínez, Manuel; Iniesta-Moreno, Miguela; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2016-10-17

    The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task and biomedical ontologies evolve rapidly, so new versions are regularly and frequently published in ontology repositories. This has the implication of there being a high number of ontology versions over a short time span. Given this level of activity, ontology designers need to be supported in the effective management of the evolution of biomedical ontologies as the different changes may affect the engineering and quality of the ontology. This is why there is a need for methods that contribute to the analysis of the effects of changes and evolution of ontologies. In this paper we approach this issue from the ontology quality perspective. In previous work we have developed an ontology evaluation framework based on quantitative metrics, called OQuaRE. Here, OQuaRE is used as a core component in a method that enables the analysis of the different versions of biomedical ontologies using the quality dimensions included in OQuaRE. Moreover, we describe and use two scales for evaluating the changes between the versions of a given ontology. The first one is the static scale used in OQuaRE and the second one is a new, dynamic scale, based on the observed values of the quality metrics of a corpus defined by all the versions of a given ontology (life-cycle). In this work we explain how OQuaRE can be adapted for understanding the evolution of ontologies. Its use has been illustrated with the ontology of bioinformatics operations, types of data, formats, and topics (EDAM). The two scales included in OQuaRE provide complementary information about the evolution of the ontologies. The application of the static scale, which is the original OQuaRE scale, to the versions of the EDAM ontology reveals a design based on good ontological engineering principles. The application of the dynamic scale has enabled a more detailed analysis of the evolution of

  19. Ontology design patterns to disambiguate relations between genes and gene products in GENIA.

    Science.gov (United States)

    Hoehndorf, Robert; Ngonga Ngomo, Axel-Cyrille; Pyysalo, Sampo; Ohta, Tomoko; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2011-10-06

    Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  20. Ontology design patterns to disambiguate relations between genes and gene products in GENIA

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2011-10-01

    Motivation: Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results: We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  1. Interaction between leptin and leptin receptor in gastric carcinoma: Gene ontology analysis

    Directory of Open Access Journals (Sweden)

    V. Wiwanitkit

    2007-04-01

    Gastric carcinoma is a rare but important malignancy. A link between leptin, a cytokine that is elevated in obese individuals, and cancer development has been proposed. It has been noted that leptin and its receptor may play a positive role in the progression of gastric cancer. However, the exact mechanism resulting from the interaction between leptin and leptin receptor has never been clarified. Here, the author used a new gene ontology technology to predict the molecular function and biological process due to the interaction between leptin and leptin receptor. Compared with leptin and leptin receptor, the leptin-leptin receptor complex possesses the same function and biological process as leptin receptor. This can confirm that leptin receptor has a significant suppressive effect on the expression of leptin. Loss of hormone activity and disturbance of the normal cell signaling pathway of leptin can be seen. Blocking of the receptor might be a rational therapeutic strategy.

  2. Linking experimental results, biological networks and sequence analysis methods using Ontologies and Generalised Data Structures.

    Science.gov (United States)

    Koehler, Jacob; Rawlings, Chris; Verrier, Paul; Mitchell, Rowan; Skusa, Andre; Ruegg, Alexander; Philippi, Stephan

    2005-01-01

    The structure of a closely integrated data warehouse is described that is designed to link different types and varying numbers of biological networks, sequence analysis methods and experimental results such as those coming from microarrays. The data schema is inspired by a combination of graph-based methods and generalised data structures and makes use of ontologies and meta-data. The core idea is to consider and store biological networks as graphs, and to use generalised data structures (GDS) for the storage of further relevant information. This is possible because many biological networks can be stored as graphs: protein interactions, signal transduction networks, metabolic pathways, gene regulatory networks etc. Nodes in biological graphs represent entities such as promoters, proteins, genes and transcripts whereas the edges of such graphs specify how the nodes are related. The semantics of the nodes and edges are defined using ontologies of node and relation types. Besides generic attributes that most biological entities possess (name, attribute description), further information is stored using generalised data structures. By directly linking to underlying sequences (exons, introns, promoters, amino acid sequences) in a systematic way, close interoperability with sequence analysis methods can be achieved. This approach allows us to store, query and update a wide variety of biological information in a way that is semantically compact without requiring changes at the database schema level when new kinds of biological information are added. We describe how this data warehouse is being implemented by extending the text-mining framework ONDEX to link, support and complement different bioinformatics applications and research activities such as microarray analysis, sequence analysis and modelling/simulation of biological systems. The system is developed under the GPL license and can be downloaded from http://sourceforge.net/projects/ondex/
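
    The storage idea in the abstract above (networks as graphs, node and edge semantics drawn from type ontologies, open-ended attributes for everything else) can be sketched as follows. This is not the ONDEX schema or API; every class, field and type name below is hypothetical, and the type "ontologies" are reduced to plain sets of allowed labels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    node_type: str  # must come from the ontology of node types
    name: str
    # Stand-in for a generalised data structure: open-ended attributes
    # that can grow without changing the schema.
    extra: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relation_type: str  # must come from the ontology of relation types


class BioGraph:
    """Hypothetical typed graph; not the ONDEX data model."""

    def __init__(self, node_types, relation_types):
        self.node_types = set(node_types)
        self.relation_types = set(relation_types)
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        if node.node_type not in self.node_types:
            raise ValueError(f"unknown node type: {node.node_type}")
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        if edge.relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {edge.relation_type}")
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbours(self, node_id, relation_type=None):
        # Follow outgoing edges, optionally restricted to one relation type.
        return [
            e.target
            for e in self.edges
            if e.source == node_id
            and (relation_type is None or e.relation_type == relation_type)
        ]


graph = BioGraph({"gene", "protein"}, {"encodes", "interacts_with"})
graph.add_node(Node("g1", "gene", "toy gene"))
graph.add_node(Node("p1", "protein", "toy protein 1"))
graph.add_node(Node("p2", "protein", "toy protein 2"))
graph.add_edge(Edge("g1", "p1", "encodes"))
graph.add_edge(Edge("p1", "p2", "interacts_with"))
# A new kind of information is attached without touching the schema.
graph.nodes["p1"].extra["sequence"] = "MKT"
print(graph.neighbours("g1", "encodes"))
```

    Keeping the type vocabularies outside the node and edge classes is what lets new kinds of entities be introduced by extending a vocabulary rather than by altering the storage layout.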

  3. Gene Ontology based housekeeping gene selection for RNA-seq normalization.

    Science.gov (United States)

    Chen, Chien-Ming; Lu, Yu-Lun; Sio, Chi-Pong; Wu, Guan-Chung; Tzou, Wen-Shyong; Pai, Tun-Wen

    2014-06-01

    RNA-seq analysis provides a powerful tool for revealing relationships between gene expression level and biological function of proteins. In order to identify differentially expressed genes among various RNA-seq datasets obtained from different experimental designs, an appropriate normalization method for calibrating multiple experimental datasets is the first challenging problem. We propose a novel method to facilitate biologists in selecting a set of suitable housekeeping genes for inter-sample normalization. The approach is achieved by adopting user-defined experimentally related keywords, GO annotations, GO term distance matrices, orthologous housekeeping gene candidates, and stability ranking of housekeeping genes. By identifying the most distanced GO terms from query keywords and selecting housekeeping gene candidates with low coefficients of variation among different spatio-temporal datasets, the proposed method can automatically enumerate a set of functionally irrelevant housekeeping genes for practical normalization. Novel and benchmark testing RNA-seq datasets were applied to demonstrate that different selections of housekeeping genes have a strong impact on differential gene expression analysis, and compared results have shown that our proposed method outperformed other traditional approaches in terms of both sensitivity and specificity. The proposed mechanism of selecting appropriate housekeeping genes for inter-dataset normalization is robust and accurate for differential expression analyses. Copyright © 2014 Elsevier Inc. All rights reserved.
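
    The stability-ranking step mentioned above (preferring candidates with a low coefficient of variation across datasets) might look like the sketch below. It covers only that one step, not the GO-distance filtering, and it assumes the population standard deviation; the gene names and expression values are invented for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD assumed)."""
    m = mean(values)
    if m == 0:
        return float("inf")  # undefined CV; treat as maximally unstable
    return pstdev(values) / m


def rank_by_stability(expression):
    """Return gene names ordered from most to least stable (lowest CV first)."""
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))


# Made-up expression levels across four datasets.
example = {
    "geneA": [100.0, 102.0, 98.0, 101.0],
    "geneB": [10.0, 80.0, 35.0, 5.0],
    "geneC": [50.0, 55.0, 45.0, 50.0],
}
ranked = rank_by_stability(example)
```

    Here geneA varies least relative to its mean, so it would be the first normalization candidate under this criterion alone.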

  4. HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology.

    Science.gov (United States)

    Deng, Yue; Gao, Lin; Wang, Bingbo; Guo, Xingli

    2015-01-01

    Phenotypic features associated with genes and diseases play an important role in disease-related studies and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering the controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases, and becomes a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only few similarity measures. Therefore, there is a critical need for developing a comprehensive and offline software for phenotypic features similarity based on HPO. HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets are also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). HPOSim can be used to predict disease genes and explore disease-related function of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
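
    Hypergeometric enrichment analysis, one of the methods named above, can be illustrated with a short calculation. This is a generic sketch of the standard one-sided test written in Python, not HPOSim's R interface; the function and parameter names are assumptions chosen for readability.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided p-value P(X >= hits) under the hypergeometric distribution.

    population: number of genes in the background set
    annotated:  background genes annotated with the term of interest
    sample:     size of the query gene set
    hits:       query genes annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

    For example, with a background of 10 genes of which 5 carry the term, drawing a set of 5 genes that all carry it has probability 1/252 under the null; in practice such p-values are usually corrected for multiple testing across terms.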

  5. Reconstruction of phylogenetic relationships from metabolic pathways based on the enzyme hierarchy and the gene ontology.

    Science.gov (United States)

    Clemente, José C; Satou, Kenji; Valiente, Gabriel

    2005-01-01

    There has been much interest in the structural comparison and alignment of metabolic pathways. Several techniques have been conceived to assess the similarity of metabolic pathways of different organisms. In this paper, we show that the combination of a new heuristic algorithm for the comparison of metabolic pathways together with any of three enzyme similarity measures (hierarchical, information content, and gene ontology) can be used to derive a metabolic pathway similarity measure that is suitable for reconstructing phylogenetic relationships from metabolic pathways. Experimental results on the Glycolysis pathway of 73 organisms representing the three domains of life show that our method outperforms previous techniques.

  6. A measure of semantic similarity between gene ontology terms based on semantic pathway covering

    Institute of Scientific and Technical Information of China (English)

    LI Rong; CAO Shunliang; LI Yuanyuan; TAN Hao; ZHU Yangyong; ZHONG Yang; LI Yixue

    2006-01-01

    Semantic similarity between Gene Ontology (GO) terms is critical in resolving semantic heterogeneousness when integrating heterogeneous biological databases. Traditionally, distance based and information content based measures are two major methods. In this paper, a new method based on semantic pathway covering is proposed and an algorithm, COMBINE algorithm, is presented, which considers information contents of two given nodes and those of all nodes included in the two nodes' pathways. Experiments show that COMBINE algorithm obtains the highest correlation index compared with those distance based and information content based algorithms.
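
    For orientation, the information content based family of measures mentioned above can be sketched as follows. This is the classic Resnik-style measure (information content of the most informative common ancestor), not the COMBINE algorithm, whose details the abstract does not give; the toy term names and annotation probabilities are invented.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(term, prob):
    """IC(t) = -log p(t), where p(t) is the annotation probability of t."""
    return -math.log(prob[term])


def resnik_similarity(a, b, parents, prob):
    """IC of the most informative ancestor shared by a and b."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((information_content(t, prob) for t in common), default=0.0)


# Toy DAG: child -> list of parents (hypothetical terms).
toy_parents = {"root": [], "x": ["root"], "y": ["x"], "z": ["x"], "w": ["root"]}
# Made-up annotation probabilities; the root covers everything.
toy_prob = {"root": 1.0, "x": 0.5, "y": 0.2, "z": 0.1, "w": 0.3}
```

    In this toy graph, y and z share the ancestor x, so their similarity is -log(0.5), while y and w share only the root and score zero.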

  7. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology.

    Science.gov (United States)

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2015-09-01

    Representing the way forward, from functional genomics and its ontology to functional understanding and physiological model, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To tackle the standpoint, we herein feature the applications of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of a large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online ( http://www.iitr.ac.in/ajayshiv/ ) through a user-friendly environment generated by means of Drupal-6.24. By knowing that genes are expressed in temporally and spatially characteristic patterns and that their functionally distinct products often reside in specific cellular compartments and may be part of one or more multicomponent complexes, this sort of work is intended to be relevant for investigating the functional relationships of gene products at a system level and, thus, helps us approach to the full physiology.

  8. Integration of the Gene Ontology into an object-oriented architecture

    Directory of Open Access Journals (Sweden)

    Zheng W Jim

    2005-05-01

    Full Text Available Abstract Background To standardize gene product descriptions, a formal vocabulary defined as the Gene Ontology (GO) has been developed. GO terms have been categorized into biological processes, molecular functions, and cellular components. However, there is no single representation that integrates all the terms into one cohesive model. Furthermore, GO definitions have little information explaining the underlying architecture that forms these terms, such as the dynamic and static events occurring in a process. In contrast, object-oriented models have been developed to show dynamic and static events. A portion of the TGF-beta signaling pathway, which is involved in numerous cellular events including cancer, differentiation and development, was used to demonstrate the feasibility of integrating the Gene Ontology into an object-oriented model. Results Using object-oriented models we have captured the static and dynamic events that occur during a representative GO process, "transforming growth factor-beta (TGF-beta) receptor complex assembly" (GO:0007181). Conclusion We demonstrate that the utility of GO terms can be enhanced by object-oriented technology, and that the GO terms can be integrated into an object-oriented model by serving as a basis for the generation of object functions and attributes.

  9. Ontology or formal ontology

    Science.gov (United States)

    Žáček, Martin

    2017-07-01

    Ontology or formal ontology? Which term is correct? The aim of this article is to introduce the correct terms and explain their basis. An ontology describes a particular area of interest (domain) in a formal way: it defines the classes of objects in that area and the relationships that may exist between them. The value of ontology lies mainly in facilitating communication between people, improving collaboration between software systems, and improving systems engineering. In all these areas, ontology offers the possibility of a unified view while maintaining consistency and unambiguity.

  10. Ontology-based representation and analysis of host-Brucella interactions.

    Science.gov (United States)

    Lin, Yu; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this study, IDOBRU is used as a platform to model and analyze how the hosts, especially host macrophages, interact with virulent Brucella strains or live attenuated Brucella vaccine strains. Such a study allows us to better integrate and understand intricate Brucella pathogenesis and host immunity mechanisms. Different levels of host-Brucella interactions based on different host cell types and Brucella strains were first defined ontologically. Three important processes of virulent Brucella interacting with host macrophages were represented: Brucella entry into macrophage, intracellular trafficking, and intracellular replication. Two Brucella pathogenesis mechanisms were ontologically represented: Brucella Type IV secretion system that supports intracellular trafficking and replication, and Brucella erythritol metabolism that participates in Brucella intracellular survival and pathogenesis. The host cell death pathway is critical to the outcome of host-Brucella interactions. For better survival and replication, virulent Brucella prevents macrophage cell death. However, live attenuated B. abortus vaccine strain RB51 induces caspase-2-mediated proinflammatory cell death. Brucella-associated cell death processes are represented in IDOBRU. The gene and protein information of 432 manually annotated Brucella virulence factors were represented using the Ontology of Genes and Genomes (OGG) and Protein Ontology (PRO), respectively. Seven inference rules were defined to capture the knowledge of host

  11. Ontology-based time information representation of vaccine adverse events in VAERS for temporal analysis

    Directory of Open Access Journals (Sweden)

    Tao Cui

    2012-12-01

    Full Text Available Abstract Background The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) provides a valuable data source for post-vaccination adverse event analyses. The structured data in the system has been widely used, but the information in the write-up narratives is rarely included in these kinds of analyses. In fact, the unstructured nature of the narratives makes the data embedded in them difficult to use in any further studies. Results We developed an ontology-based approach to represent the data in the narratives in a “machine-understandable” way, so that it can be easily queried and further analyzed. Our focus is the time aspect in the data for time trending analysis. The Time Event Ontology (TEO), Ontology of Adverse Events (OAE), and Vaccine Ontology (VO) are leveraged for the semantic representation of this purpose. A VAERS case report is presented as a use case for the ontological representations. The advantages of using our ontology-based Semantic web representation and data analysis are emphasized. Conclusions We believe that representing both the structured data and the data from write-up narratives in an integrated, unified, and “machine-understandable” way can improve research for vaccine safety analyses, causality assessments, and retrospective studies.

  12. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research.

    Science.gov (United States)

    Primmer, C R; Papakostas, S; Leder, E H; Davis, M J; Ragan, M A

    2013-06-01

    Recent advances in molecular technologies have opened up unprecedented opportunities for molecular ecologists to better understand the molecular basis of traits of ecological and evolutionary importance in almost any organism. Nevertheless, reliable and systematic inference of functionally relevant information from these masses of data remains challenging. The aim of this review is to highlight how the Gene Ontology (GO) database can be of use in resolving this challenge. The GO provides a largely species-neutral source of information on the molecular function, biological role and cellular location of tens of thousands of gene products. As it is designed to be species-neutral, the GO is well suited for cross-species use, meaning that, functional annotation derived from model organisms can be transferred to inferred orthologues in newly sequenced species. In other words, the GO can provide gene annotation information for species with nonannotated genomes. In this review, we describe the GO database, how functional information is linked with genes/gene products in model organisms, and how molecular ecologists can utilize this information to annotate their own data. Then, we outline various applications of GO for enhancing the understanding of molecular basis of traits in ecologically relevant species. We also highlight potential pitfalls, provide step-by-step recommendations for conducting a sound study in nonmodel organisms, suggest avenues for future research and outline a strategy for maximizing the benefits of a more ecological and evolutionary genomics-oriented ontology by ensuring its compatibility with the GO. © 2013 John Wiley & Sons Ltd.

  13. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis.

    Science.gov (United States)

    Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun

    2016-09-14

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprehends 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs . 

  14. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Full Text Available Abstract Background Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly, free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  15. China’s National Health Policies: An Ontological Analysis

    Science.gov (United States)

    Dai, Guobin; Deng, Fang; Ramaprasad, Arkalgud; Syn, Thant

    2016-01-01

    The health care system in China is facing a multitude of challenges owing to the changing demographics of the country, the evolving economics of health care, and the emerging epidemiology of health as well as diseases. China’s many national health care policies are documented in Chinese text documents. It is necessary to map the policies synoptically, systemically, and systematically to discover their emphases and biases, assess them, and modify them in the future. Using a logically constructed ontology of health care policies based on the common bodies of knowledge as a lens, we map the current policies to reveal their ‘bright’, ‘light’, and ‘blind/blank’ spots. The ontological map will help (a) develop a roadmap for future health care policies in China, and (b) compare and contrast China’s health care policies with other countries’. PMID:28210417

  16. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Science.gov (United States)

    2011-01-01

    Background Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches. PMID:21914205

  17. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Directory of Open Access Journals (Sweden)

    Kirsten Toralf

    2011-09-01

    Full Text Available Abstract Background Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.

  18. Pre-incident Analysis using Multigraphs and Faceted Ontologies

    Science.gov (United States)

    2013-08-01

    device (IND), or can be used as a dispersal mechanism for a radiological dispersal device (RDD) attack. Hence, the common convention of dealing with...ontology for beverages, part of which is shown in the form of an entity- relationship (ER) graph in Figure 4. The entities Beer, Wine , etc. have is a...links to the entity Alcoholic Drinks. The entities Grapes and Grains have is a links to Plants. There is also a made from link from Wine to Grapes and

  19. Identification of genes involved in radioresistance of nasopharyngeal carcinoma by integrating gene ontology and protein-protein interaction networks.

    Science.gov (United States)

    Guo, Ya; Zhu, Xiao-Dong; Qu, Song; Li, Ling; Su, Fang; Li, Ye; Huang, Shi-Ting; Li, Dan-Rong

    2012-01-01

    Radioresistance remains one of the important factors in relapse and metastasis of nasopharyngeal carcinoma. Thus, it is imperative to identify genes involved in radioresistance and explore the underlying biological processes in the development of radioresistance. In this study, we used cDNA microarrays to select differential genes between radioresistant CNE-2R and parental CNE-2 cell lines. One hundred and eighty-three significantly differentially expressed genes (pgenes were upregulated and 45 genes were downregulated in CNE-2R. We further employed publicly available bioinformatics related software, such as GOEAST and STRING to examine the relationship among differentially expressed genes. The results show that these genes were involved in type I interferon-mediated signaling pathway biological processes; the nodes tended to have high connectivity with the EGFR pathway, IFN-related pathways, NF-κB. The node STAT1 has high connectivity with other nodes in the protein-protein interaction (PPI) networks. Finally, the reliability of microarray data was validated for selected genes by semi-quantitative RT-PCR and Western blotting. The results were consistent with the microarray data. Our study suggests that microarrays combined with gene ontology and protein interaction networks have great value in the identification of genes of radioresistance in nasopharyngeal carcinoma; genes involved in several biological processes and protein interaction networks may be relevant to NPC radioresistance; in particular, the verified genes CCL5, STAT1-α, STAT2 and GSTP1 may become potential biomarkers for predicting NPC response to radiotherapy.

  20. Application to visualize an ontology for the human semen analysis domain

    Directory of Open Access Journals (Sweden)

    Roberto Casañas

    2007-06-01

    Full Text Available This work presents the design and implementation of an ontology for the domain of human semen analysis, whose objective is to represent, organize, formalize and standardize the domain knowledge so that it can be shared and reused by different groups of people and software applications. To visualize the ontology, an application based on a client/server architecture for Web environments was developed, consisting of an Administration module and a Public Access module. The first is used to maintain the ontology's Web site, while the second lets users access the stored knowledge and a set of resources such as images, videos, domain-related articles, manuals and laboratory protocols. The proposed architecture facilitates the observation and retrieval of complex knowledge structures, as well as the navigation and administration of the information represented in the ontology. The approach used in designing the information retrieval mechanisms is aimed both at users unfamiliar with the domain vocabulary and at those who already know it. This functionality is of special interest given how heterogeneous the ontology's intended audience is, including health science professionals and students, among others. The Methontology methodology was selected to develop the ontology, and the Protégé editor was used for its implementation.

  1. Genetic resources for methane production from biomass described with gene ontology

    Directory of Open Access Journals (Sweden)

    Endang Purwantini

    2014-12-01

    Full Text Available Methane (CH4) is a valuable fuel, constituting 70-95% of natural gas, and a potent greenhouse gas. Release of CH4 into the atmosphere contributes to climate change. Biological CH4 production or methanogenesis is mostly performed by methanogens, a group of strictly anaerobic archaea. The direct substrates for methanogenesis are H2 plus CO2, acetate, formate, methylamines, methanol, methyl sulfides, and ethanol or a secondary alcohol plus CO2. In numerous anaerobic niches in nature, methanogenesis facilitates mineralization of complex biopolymers such as carbohydrates, lipids and proteins generated by primary producers. Thus, methanogens are critical players in the global carbon cycle. The same process is used in anaerobic treatment of municipal, industrial and agricultural wastes, reducing the biological pollutants in the wastes and generating methane. It also holds potential for commercial production of natural gas from renewable resources. This process operates in digestive systems of many animals, including cattle, and humans. In contrast, in deep-sea hydrothermal vents methanogenesis is a primary production process, allowing chemosynthesis of biomaterials from H2 plus CO2. In this report we present Gene Ontology (GO) terms that can be used to describe processes, functions and cellular components involved in methanogenic biodegradation and biosynthesis of specialized coenzymes that methanogens use. Some of these GO terms were previously available and the rest were generated in our Microbial Energy Gene Ontology (MENGO) project. A recently discovered non-canonical CH4 production process is also described. We have performed manual GO annotation of selected methanogenesis genes, based on experimental evidence, providing gold standards for machine annotation and automated discovery of methanogenesis genes or systems in diverse genomes. Most of the GO-related information presented in this report is available at the MENGO website (http://www.mengo.biochem.vt.edu/).

  2. Contributions to an animal trait ontology.

    Science.gov (United States)

    Hulsegge, B; Smits, M A; te Pas, M F W; Woelders, H

    2012-06-01

    Improved understanding of the biology of traits of livestock species necessitates the use and combination of information that is stored in a variety of different sources such as databases and literature. The ability to effectively combine information from different sources, however, depends on a high level of standardization within and between various resources, at least with respect to the used terminology. Ontologies represent a set of concepts that facilitate standardization of terminology within specific domains of interest. The biological mechanisms underlying quantitative traits of farm animal species related to reproduction and host pathogen interactions are complex and not well understood. This knowledge could be improved through the availability of domain-specific ontologies that provide enhanced possibilities for data annotation, data retrieval, data integration, data exchange, data analysis, and ontology-based searches. Here we describe a framework for domain-specific ontologies and the development of 2 first-generation ontologies: Reproductive Trait and Phenotype Ontology (REPO) and Host Pathogen Interactions Ontology . In these first-generation ontologies, we focused on "female fertility in cattle" and "interactions between pigs and Salmonella". Through this, we contribute to the global initiative toward the development of an Animal Trait Ontology for livestock species. To demonstrate its usefulness, we show how REPO can be used to select candidate genes for fertility.

  3. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Science.gov (United States)

    Chen, Xiaoshu; Zhang, Jianzhi

    2012-01-01

    The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and

  4. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Full Text Available Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and

  5. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaoshu Chen

    Full Text Available The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny

  6. Assessing identity, redundancy and confounds in Gene Ontology annotations over time.

    Science.gov (United States)

    Gillis, Jesse; Pavlidis, Paul

    2013-02-15

    The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. Data available at http://chibi.ubc.ca/assessGO.
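
    The "functional identity" test described above can be illustrated with a small sketch: for each gene, check whether its annotation set in an older edition is still most similar to its own annotation set in a newer edition. The sketch below is only an illustration under stated assumptions. It substitutes a plain Jaccard index for the semantic similarity measure used in the study, and the gene names, term IDs, and the helper names `jaccard` and `self_match_failures` are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index between two sets of GO term IDs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def self_match_failures(old, new):
    """Return genes whose best match in `new` is some other gene.

    `old` and `new` map gene -> set of GO term IDs for two editions.
    Jaccard is a stand-in (assumption) for a semantic similarity measure.
    """
    failures = []
    for gene, terms in old.items():
        if gene not in new:
            continue
        own = jaccard(terms, new[gene])
        best_other = max(
            (jaccard(terms, t) for g, t in new.items() if g != gene),
            default=0.0,
        )
        if best_other > own:
            failures.append(gene)
    return failures


# Toy, made-up annotations (term IDs are placeholders, not real GO content).
edition_1 = {
    "geneA": {"GO:1", "GO:2"},
    "geneB": {"GO:3", "GO:4"},
    "geneC": {"GO:5"},
}
edition_2 = {
    "geneA": {"GO:1", "GO:2", "GO:6"},
    "geneB": {"GO:5"},          # geneB was re-annotated
    "geneC": {"GO:3", "GO:4"},  # geneC now looks like the old geneB
}

lost_identity = self_match_failures(edition_1, edition_2)
```

    In this toy data, geneA gains one term but still matches itself best, while geneB and geneC each resemble the other gene more than themselves in the newer edition.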

  7. DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    Directory of Open Access Journals (Sweden)

    Wu Cathy H

    2005-08-01

    Full Text Available Abstract Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete

  8. Information content-based gene ontology semantic similarity approaches: toward a unified framework theory.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG.
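
    To make the notions of term information content and IC-based semantic similarity concrete, the following is a minimal sketch of the classic annotation-based definition, IC(t) = -log p(t), combined with Resnik's measure (the IC of the most informative common ancestor). It is not the unified framework proposed in the record above. The toy DAG, the term names, and the annotation counts are invented for illustration.

```python
import math

# Toy DAG: child -> set of parents (hypothetical term names).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Made-up direct annotation counts per term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 3,
    "rna_binding": 1,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t); p(t) counts annotations to t or its descendants."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


IC = information_content()
sibling_score = resnik("dna_binding", "rna_binding", IC)
distant_score = resnik("dna_binding", "catalysis", IC)
```

    Here the two binding terms share the ancestor "binding", so their score is its IC, while "dna_binding" and "catalysis" share only the root, whose IC is zero.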

  9. Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

    Science.gov (United States)

    Mazandu, Gaston K.; Mulder, Nicola J.

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG. PMID:24078912

  10. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

    Directory of Open Access Journals (Sweden)

    Jain Shobhit

    2010-11-01

    Full Text Available Abstract Background Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. Results We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if participating proteins belong to the same sub-graph as compared to if they belong to different sub-graphs. Conclusions The TCSS algorithm performs better than other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.

  11. Grouping miRNAs of similar functions via weighted information content of gene ontology.

    Science.gov (United States)

    Lan, Chaowang; Chen, Qingfeng; Li, Jinyan

    2016-12-22

    Regulation mechanisms between miRNAs and genes are complicated. To accomplish a biological function, a miRNA may regulate multiple target genes, and similarly a target gene may be regulated by multiple miRNAs. Wet-lab knowledge of co-regulating miRNAs is limited. This work introduces a computational method to group miRNAs of similar functions to identify co-regulating miRNAs from a similarity matrix of miRNAs. We define a novel information content of gene ontology (GO) to measure similarity between two sets of GO graphs corresponding to the two sets of target genes of two miRNAs. This between-graph similarity is then transferred as a functional similarity between the two miRNAs. Our definition of the information content is based on the size of a GO term's descendants, but adjusted by a weight derived from its depth level and the GO relationships at its path to the root node or to the most informative common ancestor (MICA). Further, a self-tuning technique and the eigenvalues of the normalized Laplacian matrix are applied to determine the optimal parameters for the spectral clustering of the similarity matrix of the miRNAs. Experimental results demonstrate that our method has better clustering performance than the existing edge-based, node-based or hybrid methods. Our method has also demonstrated a novel usefulness for the function annotation of new miRNAs, as reported in the detailed case studies.
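
    The idea of deriving information content from the number of a term's descendants can be sketched as follows. This is only the unweighted starting point, assuming the common structural form IC(t) = -log((|descendants(t)| + 1) / N) with N the number of terms. The depth-based and relationship-based weighting described in the record above is not reproduced here, and the toy DAG and function names are hypothetical.

```python
import math

# Toy DAG: parent -> list of children (hypothetical term names).
CHILDREN = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": [],
    "d": [],
}


def descendants(term):
    """All terms reachable from `term` by following child edges."""
    found = set()
    stack = list(CHILDREN[term])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN[node])
    return found


def structural_ic(term):
    """IC from topology only: fewer descendants means a more specific term."""
    n_terms = len(CHILDREN)
    return -math.log((len(descendants(term)) + 1) / n_terms)
```

    Under this definition the root, which subsumes every term, has zero information content, and leaf terms have the highest.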

  12. CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.

    Science.gov (United States)

    Park, Julie; Costanzo, Maria C; Balakrishnan, Rama; Cherry, J Michael; Hong, Eurie L

    2012-01-01

    The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.

  13. Air Pollution Analysis using Ontologies and Regression Models

    Directory of Open Access Journals (Sweden)

    Parul Choudhary

    2016-07-01

    Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co -operation and multi -disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.

  14. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin,Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team reports on their approaches, the tools they used, and results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  15. An Analysis of the Ontological Causal Relation in Physics and Its Educational Implications

    Science.gov (United States)

    Cheong, Yong Wook

    2016-08-01

    An ontological causal relation is a quantified relation between certain interactions and changes in corresponding properties. Key ideas in physics, such as Newton's second law and the first law of thermodynamics, are representative examples of these relations. In connection with the teaching and learning of these relations, this study investigated three issues: the appropriate view concerning ontological category, the role and status of ontological causal relations, and university students' understanding of the role and status of these relations. Concerning the issue of proper ontology, this study suggests an alternative view that distinguishes between interaction and property at the macroscopic level, in contrast to Chi and colleagues' influential view. Concerning the role and status of the relations, we conclude that fundamental ontological causal relations should be regarded as knowledge at the core of relevant physics theories. However, upon analysis of participants' responses, this study finds that university students' views on the status of the heat capacity relation and Newton's second law are quite different. Several possible educational implications of these results are discussed.

  16. An Earthquake Source Ontology for Seismic Hazard Analysis and Ground Motion Simulation

    Science.gov (United States)

    Zechar, J. D.; Jordan, T. H.; Gil, Y.; Ratnakar, V.

    2005-12-01

    Representation of the earthquake source is an important element in seismic hazard analysis and earthquake simulations. Source models span a range of conceptual complexity - from simple time-independent point sources to extended fault slip distributions. Further computational complexity arises because the seismological community has established so many source description formats and variations thereof; what this means is that conceptually equivalent source models are often expressed in different ways. Despite the resultant practical difficulties, there exists a rich semantic vocabulary for working with earthquake sources. For these reasons, we feel it is appropriate to create a semantic model of earthquake sources using an ontology, a computer science tool from the field of knowledge representation. Unlike the domain of most ontology work to date, earthquake sources can be described by a very precise mathematical framework. Another uniqueness associated with developing such an ontology is that earthquake sources are often used as computational objects. A seismologist generally wants more than to simply construct a source and have it be well-formed and properly described; additionally, the source will be used for performing calculations. Representation and manipulation of complex mathematical objects presents a challenge to the ontology development community. In order to enable simulations involving many different types of source models, we have completed preliminary development of a seismic point source ontology. The use of an ontology to represent knowledge provides machine interpretability and the ability to validate logical consistency and completeness. Our ontology, encoded using the OWL Web Ontology Language - a standard from the World Wide Web Consortium, contains the conceptual definitions and relationships necessary for source translation services. 
For example, specification of strike, dip, rake, and seismic moment will automatically translate into a double

  17. Ontologies in biological data visualization.

    Science.gov (United States)

    Carpendale, Sheelagh; Chen, Min; Evanko, Daniel; Gehlenborg, Nils; Gorg, Carsten; Hunter, Larry; Rowland, Francis; Storey, Margaret-Anne; Strobelt, Hendrik

    2014-01-01

    In computer science, an ontology is essentially a graph-based knowledge representation in which each node corresponds to a concept and each edge specifies a relation between two concepts. Ontological development in biology can serve as a focus to discuss the challenges and possible research directions for ontologies in visualization. The principal challenges are the dynamic and evolving nature of ontologies, the ever-present issue of scale, the diversity and richness of the relationships in ontologies, and the need to better understand the relationship between ontologies and the data analysis tasks scientists wish to support. Research directions include visualizing ontologies; visualizing semantically or ontologically annotated texts, documents, and corpora; automated generation of visualizations using ontologies; and visualizing ontological context to support search. Although this discussion uses issues of ontologies in biological data visualization as a springboard, these topics are of general relevance to visualization.
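
    The graph view of an ontology described above (concepts as nodes, typed relations as edges) can be sketched in a few lines. The example below is a minimal illustration and is not taken from the record. The concept names, the two relation types, and the helpers `build_index` and `reachable` are assumptions chosen for the example.

```python
from collections import defaultdict

# Each edge is (subject, relation, object); names are illustrative only.
EDGES = [
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "part_of", "cell"),
    ("nucleus", "is_a", "organelle"),
    ("nucleolus", "part_of", "nucleus"),
]


def build_index(edges):
    """Index outgoing edges by subject concept."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[subject].append((relation, obj))
    return index


def reachable(index, start, relations):
    """Concepts reachable from `start` using only the given relation types."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for relation, target in index.get(node, []):
            if relation in relations and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


INDEX = build_index(EDGES)
all_context = reachable(INDEX, "nucleolus", {"is_a", "part_of"})
is_a_only = reachable(INDEX, "nucleolus", {"is_a"})
```

    Restricting the traversal to one relation type changes the answer, which is one reason the diversity of relationships matters when an ontology is visualized or queried.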

  18. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.

    Science.gov (United States)

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2016-08-31

    The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .

  19. Ontological Analysis of Integrated Process Models: testing hypotheses

    Directory of Open Access Journals (Sweden)

    Michael Rosemann

    2001-11-01

    Full Text Available Integrated process modeling is achieving prominence in helping to document and manage business administration and IT processes in organizations. The ARIS framework is a popular example for a framework of integrated process modeling not least because it underlies the 800 or more reference models embedded in the world's most popular ERP package, SAP R/3. This paper demonstrates the usefulness of the Bunge-Wand-Weber (BWW) representation model for evaluating modeling grammars such as those constituting ARIS. It reports some initial insights gained from pilot testing Green and Rosemann's (2000) evaluative propositions. Even when considering all five views of ARIS, modelers have problems representing business rules, the scope and boundary of systems, and decomposing models. However, even though it is completely ontologically redundant, users still find the function view useful in modeling.

  20. Insight: An ontology-based integrated database and analysis platform for epilepsy self-management research.

    Science.gov (United States)

    Sahoo, Satya S; Ramesh, Priya; Welter, Elisabeth; Bukach, Ashley; Valdez, Joshua; Tatsuoka, Curtis; Bamps, Yvan; Stoll, Shelley; Jobst, Barbara C; Sajatovic, Martha

    2016-10-01

    We present Insight as an integrated database and analysis platform for epilepsy self-management research as part of the national Managing Epilepsy Well Network. Insight is the only available informatics platform for accessing and analyzing integrated data from multiple epilepsy self-management research studies with several new data management features and user-friendly functionalities. The features of Insight include, (1) use of Common Data Elements defined by members of the research community and an epilepsy domain ontology for data integration and querying, (2) visualization tools to support real time exploration of data distribution across research studies, and (3) an interactive visual query interface for provenance-enabled research cohort identification. The Insight platform contains data from five completed epilepsy self-management research studies covering various categories of data, including depression, quality of life, seizure frequency, and socioeconomic information. The data represents over 400 participants with 7552 data points. The Insight data exploration and cohort identification query interface has been developed using Ruby on Rails Web technology and open source Web Ontology Language Application Programming Interface to support ontology-based reasoning. We have developed an efficient ontology management module that automatically updates the ontology mappings each time a new version of the Epilepsy and Seizure Ontology is released. The Insight platform features a Role-based Access Control module to authenticate and effectively manage user access to different research studies. User access to Insight is managed by the Managing Epilepsy Well Network database steering committee consisting of representatives of all current collaborating centers of the Managing Epilepsy Well Network. New research studies are being continuously added to the Insight database and the size as well as the unique coverage of the dataset allows investigators to conduct

  1. Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations.

    Science.gov (United States)

    Liu, Gangbiao; Zou, Yangyun; Cheng, Qiqun; Zeng, Yanwu; Gu, Xun; Su, Zhixi

    2014-04-01

    The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. This study revealed the whole picture of the evolution patterns of gene functional

  2. TopoICSim: a new semantic similarity measure based on gene ontology.

    Science.gov (United States)

    Ehsani, Rezvan; Drabløs, Finn

    2016-07-29

    The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both. Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally, we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations.
An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .

  3. Handling multiple testing while interpreting microarrays with the Gene Ontology Database

    Directory of Open Access Journals (Sweden)

    Zhao Hongyu

    2004-09-01

    Background: The development of software tools that analyze microarray data in the context of genetic knowledgebases is being pursued by multiple research groups using different methods. A common problem for many of these tools is how to correct for multiple statistical testing since simple corrections are overly conservative and more sophisticated corrections are currently impractical. A careful study of the nature of the distribution one would expect by chance, such as by a simulation study, may be able to guide the development of an appropriate correction that is not overly time consuming computationally. Results: We present the results from a preliminary study of the distribution one would expect for analyzing sets of genes extracted from Drosophila, S. cerevisiae, Wormbase, and Gramene databases using the Gene Ontology Database. Conclusions: We found that the estimated distribution is not regular and is not predictable outside of a particular set of genes. Permutation-based simulations may be necessary to determine the confidence in results of such analyses.
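
    The permutation-based simulation mentioned in the conclusions can be illustrated with a minimal sketch. It is not the authors' procedure: the test statistic (the largest overlap between the study set and any category), the function names, and the toy data layout are all assumptions made for illustration. The idea is to compare the observed statistic with its distribution over randomly drawn gene sets of the same size, which accounts for testing many categories at once.

```python
import random


def max_overlap(gene_set, categories):
    """Largest number of genes shared with any single category (assumed statistic)."""
    return max(len(gene_set & members) for members in categories.values())


def permutation_p_value(study_set, universe, categories, n_perm=1000, seed=0):
    """Empirical p-value for the best category overlap under random gene sets.

    Random sets of the same size as the study set are drawn from the universe;
    the p-value is the fraction of draws whose best overlap is at least as
    large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    study = set(study_set)
    observed = max_overlap(study, categories)
    pool = sorted(universe)  # fixed order so results are reproducible
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(pool, len(study)))
        if max_overlap(sample, categories) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 genes and two categories of five genes each.
UNIVERSE = {f"g{i}" for i in range(20)}
CATEGORIES = {
    "A": {f"g{i}" for i in range(0, 5)},
    "B": {f"g{i}" for i in range(5, 10)},
}
```

    Because the statistic is taken over all categories, the resulting p-value is already adjusted for the number of categories tested; the cost is one full recomputation per permutation.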

  4. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the knowledge graphs they represent computationally expensive and time consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents a first effort to examine the feasibility of this idea by studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, of which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as proven by (a) consistently showing that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
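
    Three of the four measures named above (Resnik, Lin, and Jiang-Conrath) are built on the information content (IC) of terms, and a small sketch may help show how they relate. The ontology, the term names, and the annotation probabilities below are hypothetical toy values, not data from the study; SimRel is omitted, and the Jiang-Conrath measure is given in its usual distance form.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a small DAG, not real GO).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical annotation probabilities p(t): the fraction of genes annotated
# to t or to any of its descendants, so the root has probability 1.
PROB = {
    "root": 1.0,
    "metabolism": 0.5,
    "transport": 0.25,
    "lipid_metabolism": 0.125,
    "sugar_metabolism": 0.25,
}


def ic(term):
    """Information content: IC(t) = -log2 p(t)."""
    return -math.log2(PROB[term])


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica_ic(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    return mica_ic(a, b)


def lin(a, b):
    """Lin similarity: 2 * IC(MICA) / (IC(a) + IC(b)), in [0, 1]."""
    denom = ic(a) + ic(b)
    return 2 * mica_ic(a, b) / denom if denom > 0 else 1.0


def jiang_conrath_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(MICA)."""
    return ic(a) + ic(b) - 2 * mica_ic(a, b)
```

    In this toy setting, Lin and Jiang-Conrath both normalise the shared information by the information content of the two terms themselves, whereas Resnik depends only on the common ancestor.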

  5. A Framework for Geographic Object-Based Image Analysis (GEOBIA) based on geographic ontology

    Science.gov (United States)

    Gu, H. Y.; Li, H. T.; Yan, L.; Lu, X. J.

    2015-06-01

    GEOBIA (Geographic Object-Based Image Analysis) is not only a hot topic in current remote sensing and geographical research; it is also believed to be a paradigm in remote sensing and GIScience. The lack of a systematic approach designed to conceptualize and formalize the class definitions makes GEOBIA a highly subjective method that is difficult to reproduce. This paper aims to put forward a framework for GEOBIA based on geographic ontology theory, which could implement "Geographic entities - Image objects - Geographic objects" true reappearance. It consists of three steps: first, geographical entities are described by geographic ontology; second, a semantic network model is built based on OWL (Web Ontology Language); finally, geographical objects are classified with decision rules or other classifiers. A case study of farmland ontology was conducted to describe the framework. The strength of this framework is that it provides interpretation strategies and a global framework for GEOBIA that is objective, comprehensive, and universal, which avoids inconsistencies caused by different experts' experience and provides an objective model for image analysis.

  6. Toxicology ontology perspectives.

    Science.gov (United States)

    Hardy, Barry; Apic, Gordana; Carthew, Philip; Clark, Dominic; Cook, David; Dix, Ian; Escher, Sylvia; Hastings, Janna; Heard, David J; Jeliazkova, Nina; Judson, Philip; Matis-Mitchell, Sherri; Mitic, Dragana; Myatt, Glenn; Shah, Imran; Spjuth, Ola; Tcheremenskaia, Olga; Toldo, Luca; Watson, David; White, Andrew; Yang, Chihae

    2012-01-01

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  7. Kuhn's Ontological Relativism.

    Science.gov (United States)

    Sankey, Howard

    2000-01-01

    Discusses Kuhn's model of scientific theory change. Documents Kuhn's move away from conceptual relativism and rational relativism. Provides an analysis of his present ontological form of relativism. (CCM)

  8. The Proteasix Ontology.

    Science.gov (United States)

    Arguello Casteleiro, Mercedes; Klein, Julie; Stevens, Robert

    2016-06-04

    The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides). The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.

  9. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

    Science.gov (United States)

    Zhang, Shu-Bo; Lai, Jian-Huang

    2016-07-15

    Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of Gene Ontology (GO) provides us with a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to biological entities under consideration and the structure of the GO graph. However, previous works in this field mainly focused on the upper part of the graph, and seldom considered the lower part. In this study, we aim to explore information from the lower part of the GO graph for better semantic similarity. We proposed a framework to quantify the similarity measure beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measurements on the public platform CESSM, protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies the probability that two terms co-occur with their common descendant in an efficient way; and (3) our framework can effectively capture the similarity measure beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/.

  10. Ontology-based meta-analysis of global collections of high-throughput public data.

    Directory of Open Access Journals (Sweden)

    Ilya Kupershmidt

    BACKGROUND: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. METHODOLOGY/RESULTS: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. CONCLUSIONS: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.

  11. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    Science.gov (United States)

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  12. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  13. PPISEARCHENGINE: gene ontology-based search for protein-protein interactions.

    Science.gov (United States)

    Park, Byungkyu; Cui, Guangyu; Lee, Hyunjin; Huang, De-Shuang; Han, Kyungsook

    2013-01-01

    This paper presents a new search engine called PPISearchEngine which finds protein-protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between the terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms at the lower level than the GO term in the GO hierarchy, and finds all the interactions of the query protein which satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like 'for protein p with GO [Formula: see text], find p's interaction partners with GO [Formula: see text]'. PPISearchEngine is freely available to academics at http://search.hpid.org/.
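
    The prime-number representation described above can be sketched as follows. This is one possible reading of the abstract, not the PPISearchEngine implementation: the toy hierarchy, the protein names, and every function name are hypothetical. Each term receives a prime, a term's code is the product of its own prime and the primes of its ancestors, and "annotated with this term or any term below it" then reduces to a divisibility check.

```python
from math import gcd

# Hypothetical mini-hierarchy: term -> parent terms (not real GO identifiers).
PARENTS = {
    "binding": [],
    "protein_binding": ["binding"],
    "kinase_binding": ["protein_binding"],
    "catalysis": [],
}


def first_primes(n):
    """Return the first n primes by trial division (adequate for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# One distinct prime per term: binding=2, protein_binding=3, kinase_binding=5, catalysis=7.
PRIME = dict(zip(PARENTS, first_primes(len(PARENTS))))


def lcm(a, b):
    return a * b // gcd(a, b)


def term_code(term):
    """Product of the term's prime and the primes of all its ancestors."""
    code = PRIME[term]
    for parent in PARENTS[term]:
        code = lcm(code, term_code(parent))
    return code


def protein_code(terms):
    """Single integer that encodes every annotation of a protein."""
    code = 1
    for term in terms:
        code = lcm(code, term_code(term))
    return code


def annotated_under(code, term):
    """True if the encoded protein carries `term` or any term below it."""
    return code % PRIME[term] == 0


# Hypothetical annotations and interactions.
ANNOTATIONS = {"P1": ["catalysis"], "P2": ["kinase_binding"], "P3": ["catalysis"]}
INTERACTIONS = [("P1", "P2"), ("P1", "P3")]


def partners(query, term):
    """Interaction partners of `query` annotated with `term` or a descendant of it."""
    result = []
    for a, b in INTERACTIONS:
        if query in (a, b):
            other = b if a == query else a
            if annotated_under(protein_code(ANNOTATIONS[other]), term):
                result.append(other)
    return result
```

    Here P2 is annotated only with the specific term kinase_binding, yet a query for partners under the broader term binding still finds it, which plain keyword or ID matching would miss.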

  14. The Domain Shared by Computational and Digital Ontology: A Phenomenological Exploration and Analysis

    Science.gov (United States)

    Compton, Bradley Wendell

    2009-01-01

    The purpose of this dissertation is to explore and analyze a domain of research thought to be shared by two areas of philosophy: computational and digital ontology. Computational ontology is philosophy used to develop information systems also called computational ontologies. Digital ontology is philosophy dealing with our understanding of Being…

  15. Search of phenotype related candidate genes using gene ontology-based semantic similarity and protein interaction information: application to Brugada syndrome.

    Science.gov (United States)

    Massanet, Raimon; Gallardo-Chacon, Joan-Josep; Caminal, Pere; Perera, Alexandre

    2009-01-01

    This work presents a methodology for finding phenotype candidate genes starting from a set of known related genes. This is accomplished by automatically mining and organizing the available scientific literature using Gene Ontology-based semantic similarity. As a case study, Brugada syndrome related genes have been used as input in order to obtain a list of other possible candidate genes related with this disease. Brugada anomaly produces a typical alteration in the Electrocardiogram and carriers of the disease show an increased probability of sudden death. Results show a set of semantically coherent proteins that are shown to be related with synaptic transmission and muscle contraction physiological processes.

  16. Analysis of multiplex gene expression maps obtained by voxelation

    Directory of Open Access Journals (Sweden)

    Smith Desmond J

    2009-04-01

    Full Text Available Abstract Background Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. 
Clusters from all three ontologies combined were most prominently expressed in

  17. Primer on Ontologies.

    Science.gov (United States)

    Hastings, Janna

    2017-01-01

    As molecular biology has increasingly become a data-intensive discipline, ontologies have emerged as an essential computational tool to assist in the organisation, description and analysis of data. Ontologies describe and classify the entities of interest in a scientific domain in a computationally accessible fashion such that algorithms and tools can be developed around them. The technology that underlies ontologies has its roots in logic-based artificial intelligence, allowing for sophisticated automated inference and error detection. This chapter presents a general introduction to modern computational ontologies as they are used in biology.

  18. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development.

    Science.gov (United States)

    Khodiyar, Varsha K; Howe, Doug; Talmud, Philippa J; Breckenridge, Ross; Lovering, Ruth C

    2013-01-01

    For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region: the left-right organizer node in the mouse and human, and Kupffer's vesicle in the zebrafish. In the zebrafish, laterality cues from Kupffer's vesicle determine asymmetry in the developing heart, the direction of 'heart jogging' and the direction of 'heart looping'. 'Heart jogging' is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward 'jog'. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well defined orthologs in mouse and human, and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish 'heart jogging orthologs' are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.

  19. The logic of surveillance guidelines: an analysis of vaccine adverse event reports from an ontological perspective.

    Directory of Open Access Journals (Sweden)

    Mélanie Courtot

    BACKGROUND: When increased rates of adverse events following immunization are detected, regulatory action can be taken by public health agencies. However, to be interpreted, reports of adverse events must be encoded in a consistent way. Regulatory agencies rely on guidelines to help determine the diagnosis of the adverse events. Manual application of these guidelines is expensive, time consuming, and open to logical errors. Representing these guidelines in a format amenable to automated processing can make this process more efficient. METHODS AND FINDINGS: Using the Brighton anaphylaxis case definition, we show that existing clinical guidelines used as standards in pharmacovigilance can be logically encoded using a formal representation such as the Adverse Event Reporting Ontology we developed. We validated the classification of vaccine adverse event reports using the ontology against existing rule-based systems and a manually curated subset of the Vaccine Adverse Event Reporting System. However, we encountered a number of critical issues in the formulation and application of the clinical guidelines. We report these issues and the steps being taken to address them in current surveillance systems, and in the terminological standards in use. CONCLUSIONS: By standardizing and improving the reporting process, we were able to automate diagnosis confirmation. By allowing medical experts to prioritize reports, such a system can accelerate the identification of adverse reactions to vaccines and the response of regulatory agencies. This approach of combining ontology and semantic technologies can be used to improve other areas of vaccine adverse event reports analysis and should inform both the design of clinical guidelines and how they are used in the future. AVAILABILITY: Sufficient material to reproduce our results is available, including documentation, ontology, code and datasets, at http://purl.obolibrary.org/obo/aero.

  20. Performing ontology.

    Science.gov (United States)

    Aspers, Patrik

    2015-06-01

    Ontology, and in particular, the so-called ontological turn, is the topic of a recent themed issue of Social Studies of Science (Volume 43, Issue 3, 2013). Ontology, or metaphysics, is in philosophy concerned with what there is, how it is, and forms of being. But to what is the science and technology studies researcher turning when he or she talks of ontology? It is argued that it is unclear what is gained by arguing that ontology also refers to constructed elements. The 'ontological turn' comes with the risk of creating a pseudo-debate or pseudo-activity, in which energy is used for no end, at the expense of empirical studies. This text rebuts the idea of an ontological turn as foreshadowed in the texts of the themed issue. It argues that there is no fundamental qualitative difference between the ontological turn and what we know as constructivism.

  1. Ontology-Based Platform for Conceptual Guided Dataset Analysis

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2016-05-31

    Nowadays organizations must handle a huge amount of both internal and external data from structured, semi-structured, and unstructured sources. This constitutes a major challenge (and also an opportunity) to current Business Intelligence solutions. The complexity and effort required to analyse such a plethora of data implies considerable execution times. Besides, the large number of data analysis methods and techniques prevents domain experts (laymen from an IT-assisted analytics perspective) from fully exploiting their potential, while technology experts lack the business background to ask the right questions. In this work, we present a semantically-boosted platform for assisting layman users in (i) extracting a relevant subdataset from all the data, and (ii) selecting the data analysis technique(s) best suited for scrutinising that subdataset. The outcome is better answers in significantly less time. The platform has been evaluated in the music domain with promising results.

  2. The axiological ontology of occupational therapy: a philosophical analysis.

    Science.gov (United States)

    Drolet, Marie-Josée

    2014-01-01

    This article describes the results of a study analyzing several discourses on the values of occupational therapy and some philosophical assumptions upon which these values are based. A qualitative study of several values statements using the hermeneutical method--a conventional analytical approach in philosophy--was conducted. The literature review reveals that opinions on the values of occupational therapy differ greatly--no one value is shared among all the values statements examined. However, the majority of the texts mention occupational participation. A philosophical analysis of the literature shows that this value is based on a conception of human beings that can be traced back to the philosophical anthropologies of thinkers like Marx, Rousseau, Sartre, and Kant. The philosophical analysis also brought to light a certain conceptual confusion about what a value is. This article therefore offers some conceptual clarifications to help distinguish between values, beliefs, attitudes, principles, and non-evaluative concepts. It also presents the implications for practice of this philosophical analysis of values statements of the profession.

  3. An ontological knowledge based system for selection of process monitoring and analysis tools

    DEFF Research Database (Denmark)

    Singh, Ravendra; Gernaey, Krist; Gani, Rafiqul

    2010-01-01

    monitoring and analysis tools for a wide range of operations has made their selection a difficult, time consuming and challenging task. Therefore, an efficient and systematic knowledge base coupled with an inference system is necessary to support the optimal selection of process monitoring and analysis tools......, satisfying the process and user constraints. A knowledge base consisting of the process knowledge as well as knowledge on measurement methods and tools has been developed. An ontology has been designed for knowledge representation and management. The developed knowledge base has a dual feature. On the one...... procedures has been developed to retrieve the data/information stored in the knowledge base....

  4. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    Full Text Available High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, for example genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speedups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
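
    The idea of an exact p-value for an integer additive statistic can be illustrated with a minimal sketch. It is not taken from the article: it assumes a null model in which a gene set of fixed size is drawn uniformly at random without replacement, and the function name and toy weights are hypothetical. With 0/1 weights the result reduces to the familiar hypergeometric tail.

```python
from fractions import Fraction
from math import comb


def exact_enrichment_pvalue(weights, set_size, observed_score):
    """Exact P(score >= observed_score) under a uniform random gene set.

    weights        -- one non-negative integer per gene (weighted membership;
                      hypothetical input format chosen for this sketch)
    set_size       -- number of genes in the gene set of interest
    observed_score -- sum of the weights of the genes actually observed
    """
    if set_size > len(weights):
        raise ValueError("set_size cannot exceed the number of genes")

    # counts[k] maps a score s to the number of size-k subsets with that score.
    counts = [dict() for _ in range(set_size + 1)]
    counts[0][0] = 1

    for w in weights:
        # Visit k from large to small so each gene enters a subset at most once.
        for k in range(set_size - 1, -1, -1):
            for score, n_subsets in counts[k].items():
                new_score = score + w
                counts[k + 1][new_score] = counts[k + 1].get(new_score, 0) + n_subsets

    total = comb(len(weights), set_size)
    tail = sum(n for score, n in counts[set_size].items() if score >= observed_score)
    return Fraction(tail, total)


# 10 genes, 3 of them annotated (weight 1); a set of 4 genes contains 2 annotated ones.
print(exact_enrichment_pvalue([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], 4, 2))  # 1/3
```

    Exact rational arithmetic is used here only to keep the toy result easy to check; a practical implementation would more likely work with floating-point or log-scale probabilities.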

  5. Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task.

    Science.gov (United States)

    Kim, Jin-Dong; Kim, Jung-Jae; Han, Xu; Rebholz-Schuhmann, Dietrich

    2015-01-01

    The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base construction (KB)". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation. To further explore the grand theme, we extended the evaluation from a perspective of KB construction. Also, the Gene Regulation Ontology (GRO) task was newly introduced in the third edition. The final evaluation of the participating systems resulted in relatively low performance. The reason was attributed to the large size and complex semantic representation of the ontology. To investigate potential benefits of resource exchange between the presumably similar tasks, we measured the overlap between the datasets of the two tasks, and tested whether the dataset for one task can be used to enhance performance on the other. We report an extended evaluation on all the participating systems in the GE task, incorporating a KB perspective. For the evaluation, the final submission of each participant was converted to RDF statements, and evaluated using 8 queries that were formulated in SPARQL. The results suggest that the evaluation may be concluded differently between the two different perspectives, annotation vs. KB. We also provide a comparison of the GE and GRO tasks by converting their datasets into each other's format. More than 90% of the GE data could be converted into the GRO task format, while only half of the GRO data could be mapped to the GE task format. The imbalance in conversion indicates that the GRO is a comprehensive extension of the GE task ontology. We further used the converted GRO data as additional training data for the GE task, which helped improve GE task participant system performance. However, the converted GE data did not help GRO task participants, due to overfitting and the ontology gap.

  6. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome.

    Science.gov (United States)

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2016-01-04

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide, and ≈ 7-10% of women of reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, no efforts have been made to collate and link these data. Our group, for the first time, has compiled PCOS-related information available through scientific literature, cross-linked it with molecular, biochemical and clinical databases, and presented it as a user-friendly, web-based online knowledgebase for the benefit of the scientific and clinical community. Manually curated information on associated genes, single nucleotide polymorphisms, diseases, gene ontology terms and pathways along with supporting reference literature has been collated and included in PCOSKB (http://pcoskb.bicnirrh.res.in).

  7. Development of a Framework for Ontology Based Sentiment Analysis on Social Media

    Directory of Open Access Journals (Sweden)

    Kadir Tutar

    2015-10-01

    Full Text Available Advances in internet technology, popular social media applications and Web 2.0 have changed the usage habits of internet users, who now share their feelings and thoughts on social media from anywhere at any time. As social media usage has increased, so has the amount of valuable feedback data, and the collection, interpretation and evaluation of this data have become important. Sentiment analysis and natural language processing methods have been applied to text-based data for evaluation and opinion mining to meet this need. In this study, a new ontology-based sentiment analysis method has been developed in order to improve the accuracy of the results obtained by current sentiment analysis methods. The method requires domain-specific information to be modeled in the ontology prior to the analysis procedure. Through this approach, more accurate and higher-quality results were obtained compared to classic sentiment analysis methods. Another important and innovative feature of the developed infrastructure is its ability to perform sentiment analysis on Turkish text.

  8. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research [v1; ref status: indexed, http://f1000r.es/p5]

    Directory of Open Access Journals (Sweden)

    Sebastian Köhler

    2013-02-01

    Full Text Available Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline in our continuous integration system, ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  9. GOlorize: a Cytoscape plug-in for network visualization with Gene Ontology-based layout and coloring

    OpenAIRE

    Garcia, O.; Saveanu, C.; Cline, M.; Fromont-Racine, M; Jacquier, A; Schwikowski, B.; Aittokallio, T.

    2007-01-01

    International audience; We have implemented a graph layout algorithm that exposes Gene Ontology (GO) class structure on the network nodes. It can be used in conjunction with BiNGO plug-in to Cytoscape, which finds the GO categories over-represented in a given network. Our plug-in, named GOlorize, first highlights the class members with category-specific color-coding and then constructs an enhanced visualization of the network using a class-directed layout algorithm. AVAILABILITY: http://www.c...

  10. CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis

    Directory of Open Access Journals (Sweden)

    Son Tran

    2011-04-01

    Full Text Available Background: The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Results: Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searching for nearest common ancestors and minimum spanning clades, and filtering multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. Conclusions: CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.
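
    The two domain-specific queries named in this abstract, nearest common ancestor and minimum spanning clade, can be sketched on a toy rooted tree. This is only an illustration under stated assumptions: the tree is held as a hypothetical child-to-parent dictionary in memory, whereas CDAO-Store answers such queries over RDF data through its own web services, which are not reproduced here.

```python
def path_to_root(parent, node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def nearest_common_ancestor(parent, taxa):
    """Deepest node that is an ancestor of every taxon in `taxa`."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    common = set(path_to_root(parent, taxa[0]))
    for taxon in taxa[1:]:
        common &= set(path_to_root(parent, taxon))
    # Walking upward from the first taxon, the first shared node is the deepest one.
    for node in path_to_root(parent, taxa[0]):
        if node in common:
            return node
    raise ValueError("taxa do not belong to the same tree")


def minimum_spanning_clade(parent, taxa):
    """All leaves below the nearest common ancestor of `taxa`."""
    nca = nearest_common_ancestor(parent, taxa)
    leaves = set(parent) - set(parent.values())
    return {leaf for leaf in leaves if nca in path_to_root(parent, leaf)}


# Hypothetical toy tree: root -> (A, n1), n1 -> (B, n2), n2 -> (C, D)
PARENT = {"A": "root", "n1": "root", "B": "n1", "n2": "n1", "C": "n2", "D": "n2"}

print(nearest_common_ancestor(PARENT, ["C", "D"]))         # n2
print(sorted(minimum_spanning_clade(PARENT, ["B", "D"])))  # ['B', 'C', 'D']
```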

  11. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

    Science.gov (United States)

    Caniza, Horacio; Romero, Alfonso E; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-08-01

    We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines. Contact: alberto@cs.rhul.ac.uk. © The Author 2014. Published by Oxford University Press.
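
    As an illustration of what a term-based semantic similarity measure computes, the following minimal sketch implements Resnik's information-content similarity on a toy ontology. It is an assumption-laden example rather than GOssTo's code: the abstract does not say how its six measures are implemented, and the ontology, annotations and function names below are hypothetical.

```python
from math import log


def ancestors(parents, term):
    """All ancestors of `term` in the ontology DAG, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def information_content(parents, annotations):
    """IC(t) = log(N / n_t), with n_t genes annotated to t or any descendant of t."""
    total = len(annotations)
    counts = {term: 0 for term in parents}
    for terms in annotations.values():
        covered = set()
        for term in terms:
            covered |= ancestors(parents, term)
        for term in covered:
            counts[term] += 1
    # Terms without any annotation have undefined IC and are left out.
    return {term: log(total / n) for term, n in counts.items() if n > 0}


def resnik(parents, ic, term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(parents, term_a) & ancestors(parents, term_b)
    return max((ic[t] for t in common if t in ic), default=0.0)


# Hypothetical toy ontology (term -> list of parent terms) and gene annotations.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid metabolism": ["metabolism"],
    "sugar metabolism": ["metabolism"],
}
ANNOTATIONS = {
    "g1": {"lipid metabolism"},
    "g2": {"sugar metabolism"},
    "g3": {"transport"},
    "g4": {"lipid metabolism"},
}
IC = information_content(PARENTS, ANNOTATIONS)

# The two metabolism terms share "metabolism"; its IC is log(4/3).
print(resnik(PARENTS, IC, "lipid metabolism", "sugar metabolism"))
```

    Terms that share only the root score zero, because the root annotates every gene and so carries no information.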

  12. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Background: Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. Protein functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology.

  13. Ontology-oriented MIS Domain Analysis Method

    Institute of Scientific and Technical Information of China (English)

    房秀蓉; 李师贤

    2003-01-01

    Domain engineering is a large-grained reuse technology that focuses on analysis, design and implementation within a specific domain. This paper addresses the first phase of domain engineering: it discusses a new approach to domain analysis, the ontology-oriented MIS domain analysis method, and introduces the implementation of a prototype system that supports this method.

  14. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    OpenAIRE

    Kirsten Toralf; Gross Anika; Hartung Michael; Rahm Erhard

    2011-01-01

    Background: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates, resulting in a large set of ontology versions, necessitate efficient management and analysis of this data. Results: We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and ...

  15. Domain-Specific Formal Ontology of Archaeology and Its Application in Knowledge Acquisition and Analysis

    Institute of Scientific and Technical Information of China (English)

    Chun-Xia Zhang; Cun-Gen Cao; Fang Gu; Jin-Xin Si

    2004-01-01

    The inherent heterogeneity and distribution of knowledge strongly hinder the sharing and reuse of knowledge among different agents and software entities, and a formal ontology has been viewed as a promising means to tackle this problem. In this paper, a domain-specific formal ontology of archaeology is presented. The ontology mainly consists of three parts: archaeological categories, their relationships and axioms. The ontology not only captures the semantics of archaeological knowledge, but also provides archaeology with an explicit and formal specification of a shared conceptualization, thus making archaeological knowledge shareable and reusable across humans and machines in a structured fashion. Further, we propose a method to verify ontology correctness based on the individuals of categories. As applications of the ontology, we have developed an ontology-driven approach to knowledge acquisition from archaeological text and a question answering system for archaeological knowledge.

  16. PIERO ontology for analysis of biochemical transformations: effective implementation of reaction information in the IUBMB enzyme list.

    Science.gov (United States)

    Kotera, Masaaki; Nishimura, Yosuke; Nakagawa, Zen-ichi; Muto, Ai; Moriya, Yuki; Okamoto, Shinobu; Kawashima, Shuichi; Katayama, Toshiaki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu

    2014-12-01

    Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most of the bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into those related to overall reactions and biochemical transformations. Consequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further future development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.

  17. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    Full Text Available BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was shown to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  18. Gene ontology study of methyl jasmonate-treated and non-treated hairy roots of Panax ginseng to identify genes involved in secondary metabolic pathway.

    Science.gov (United States)

    Sathiyamoorthy, S; In, J G; Gayathri, S; Kim, Y Ju; Yang, D Ch

    2010-07-01

    The roots of Panax ginseng C.A. Meyer, known as Korean ginseng, have been a valuable and important folk medicine in East Asian countries. Ginseng is mainly used to maintain the homeostasis of the human body, owing to the presence of ginsenosides and non-saponin compounds such as phenol compounds, acidic polysaccharides and polyethylene compounds. Functional genomics aids annotation based on gene ontology. In this study, we focused on the genes involved in secondary metabolic pathways and visualized temporal changes of gene expression in ginseng hairy roots treated with methyl jasmonate (MeJA) alongside non-treated hairy roots. A total of 5,774 EST clones were clustered and assembled into 501 contigs and 2,955 singletons. Sequences were annotated with gene ontology terms (molecular function, biological process, cellular component) and with biochemical functions (Enzyme Commission numbers), and were assigned to metabolic pathways of the Kyoto Encyclopedia of Genes and Genomes database. Comparatively, EST sequences assigned to cellular process, metabolic process, biotic and abiotic stress stimuli, developmental and biological regulation, and transport were up-regulated 2-3 fold in MeJA-treated hairy roots. Forty-six different subgroups of enzymes were found in the MeJA-treated plants. These annotated ESTs represent a significant proportion of the P. ginseng and provide a molecular resource for the development of microarrays for gene expression studies concerning development, metabolism and reproduction.

  19. Engineering Ontologies

    OpenAIRE

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering systems modelling, simulation and design. A general and formal ontology, called PHYSSYS, for dynamic physical systems is presented and its structuring principles are discussed. We show how the PHYSSYS ...

  20. The neurological disease ontology.

    Science.gov (United States)

    Jensen, Mark; Cox, Alexander P; Chaudhry, Naveed; Ng, Marcus; Sule, Donat; Duncan, William; Ray, Patrick; Weinstock-Guttman, Bianca; Smith, Barry; Ruttenberg, Alan; Szigeti, Kinga; Diehl, Alexander D

    2013-12-06

    We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer's disease, multiple sclerosis, and stroke. ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms 'disease', 'diagnosis', 'disease course', and 'disorder'. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer's disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology along with a discussion list and an issue tracker. ND seeks to provide a formal foundation for the representation of clinical and research data

  1. Ontology-based content analysis of US patent applications from 2001-2010.

    Science.gov (United States)

    Weber, Lutz; Böhme, Timo; Irmer, Matthias

    2013-01-01

    Ontology-based semantic text analysis methods allow the automatic extraction of knowledge relationships and data from text documents. In this review, we have applied these technologies to the systematic analysis of pharmaceutical patents. Hierarchical concepts from the knowledge domains of chemical compounds, diseases and proteins were used to annotate full-text US patent applications that deal with pharmacological activities of chemical compounds and were filed in the years 2001-2010. Compounds claimed in these applications have been classified into their respective compound classes to review the distribution of scaffold types or general compound classes such as natural products in a time-dependent manner. Similarly, the target proteins and claimed utility of the compounds have been classified and the most relevant were extracted. The method presented allows the discovery of the main areas of innovation as well as emerging fields of patenting activities - providing a broad statistical basis for competitor analysis and decision-making efforts.

  2. Ontology-based analysis of multi-scale modeling of geographical features

    Institute of Scientific and Technical Information of China (English)

    WANG Yanhui; LI Xiaojuan; GONG Huili

    2006-01-01

    As multi-scale databases based on scale series of map data are built, conceptual models are needed to define proper multi-scale representation formulas and to extract model entities and the relationships among them. However, the results of multi-scale conceptual abstraction schema may differ, according to which cognition, abstraction and application views are utilized, which presents an obvious obstacle to the reuse and sharing of spatial data. To facilitate the design of unified, common and objective abstract schema views for multi-scale spatial databases, this paper proposes an ontology-based analysis method for the multi-scale modeling of geographical features. It includes a three-layer ontology model, which serves as the framework for common multi-scale abstraction schema; an explanation of formulary abstractions accompanied by definitions of entities and their relationships at the same scale as well as at different scales, which are meant to provide strong feasibility, expansibility and speciality functions; and a case in point involving multi-scale representations of road features, to verify the method's feasibility.

  3. Engineering Ontologies

    NARCIS (Netherlands)

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering syst

  4. Ontology-based representation and analysis of host-Brucella interactions

    OpenAIRE

    Lin, Yu; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Background Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this stu...

  5. OBIB-a novel ontology for biobanking.

    Science.gov (United States)

    Brochhausen, Mathias; Zheng, Jie; Birtwell, David; Williams, Heather; Masci, Anna Maria; Ellis, Helena Judge; Stoeckert, Christian J

    2016-01-01

    Biobanking necessitates extensive integration of data to allow data analysis and specimen sharing. Ontologies have been demonstrated to be a promising approach in fostering better semantic integration of biobank-related data. Hitherto no ontology provided the coverage needed to capture a broad spectrum of biobank user scenarios. Based on the principles laid out by the Open Biological and Biomedical Ontologies Foundry, two biobanking ontologies have been developed. These two ontologies were merged using a modular approach consistent with the initial development principles. The merging was facilitated by the fact that both ontologies use the same Upper Ontology and re-use classes from a similar set of pre-existing ontologies. Based on the two previous ontologies, the Ontology for Biobanking (http://purl.obolibrary.org/obo/obib.owl) was created. Because there was no overlap between the two source ontologies, the coverage of the resulting ontology is significantly larger than that of the two source ontologies. The ontology is successfully used in managing biobank information of the Penn Medicine BioBank. Sharing development principles and Upper Ontologies facilitates subsequent merging of ontologies to achieve a broader coverage.

  6. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk for developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this 'witness' model of stress and RNA-Seq technology, the current study aims to study the genetic effects of psychological stress. After mice witnessed the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  7. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  8. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Directory of Open Access Journals (Sweden)

    Shibiao Wan

    Full Text Available Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.

  9. HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/. PMID:24647341

  10. The design ontology

    DEFF Research Database (Denmark)

    Storga, Mario; Andreasen, Mogens Myrup; Marjanovic, Dorian

    2010-01-01

    The article presents the research of the nature, building and practical role of a Design Ontology as a potential framework for the more efficient product development (PD) data-, information- and knowledge- description, -explanation, -understanding and -reusing. In the methodology for development...... of the ontology two steps could be identified: empirical research and computer implementation. Empirical research has included domain documentation analysis (Genetic Design Model System developed by Mortensen 1999), identification of the key concepts and relations between them, and categorisation of the concepts...... and relations into taxonomies. As an epistemological foundation for the concepts formalisation, The Suggested Upper Merged Ontology (SUMO) proposed by IEEE, was reused. As the result of the previously described process, the ontology content has been categorised into six main subcategories divided between...

  11. Ontology Research

    OpenAIRE

    Welty, Christopher

    2003-01-01

    In this issue, I have collected a fairly broad, although by no means exhaustive, sampling of work in the field of ontology research. To define a field is often quite difficult; it is more a collection of people and ideas than it is a specific technology. To represent our field, I present six articles that cover several of the major thrusts of ontology research from the past decade.

  12. Ontology of the False State

    Directory of Open Access Journals (Sweden)

    Testa Italo

    2015-09-01

    Full Text Available In this paper I will argue that critical theory needs to make its socio-ontological commitments explicit, whilst on the other hand I will posit that contemporary social ontology needs to amend its formalistic approach by embodying a critical theory perspective. In the first part of my paper I will discuss how the question was posed in Horkheimer’s essays of the 1930s, which leave open two options: (1) a constructive inclusion of social ontology within social philosophy, or else (2) a program of social philosophy that excludes social ontology. Option (2) corresponds to Adorno’s position, which I argue is forced to recur to a hidden social ontology. Following option (1), I first develop a meta-critical analysis of Searle, arguing that his social ontology presupposes a notion of ‘recognition’ which it cannot account for. Furthermore, by means of a critical reading of Honneth, I argue that critical theory could incorporate a socio-ontological approach, giving value to the constitutive socio-ontological role of recognition and to the socio-ontological role of objectification. I will finish with a proposal for a socio-ontological characterization of reification which involves that the basic occurrence of recognition is to be grasped at the level of background practices.

  13. Using Network Extracted Ontologies to Identify Novel Genes with Roles in Appressorium Development in the Rice Blast Fungus Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Ryan M. Ames

    2017-01-01

    Full Text Available Magnaporthe oryzae is the causal agent of rice blast disease, the most important infection of rice worldwide. Half the world’s population depends on rice for its primary caloric intake and, as such, rice blast poses a serious threat to food security. The stages of M. oryzae infection are well defined, with the formation of an appressorium, a cell type that allows penetration of the plant cuticle, particularly well studied. However, many of the key pathways and genes involved in this disease stage are yet to be identified. In this study, I have used network-extracted ontologies (NeXOs), hierarchical structures inferred from RNA-Seq data, to identify pathways involved in appressorium development, which in turn highlights novel genes with potential roles in this process. This study illustrates the use of NeXOs for pathway identification from large-scale genomics data and also identifies novel genes with potential roles in disease. The methods presented here will be useful to study disease processes in other pathogenic species and these data represent predictions of novel targets for intervention in M. oryzae.

  14. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    Science.gov (United States)

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Martinez-Lapsicina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

    In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

  15. Generating Application Ontologies from Reference Ontologies

    OpenAIRE

    Shaw, Marianne; Detwiler, Landon T.; Brinkley, James F.; Suciu, Dan

    2008-01-01

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies.

  16. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data.

    Science.gov (United States)

    Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan

    2016-01-01

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

  17. Building Ontologies in DAML + OIL

    Science.gov (United States)

    Wroe, Chris; Bechhofer, Sean; Lord, Phillip; Rector, Alan; Goble, Carole

    2003-01-01

    In this article we describe an approach to representing and building ontologies advocated by the Bioinformatics and Medical Informatics groups at the University of Manchester. The hand-crafting of ontologies offers an easy and rapid avenue to delivering ontologies. Experience has shown that such approaches are unsustainable. Description logic approaches have been shown to offer computational support for building sound, complete and logically consistent ontologies. A new knowledge representation language, DAML + OIL, offers a new standard that is able to support many styles of ontology, from hand-crafted to full logic-based descriptions with reasoning support. We describe this language, the OilEd editing tool, reasoning support and a strategy for the language’s use. We finish with a current example, in the Gene Ontology Next Generation (GONG) project, that uses DAML + OIL as the basis for moving the Gene Ontology from its current hand-crafted form to one that uses logical descriptions of a concept’s properties to deliver a more complete version of the ontology. PMID:18629114

  18. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different ranking of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses is dependent on the statistical method used. Methods for predicting the possible...

  19. An analysis of fundamental concepts in the conceptual framework using ontology technologies

    CSIR Research Space (South Africa)

    Gerber, MC

    2014-01-01

    Full Text Available , which aim to address the problems of vagueness, inconsistency and ambiguity. This paper reports on the findings of a design science research (DSR) project that, as artefact, developed a first version ontology-based formal language representing...

  20. Using phylogenomic patterns and gene ontology to identify proteins of importance in plant evolution.

    Science.gov (United States)

    Cibrián-Jaramillo, Angélica; De la Torre-Bárcena, Jose E; Lee, Ernest K; Katari, Manpreet S; Little, Damon P; Stevenson, Dennis W; Martienssen, Rob; Coruzzi, Gloria M; DeSalle, Rob

    2010-07-12

    We use measures of congruence on a combined expressed sequence tag genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch and hidden support on the hypothesis obtained on a 16-species tree, constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that are in agreement with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes for key branches in the seed plant phylogeny. In addition to basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented in key nodes of the phylogeny of seed plants. Two genes in our matrix are of critical importance as they are involved in RNA-dependent regulation, essential during embryo and leaf development. These are Argonaute and the RNA-dependent RNA polymerase 6 found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomics approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, and in particular, in the evolution of plants.

  1. Role of Ontology in Information Retrieval

    Institute of Scientific and Technical Information of China (English)

    WU Dan; WANG Hui-lin

    2006-01-01

    Based on a comparison between ontology and thesaurus, and an analysis of an ontology-based Information Retrieval (IR) model, the potential advantages that ontology may contribute to IR are analyzed. Then a general architecture of an ontology-based Information Retrieval System (IRS) and an approach to constructing it are presented. Based on this research, the role of ontology in IR is summarized from four aspects and a typical system called Textpresso is analyzed. Finally, the conclusion is drawn that utilizing ontology is the trend in IR and can improve the IRS.

  2. DESIGN AND IMPLEMENTATION OF ONTOLOGY BASED ON SEMANTIC ANALYSIS FOR GIS APPLICATION

    Directory of Open Access Journals (Sweden)

    S.S Mantha

    2011-09-01

    Full Text Available The Agricultural Census information is a leading source of facts and figures about a country’s agricultural development. Such information is used by many who provide services to farmers and rural communities, including federal, state and local governments, agribusinesses, etc. When integrated with other agricultural surveys and statistics, such information can also help in monitoring progress towards the achievement of the Millennium Development Goals (MDGs) of a country. But such huge volumes of census data are available at various geo-spatial portals either in proprietary formats like shape files, .dat files, etc. or in the form of database tables, Word documents, PDFs, etc. In order to do analysis, or just to see the progress of a particular area, such huge datasheets have to be scanned. This paper provides solutions to various problems related to Geo-spatial Agricultural Census data in three aspects: (1) Storage / Organization of census data using enhanced methods such as ontologies. (2) Visualization of data using Google Maps and Column Charts. (3) Analysis of data using interactive methods like Column Charts.

  3. SUGOI: automated ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2015-04-01

    Full Text Available A foundational ontology can solve interoperability issues among the domain ontologies aligned to it. However, several foundational ontologies have been developed, hence such interoperability issues exist among domain ontologies. The novel SUGOI tool...

  4. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

    Directory of Open Access Journals (Sweden)

    Xiaomei Wu

    Full Text Available BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply consider terms at the same level in the ontology to be equally specific nodes, revealing the weaknesses that could be complemented using information content (IC). RESULTS AND CONCLUSIONS: Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. The algorithm HRSS was implemented in the C programming language, which is freely available from http://cmb.bnu.edu.cn/hrss.
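
    The two pairwise strategies named in this abstract, the maximum and the best-match average, can be illustrated with a minimal Python sketch. It assumes a precomputed matrix of term-to-term similarity scores (rows: GO terms of one gene product, columns: GO terms of the other); the function names and the toy matrix are hypothetical, and the best-match average shown is one common variant (mean over all row-wise and column-wise best matches), not necessarily the exact formula used by the HRSS authors.

```python
from typing import Sequence


def max_strategy(sim: Sequence[Sequence[float]]) -> float:
    """Maximum strategy: score of the single best-matching term pair."""
    return max(max(row) for row in sim)


def best_match_average(sim: Sequence[Sequence[float]]) -> float:
    """Best-match average: mean of each term's best match in the other set.

    Assumed variant: pool the row-wise and column-wise maxima and
    average over all of them.
    """
    row_best = [max(row) for row in sim]        # best partner for each row term
    col_best = [max(col) for col in zip(*sim)]  # best partner for each column term
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


# Hypothetical similarity scores: 3 terms of product A vs. 2 terms of product B.
toy_sim = [
    [0.9, 0.2],
    [0.1, 0.4],
    [0.3, 0.5],
]
print(max_strategy(toy_sim))
print(best_match_average(toy_sim))
```

    The maximum strategy rewards a single strongly shared function, while the best-match average is pulled down when many terms have no close counterpart, which is consistent with the abstract's observation that the two strategies suit different tasks.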

  5. Ontology Localization

    OpenAIRE

    2009-01-01

    Our main goal in this thesis is to propose a solution for building a multilingual ontology through the automatic localization of an ontology. The notion of localization comes from the area of Software Development and refers to the adaptation of a software product to a non-native environment. In Ontological Engineering, ontology localization could be considered a subtype of software localization in which the product is a shared model of a...

  6. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Directory of Open Access Journals (Sweden)

    José Cuenca

    Full Text Available Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  7. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  8. Genetically Based Location from Triploid Populations and Gene Ontology of a 3.3-Mb Genome Region Linked to Alternaria Brown Spot Resistance in Citrus Reveal Clusters of Resistance Genes

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids. PMID:24116149

  9. The ontology of biological sequences

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2009-11-01

    Full Text Available Abstract Background Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike. Results We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving. Conclusion Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as starting point for the development of more formal ontologies and ultimately of knowledge-based applications.

  10. Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    Full Text Available The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
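
    To make the notion of annotation-based information content concrete, the following Python sketch computes IC as the negative log of a term's annotation frequency and derives a Resnik-style term similarity (the IC of the most informative common ancestor). The abstract does not spell out any formula, so this is only one widely used definition; the toy ontology, the annotation counts, and all names below are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to the set of its parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 4, "A1": 2, "A2": 3}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(p(t)), where p(t) counts annotations to t or its descendants."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for ancestor in ancestors(term):  # propagate annotations upwards
            totals[ancestor] += count
    root_total = totals["root"]
    return {term: -math.log(totals[term] / root_total) for term in PARENTS}


def resnik(term1, term2, ic):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic[term] for term in common)


ic = information_content()
print(resnik("A1", "A2", ic))  # shares the informative ancestor "A"
print(resnik("A1", "B", ic))   # only the root is shared
```

    A protein-level functional similarity measure would then aggregate such term-level scores over the annotation sets of two proteins; which aggregation works best for a given data type is the question the abstract addresses.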

  11. Domain-Specific Ontology of Botany

    Institute of Scientific and Technical Information of China (English)

    Fang Gu; Cun-Gen Cao; Yue-Fei Sui; Wen Tian

    2004-01-01

    Domain-specific ontologies are very useful in knowledge acquisition, sharing and analysis. In this paper, a botany-specific ontology for acquiring and analyzing botanical knowledge is presented. The ontology is represented in a set of well-defined categories, and each concept is viewed as an instance of a certain category. The authors also introduce botany-specific axioms, an integral part of the ontology, for checking and reasoning with the acquired knowledge. Consistency, completeness and redundancy of the axioms are discussed.

  12. Analysis of Deviations in an Agent and Ontology-Based Dialogue Management System

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Algorithms for detecting dialogue deviations from a dialogue topic in an agent and ontology-based dialogue management system (AODMS) are proposed. In AODMS, agents and ontologies are introduced to represent domain knowledge. General algorithms that model dialogue phenomena in different domains can be realized because ontologies can describe the complex relationships between knowledge in different domains. An evaluation of the dialogue management system with deviation-judging algorithms on 736 utterances shows that the AODMS is able to talk about the given topic consistently and answer 86.6% of the utterances, while only 72.1% of the utterances can be responded to correctly without the deviation-judging module.

  13. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Full Text Available Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms) in response to one particular variable (e.g. differential gene expression). In spite of the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc.) simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results that cannot be derived from the conventional unidimensional functional profiling methods. We report sound results of gene set associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.
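
    Read as a formula, a two-variable version of such a logistic model might look like the following. The symbols are illustrative assumptions, not the authors' notation: $S$ is a gene module, $x_g$ and $y_g$ are two genome-scale measurements for gene $g$, and the $\beta$ terms are fitted coefficients, with $\beta_{12}$ capturing the interaction between the two variables.

```latex
\[
  \log \frac{P(g \in S)}{1 - P(g \in S)}
  = \beta_0 + \beta_1 x_g + \beta_2 y_g + \beta_{12}\, x_g y_g
\]
```

    Setting $\beta_2 = \beta_{12} = 0$ recovers a one-dimensional logistic gene set test on $x_g$ alone, which is the sense in which a model of this form generalizes unidimensional profiling.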

  14. Ontology Requirements Specification

    OpenAIRE

    Suárez-Figueroa, Mari Carmen; Gómez-Pérez, A.

    2012-01-01

    The goal of the ontology requirements specification activity is to state why the ontology is being built, what its intended uses are, who the end users are, and which requirements the ontology should fulfill. This chapter presents detailed methodological guidelines for specifying ontology requirements efficiently. These guidelines will help ontology engineers to capture ontology requirements and produce the ontology requirements specification document (ORSD). The ORSD will play a key role dur...

  15. Region Evolution eXplorer - A tool for discovering evolution trends in ontology regions.

    Science.gov (United States)

    Christen, Victor; Hartung, Michael; Groß, Anika

    2015-01-01

    A large number of life science ontologies have been developed to support different application scenarios such as gene annotation or functional analysis. The continuous accumulation of new insights and knowledge affects specific portions of ontologies and thus leads to their adaptation. Therefore, it is valuable to study which ontology parts have been extensively modified or remained unchanged. Users can monitor the evolution of an ontology to improve its further development or apply the knowledge in their applications. Here we present REX (Region Evolution eXplorer), a web-based system for exploring the evolution of ontology parts (regions). REX provides an analysis platform for currently about 1,000 versions of 16 well-known life science ontologies. Interactive workflows allow an explorative analysis of changing ontology regions and can be used to study evolution trends for long-term periods. REX is a web application providing an interactive and user-friendly interface to identify (un)stable regions in large life science ontologies. It is available at http://www.izbi.de/rex.

  16. Individual Building Extraction from TerraSAR-X Images Based on Ontological Semantic Analysis

    Directory of Open Access Journals (Sweden)

    Rong Gui

    2016-08-01

    Full Text Available Accurate building information plays a crucial role in urban planning, human settlements and environmental management. Synthetic aperture radar (SAR) images, which deliver images with metric resolution, allow for analyzing and extracting detailed information on urban areas. In this paper, we consider the problem of extracting individual buildings from SAR images based on domain ontology. By analyzing a building scattering model with different orientations and structures, the building ontology model is set up to express multiple characteristics of individual buildings. Under this semantic expression framework, an object-based SAR image segmentation method is adopted to provide homogeneous image objects, and three categories of image object features are extracted. Semantic rules are implemented by organizing image object features, and the expression of individual building objects based on an ontological semantic description is formed. Finally, the building primitives are used to detect buildings among the available image objects. Experiments on TerraSAR-X images of Foshan city, China, with a spatial resolution of 1.25 m × 1.25 m, have shown that the total extraction rates are above 84%. The results indicate that the ontological semantic method can accurately extract flat-roof and gable-roof buildings larger than 250 pixels with different orientations.

  17. An Analysis of the Ontological Causal Relation in Physics and Its Educational Implications

    Science.gov (United States)

    Cheong, Yong Wook

    2016-01-01

    An ontological causal relation is a quantified relation between certain interactions and changes in corresponding properties. Key ideas in physics, such as Newton's second law and the first law of thermodynamics, are representative examples of these relations. In connection with the teaching and learning of these relations, this study investigated…

  19. Ontological backdrop

    DEFF Research Database (Denmark)

    Galle, Per

    2000-01-01

    In this report I keep track of ontological assumptions or implications of other OARs, introducing a system of categories and concepts that is compatible with them. The purpose was originally to keep terminology consistent throughout all OARs. However, the report also gives a condensed picture...... of the world view which underlies my current work on product modelling. It contains a justification of my view of concept exemplification, with lines traced back to Kant's work on epistemology....

  20. Sentiment analysis of Chinese microblogging based on sentiment ontology: a case study of '7.23 Wenzhou Train Collision'

    Science.gov (United States)

    Shi, Wei; Wang, Hongwei; He, Shaoyi

    2013-12-01

    Sentiment analysis of microblogging texts can facilitate both organisations' public opinion monitoring and governments' development of response strategies. Nevertheless, most existing analysis methods have been applied to Twitter rather than to Chinese microblogging (Weibo), and they generally rely on large amounts of manually annotated training data or on machine learning to perform sentiment classification, which makes them difficult to apply. This paper addresses these problems and employs a sentiment ontology model for sentiment analysis of Chinese microblogging. We conduct a sentiment analysis of all public microblogging posts about '7.23 Wenzhou Train Collision' broadcast by Sina microblogging users between 23 July and 1 August 2011. For every day in this time period, we first extract eight dimensions of sentiment (expect, joy, love, surprise, anxiety, sorrow, angry, and hate), and then build a fuzzy sentiment ontology based on HowNet and semantic similarity for sentiment analysis; we also establish methods for computing the influence and sentiment of microblogging texts; and we finally explore the change of public sentiment after '7.23 Wenzhou Train Collision'. The results show that the established sentiment analysis method applies well, and that changes in the different emotional values can reflect the success or failure of the government's guidance of public opinion.

  1. Building ontologies with basic formal ontology

    CERN Document Server

    Arp, Robert; Spear, Andrew D.

    2015-01-01

    In the era of "big data," science is increasingly information driven, and the potential for computers to store, manage, and integrate massive amounts of data has given rise to such new disciplinary fields as biomedical informatics. Applied ontology offers a strategy for the organization of scientific information in computer-tractable form, drawing on concepts not only from computer and information science but also from linguistics, logic, and philosophy. This book provides an introduction to the field of applied ontology that is of particular relevance to biomedicine, covering theoretical components of ontologies, best practices for ontology design, and examples of biomedical ontologies in use. After defining an ontology as a representation of the types of entities in a given domain, the book distinguishes between different kinds of ontologies and taxonomies, and shows how applied ontology draws on more traditional ideas from metaphysics. It presents the core features of the Basic Formal Ontology (BFO), now u...

  2. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data

    Directory of Open Access Journals (Sweden)

    Tintle Nathan L

    2012-08-01

    Full Text Available Abstract. Background: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. Results: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Conclusions: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  3. Microarray gene expression profiling and analysis in renal cell carcinoma

    Directory of Open Access Journals (Sweden)

    Sadhukhan Provash

    2004-06-01

    Full Text Available Abstract. Background: Renal cell carcinoma (RCC) is the most common cancer in adult kidney. The accuracy of current diagnosis and prognosis of the disease and the effectiveness of the treatment for the disease are limited by the poor understanding of the disease at the molecular level. To better understand the genetics and biology of RCC, we profiled the expression of 7,129 genes in both clear cell RCC tissue and cell lines using oligonucleotide arrays. Methods: Total RNAs isolated from renal cell tumors, adjacent normal tissue and metastatic RCC cell lines were hybridized to Affymetrix HuFL oligonucleotide arrays. Genes were categorized into different functional groups based on the description of the Gene Ontology Consortium and analyzed based on the gene expression levels. Gene expression profiles of the tissue and cell line samples were visualized and classified by singular value decomposition. Reverse transcription polymerase chain reaction was performed to confirm the expression alterations of selected genes in RCC. Results: Selected genes were annotated based on biological processes and clustered into functional groups. The expression levels of genes in each group were also analyzed. Seventy-four commonly differentially expressed genes with more than five-fold changes in RCC tissues were identified. The expression alterations of selected genes from these seventy-four genes were further verified using reverse transcription polymerase chain reaction (RT-PCR). Detailed comparison of gene expression patterns in RCC tissue and RCC cell lines shows significant differences between the two types of samples, but many important expression patterns were preserved. Conclusions: This is one of the initial studies that examine the functional ontology of a large number of genes in RCC. Extensive annotation, clustering and analysis of a large number of genes based on the gene functional ontology revealed many interesting gene expression patterns in RCC. Most

  4. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

    Science.gov (United States)

    Diehl, Alexander D; Meehan, Terrence F; Bradford, Yvonne M; Brush, Matthew H; Dahdul, Wasila M; Dougall, David S; He, Yongqun; Osumi-Sutherland, David; Ruttenberg, Alan; Sarntivijai, Sirarat; Van Slyke, Ceri E; Vasilevsky, Nicole A; Haendel, Melissa A; Blake, Judith A; Mungall, Christopher J

    2016-07-04

    The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the

  5. Gastric Cancer Associated Genes Identified by an Integrative Analysis of Gene Expression Data

    Science.gov (United States)

    Jiang, Bing; Li, Shuwen; Jiang, Zhi

    2017-01-01

    Gastric cancer is one of the most severe complex diseases, with high morbidity and mortality worldwide. The molecular mechanisms and risk factors for this disease are still not clear because of the cancer heterogeneity caused by different genetic and environmental factors. As more and more expression data accumulate, we can perform integrative analysis of these data to understand the complexity of gastric cancer and to identify consensus players for this heterogeneous cancer. In the present work, we screened the published gene expression data and analyzed them with an integrative tool, combined with pathway and gene ontology enrichment investigation. We identified several consensus differentially expressed genes, and these genes were further confirmed with literature mining; finally, two genes, that is, immunoglobulin J chain and C-X-C motif chemokine ligand 17, were screened as novel gastric cancer associated genes. Experimental validation is proposed to further confirm this finding. PMID:28232943

  6. Formalized Conflicts Detection Based on the Analysis of Multiple Emails: An Approach Combining Statistics and Ontologies

    Science.gov (United States)

    Zakaria, Chahnez; Curé, Olivier; Salzano, Gabriella; Smaïli, Kamel

    In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors involved in relational conflicts. Our approach detects conflicts in emails, probably the most popular type of documents in CSCW, but the methods used can handle other text-based documents. These methods rely on the combination of statistical and ontological operations. The proposed solution is decomposed into several steps: (i) we enrich a simple negative emotion ontology with terms occurring in the corpus of emails, (ii) we categorize each conflicting email according to the concepts of this ontology and (iii) we identify emails, subjects and team members involved in conflicting emails using possibilistic description logic and a set of proposed measures. Each of these steps is evaluated and validated on concrete examples. Moreover, this approach's framework is generic and can be easily adapted to domains other than conflicts, e.g. security issues, and extended with operations making use of our proposed set of measures.

  8. Changes in winter depression phenotype correlate with white blood cell gene expression profiles : A combined metagene and gene ontology approach

    NARCIS (Netherlands)

    Bosker, Fokko J.; Terpstra, Peter; Gladkevich, Anatoliy V.; Dijck-Brouwer, D. A. Janneke; te Meerman, Gerard; Nolen, Willem A.; Schoevers, Robert A.; Meesters, Ybe

    2015-01-01

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior an

  9. Ontological Surprises

    DEFF Research Database (Denmark)

    Leahu, Lucian

    2016-01-01

    This paper investigates how we might rethink design as the technological crafting of human-machine relations in the context of a machine learning technique called neural networks. It analyzes Google’s Inceptionism project, which uses neural networks for image recognition. The surprising output of...... a hybrid approach where machine learning algorithms are used to identify objects as well as connections between them; finally, it argues for remaining open to ontological surprises in machine learning as they may enable the crafting of different relations with and through technologies....

  10. Formalization, annotation and analysis of diverse drug and probe screening assay datasets using the BioAssay Ontology (BAO).

    Directory of Open Access Journals (Sweden)

    Uma D Vempati

    Full Text Available Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and more recently in the public sector. The resulting experimental datasets are increasingly being disseminated via publically accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult to navigate by non-experts. The lack of standardized descriptions and semantics of biological assays and screening results hinder targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint. Previously, we reported on the higher-level design of BAO and on the semantic querying capabilities offered by the ontology-indexed triple store of HTS data. Here, we report on our detailed design, annotation pipeline, substantially enlarged annotation knowledgebase, and analysis results. We used BAO to annotate assays from the largest public HTS data repository, PubChem, and demonstrate its utility to categorize and analyze diverse HTS results from numerous experiments. BAO is publically available from the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/1533. BAO provides controlled terminology and uniform scope to report probe and drug discovery screening assays and results. BAO leverages description logic to formalize the domain knowledge and facilitate the semantic integration with diverse other resources. As a consequence, BAO offers the potential to infer new knowledge from a corpus of assay results, for example molecular mechanisms of action of perturbagens.

  11. Formalization, annotation and analysis of diverse drug and probe screening assay datasets using the BioAssay Ontology (BAO).

    Science.gov (United States)

    Vempati, Uma D; Przydzial, Magdalena J; Chung, Caty; Abeyruwan, Saminda; Mir, Ahsan; Sakurai, Kunie; Visser, Ubbo; Lemmon, Vance P; Schürer, Stephan C

    2012-01-01

    Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and more recently in the public sector. The resulting experimental datasets are increasingly being disseminated via publically accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult to navigate by non-experts. The lack of standardized descriptions and semantics of biological assays and screening results hinder targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint. Previously, we reported on the higher-level design of BAO and on the semantic querying capabilities offered by the ontology-indexed triple store of HTS data. Here, we report on our detailed design, annotation pipeline, substantially enlarged annotation knowledgebase, and analysis results. We used BAO to annotate assays from the largest public HTS data repository, PubChem, and demonstrate its utility to categorize and analyze diverse HTS results from numerous experiments. BAO is publically available from the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/1533. BAO provides controlled terminology and uniform scope to report probe and drug discovery screening assays and results. BAO leverages description logic to formalize the domain knowledge and facilitate the semantic integration with diverse other resources. As a consequence, BAO offers the potential to infer new knowledge from a corpus of assay results, for example molecular mechanisms of action of perturbagens.

  12. Generating application ontologies from reference ontologies.

    Science.gov (United States)

    Shaw, Marianne; Detwiler, Landon T; Brinkley, James F; Suciu, Dan

    2008-11-06

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies. We present extensions to the semantic web query language SPARQL that will allow researchers to develop application ontologies that are derived from reference ontologies. We have modified the ARQ query processor to support subqueries, recursive subqueries, and Skolem functions for node creation. We demonstrate the utility of these extensions by deriving an application ontology from the Foundational Model of Anatomy.

  13. On Automatic Modeling and Use of Domain-specific Ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Knappe, Rasmus; Bulskov, Henrik

    2005-01-01

    is a specific lattice-based concept algebraic language by which ontologies are inherently generative. The modeling of a domain specific ontology is based on a general ontology built upon common knowledge resources as dictionaries and thesauri. Based on analysis of concept occurrences in the object document......-based navigation. Finally, a measure of concept similarity is derived from the domain specific ontology based on occurrences, commonalities, and distances in the ontology....

  14. Using Ontology Fingerprints to Evaluate Genome-wide Association Results

    OpenAIRE

    Lam Tsoi; Michael Boehnke; Richard Klein; Jim Zheng

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...
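
    As an illustration of what "overrepresented" can mean here, the sketch below computes a one-sided hypergeometric p-value for a single GO term. The truncated record does not state which test or similarity score the authors use, so the hypergeometric test is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(hits, linked, term_total, corpus_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    corpus_size: number of abstracts in the whole corpus
    term_total:  abstracts in the corpus that mention the GO term
    linked:      abstracts linked to the gene (or phenotype) of interest
    hits:        linked abstracts that mention the GO term
    """
    upper = min(term_total, linked)
    favourable = sum(
        comb(term_total, i) * comb(corpus_size - term_total, linked - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(corpus_size, linked)
```

    Under this assumption, a fingerprint could be the set of GO terms whose p-value falls below a chosen threshold, and two fingerprints could then be compared term by term to produce a similarity score.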

  15. Integrated analysis of gene expression by association rules discovery

    Directory of Open Access Journals (Sweden)

    Carazo Jose M

    2006-02-01

    Full Text Available Abstract. Background: Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most of the current approaches to analyzing microarray datasets focus mainly on the analysis of experimental data, and external biological information is incorporated as a posterior process. Results: In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations among both data sources based on co-occurrence patterns. We applied the proposed methodology to the analysis of gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. Automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, where many of them are clearly supported by recently reported work. Conclusion: The integration of external biological information and gene expression data can provide insights about the biological processes associated with gene expression programs. In this paper we show that the proposed methodology is able to integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous sources of data. An implementation of the method is included in the Engene software package.
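
    The co-occurrence idea behind association rules can be sketched with the standard support, confidence and lift statistics. This is a toy illustration only: it is not the Engene implementation, and the gene identifiers, annotation labels and expression labels are invented.

```python
# Each "transaction" is one gene: its annotations plus a discretized
# expression label. All identifiers and labels are hypothetical.
GENES = {
    "g1": {"pathway:glycolysis", "expr:up"},
    "g2": {"pathway:glycolysis", "expr:up"},
    "g3": {"pathway:glycolysis", "expr:down"},
    "g4": {"pathway:ribosome", "expr:down"},
    "g5": {"pathway:ribosome", "expr:up"},
}


def support(items):
    """Fraction of genes whose item set contains every item in `items`."""
    items = set(items)
    return sum(items <= gene_items for gene_items in GENES.values()) / len(GENES)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)
```

    For the rule {pathway:glycolysis} -> {expr:up}, a lift above 1 would suggest that the annotation and the expression pattern co-occur more often than expected if they were independent. Rule mining then amounts to enumerating candidate item sets and keeping the rules that pass chosen support and confidence thresholds.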

  16. Anatomy Ontology Matching Using Markov Logic Networks

    Directory of Open Access Journals (Sweden)

    Chunhua Li

    2016-01-01

    Full Text Available The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relationships between ontologies describing different species. Ontology matching is one way to find semantic correspondences between entities of different ontologies. Markov logic networks, which unify probabilistic graphical models and first-order logic, provide an excellent framework for ontology matching. We combine several different matching strategies through first-order logic formulas according to the structure of anatomy ontologies. Experiments on the adult mouse anatomy and the human anatomy have demonstrated the effectiveness of the proposed approach in terms of the quality of the resulting alignment.

  17. How Ontologies are Made: Studying the Hidden Social Dynamics Behind Collaborative Ontology Engineering Projects.

    Science.gov (United States)

    Strohmaier, Markus; Walk, Simon; Pöschko, Jan; Lamprecht, Daniel; Tudorache, Tania; Nyulas, Csongor; Musen, Mark A; Noy, Natalya F

    2013-05-01

    Traditionally, evaluation methods in the field of semantic technologies have focused on the end result of ontology engineering efforts, mainly, on evaluating ontologies and their corresponding qualities and characteristics. This focus has led to the development of a whole arsenal of ontology-evaluation techniques that investigate the quality of ontologies as a product. In this paper, we aim to shed light on the process of ontology engineering construction by introducing and applying a set of measures to analyze hidden social dynamics. We argue that especially for ontologies which are constructed collaboratively, understanding the social processes that have led to its construction is critical not only in understanding but consequently also in evaluating the ontology. With the work presented in this paper, we aim to expose the texture of collaborative ontology engineering processes that is otherwise left invisible. Using historical change-log data, we unveil qualitative differences and commonalities between different collaborative ontology engineering projects. Explaining and understanding these differences will help us to better comprehend the role and importance of social factors in collaborative ontology engineering projects. We hope that our analysis will spur a new line of evaluation techniques that view ontologies not as the static result of deliberations among domain experts, but as a dynamic, collaborative and iterative process that needs to be understood, evaluated and managed in itself. We believe that advances in this direction would help our community to expand the existing arsenal of ontology evaluation techniques towards more holistic approaches.

  18. An ontology for microbial phenotypes.

    Science.gov (United States)

    Chibucos, Marcus C; Zweifel, Adrienne E; Herrera, Jonathan C; Meza, William; Eslamfam, Shabnam; Uetz, Peter; Siegele, Deborah A; Hu, James C; Giglio, Michelle G

    2014-11-30

    Phenotypic data are routinely used to elucidate gene function in organisms amenable to genetic manipulation. However, previous to this work, there was no generalizable system in place for the structured storage and retrieval of phenotypic information for bacteria. The Ontology of Microbial Phenotypes (OMP) has been created to standardize the capture of such phenotypic information from microbes. OMP has been built on the foundations of the Basic Formal Ontology and the Phenotype and Trait Ontology. Terms have logical definitions that can facilitate computational searching of phenotypes and their associated genes. OMP can be accessed via a wiki page as well as downloaded from SourceForge. Initial annotations with OMP are being made for Escherichia coli using a wiki-based annotation capture system. New OMP terms are being concurrently developed as annotation proceeds. We anticipate that diverse groups studying microbial genetics and associated phenotypes will employ OMP for standardizing microbial phenotype annotation, much as the Gene Ontology has standardized gene product annotation. The resulting OMP resource and associated annotations will facilitate prediction of phenotypes for unknown genes and result in new experimental characterization of phenotypes and functions.

  19. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    Science.gov (United States)

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, such diagnosis needs to revolve around developmental disorder groups. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data of 1175 patients from a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.

  20. Simple Ontology Format (SOFT)

    Energy Technology Data Exchange (ETDEWEB)

    2011-10-01

    The Simple Ontology Format (SOFT) library and file format specification provides a set of simple tools for developing and maintaining ontologies. The library, implemented as a Perl module, supports parsing and verification of files in SOFT format, operations on ontologies (adding, removing, or filtering entities), and conversion of ontologies into other formats. SOFT allows users to quickly create an ontology using only a basic text editor, verify it, and portray it in a graph layout system using customized styles.

  1. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora.

  2. An Ontology for Insider Threat Indicators Development and Applications

    Science.gov (United States)

    2014-11-01

    J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, et al., "Gene Ontology: tool for the unification of biology," Nature genetics, vol. 25, pp. 25-29... An Ontology for Insider Threat Indicators Development and Applications Daniel L. Costa, Matthew L. Collins, Samuel J. Perl, Michael J. Albrethsen... cert.org Abstract: We describe our ongoing development of an insider threat indicator ontology. Our ontology is intended to serve as a standardized

  3. Rice Transcriptome Analysis to Identify Possible Herbicide Quinclorac Detoxification Genes

    Directory of Open Access Journals (Sweden)

    Wenying Xu

    2015-09-01

    Full Text Available Quinclorac is a highly selective auxin-type herbicide that is widely used for the effective control of barnyard grass in paddy rice fields, improving the world’s rice yield. A herbicidal mode of action for quinclorac has been proposed, and hormone interactions affect quinclorac signaling. Because of its widespread use, quinclorac may be transported outside rice fields with drainage water, leading to soil and water pollution and environmental health problems. In this study, we used the 57K Affymetrix rice whole-genome array to identify quinclorac signaling response genes and to study the molecular mechanisms of action and detoxification of quinclorac in rice plants. Overall, 637 probe sets were identified with differential expression levels under either 6 or 24 h of quinclorac treatment. Auxin-related genes such as GH3 and OsIAAs responded to quinclorac treatment. Gene Ontology analysis showed that detoxification-related gene families were significantly enriched, including cytochrome P450, GST, UGT, and ABC and drug transporter genes. Moreover, real-time RT-PCR analysis showed that top candidate P450 families such as CYP81, CYP709C and CYP72A genes were universally induced by different herbicides. Some Arabidopsis genes from the same P450 family were up-regulated under quinclorac treatment. We conducted rice whole-genome GeneChip analysis and the first global identification of quinclorac response genes. This work may provide potential markers for detoxification of quinclorac and biomonitors of environmental chemical pollution.

  4. Margin based ontology sparse vector learning algorithm and applied in biology science.

    Science.gov (United States)

    Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad

    2017-01-01

    In biology, ontology applications involve large amounts of genetic information and chemical information on molecular structure, so ontology concepts carry a great deal of information. Consequently, in mathematical notation, the vector that corresponds to an ontology concept often has a very high dimension, which places higher demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using marginal likelihood and marginal distribution, we present an optimized strategy for a marginal-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to gene ontology and plant ontology to verify its efficiency.

  5. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation

    Science.gov (United States)

    2013-01-01

    Background Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. Results As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Conclusion Multiple ontologies have been developed for gene and protein annotation. By using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/. PMID:23409969
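    For readers unfamiliar with term enrichment, a one-sided hypergeometric test is one common way to score whether a term is over-represented in a gene list. The sketch below is illustrative only: the record does not state which statistic STOP uses, and the function name, parameter names, and counts are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when X follows a hypergeometric distribution.

    population: genes in the background set
    annotated:  background genes carrying the term
    sample:     genes in the list of interest
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 carry the term,
# and 3 of the 5 genes in the list of interest carry it.
p = enrichment_p_value(population=20, annotated=5, sample=5, hits=3)
print(round(p, 4))
```

    In practice one p-value is computed per term and the results are corrected for multiple testing, which this sketch omits.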

  6. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms

    Directory of Open Access Journals (Sweden)

    Argraves W Scott

    2010-04-01

    Full Text Available Abstract Background An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Results Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis-driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases, etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease, etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. Conclusions GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies
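    The Z score normalization mentioned for the heat maps can be sketched as follows. This is a generic per-gene (row-wise) normalization, not GeneMesh code; the use of the population standard deviation and the handling of constant rows are assumptions, and the expression values are made up.

```python
from statistics import mean, pstdev


def z_score_rows(matrix):
    """Normalize each row (one gene across samples) to mean 0 and unit variance."""
    normalized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population standard deviation (an assumption)
        if sigma == 0:
            # A gene with constant expression carries no contrast; map it to zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mu) / sigma for x in row])
    return normalized


# Hypothetical expression intensities: two genes measured in three samples.
expression = [
    [2.0, 4.0, 6.0],
    [5.0, 5.0, 5.0],
]
z = z_score_rows(expression)
print(z)
```

    Row-wise scaling makes genes with very different absolute intensities comparable on one color scale, which is why it is a frequent preprocessing step before drawing a heat map.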

  7. Ontology-Based Gap Analysis for Technology Selection: A Knowledge Management Framework for the Support of Equipment Purchasing Processes

    Science.gov (United States)

    Macris, Aristomenis M.; Georgakellos, Dimitrios A.

    Technology selection decisions such as equipment purchasing and supplier selection are decisions of strategic importance to companies. The nature of these decisions usually is complex, unstructured and thus, difficult to be captured in a way that will be efficiently reusable. Knowledge reusability is of paramount importance since it enables users participate actively in process design/redesign activities stimulated by the changing technology selection environment. This paper addresses the technology selection problem through an ontology-based approach that captures and makes reusable the equipment purchasing process and assists in identifying (a) the specifications requested by the users' organization, (b) those offered by various candidate vendors' organizations and (c) in performing specifications gap analysis as a prerequisite for effective and efficient technology selection. This approach has practical appeal, operational simplicity, and the potential for both immediate and long-term strategic impact. An example from the iron and steel industry is also presented to illustrate the approach.

  8. Use of the Protein Ontology (PRO) for Multi-Faceted Analysis of Biological Processes: a Case Study of the Spindle Checkpoint

    Directory of Open Access Journals (Sweden)

    Karen E Ross

    2013-04-01

    Full Text Available As a member of the Open Biomedical Ontologies (OBO) foundry, the Protein Ontology (PRO) provides an ontological representation of protein forms and complexes and their relationships. Annotations in PRO can be assigned to individual protein forms and complexes, each distinguishable down to the level of post-translational modification, thereby allowing for a more precise depiction of protein function than is possible with annotations to the gene as a whole. Moreover, PRO is fully interoperable with other OBO ontologies and integrates knowledge from other protein-centric resources such as UniProt and Reactome. Here we demonstrate the value of the PRO framework in the investigation of the spindle checkpoint, a highly conserved biological process that relies extensively on protein modification and protein complex formation. The spindle checkpoint maintains genomic integrity by monitoring the attachment of chromosomes to spindle microtubules and delaying cell cycle progression until the spindle is fully assembled. Using PRO in conjunction with other bioinformatics tools, we explored the cross-species conservation of spindle checkpoint proteins, including phosphorylated forms and complexes; studied the impact of phosphorylation on spindle checkpoint function; and examined the interactions of spindle checkpoint proteins with the kinetochore, the site of checkpoint activation. Our approach can be generalized to any biological process of interest.

  9. Datamining with Ontologies.

    Science.gov (United States)

    Hoehndorf, Robert; Gkoutos, Georgios V; Schofield, Paul N

    2016-01-01

    The use of ontologies has increased rapidly over the past decade and they now provide a key component of most major databases in biology and biomedicine. Consequently, datamining over these databases benefits from considering the specific structure and content of ontologies, and several methods have been developed to use ontologies in datamining applications. Here, we discuss the principles of ontology structure, and datamining methods that rely on ontologies. The impact of these methods in the biological and biomedical sciences has been profound and is likely to increase as more datasets are becoming available using common, shared ontologies.

  10. Tutorial on Protein Ontology Resources.

    Science.gov (United States)

    Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R; Ross, Karen E; Natale, Darren A

    2017-01-01

    The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website ( proconsortium.org ) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.

  11. Comparing Relational and Ontological Triple Stores in Healthcare Domain

    Directory of Open Access Journals (Sweden)

    Ozgu Can

    2017-01-01

    Full Text Available Today’s technological improvements have made ubiquitous healthcare systems that converge into smart healthcare applications in order to solve patients’ problems, to communicate effectively with patients, and to improve healthcare service quality. The first step of building a smart healthcare information system is representing the healthcare data as connected, reachable, and sharable. In order to achieve this representation, ontologies are used to describe the healthcare data. Combining ontological healthcare data with the used and obtained data can be maintained by storing the entire health domain data inside big data stores that support both relational and graph-based ontological data. There are several big data stores and different types of big data sets in the healthcare domain. The goal of this paper is to determine the most applicable ontology data store for storing the big healthcare data. For this purpose, AllegroGraph and Oracle 12c data stores are compared based on their infrastructural capacity, loading time, and query response times. Hence, healthcare ontologies (GENE Ontology, Gene Expression Ontology (GEXO), Regulation of Transcription Ontology (RETO), Regulation of Gene Expression Ontology (REXO)) are used to measure the ontology loading time. Thereafter, various queries are constructed and executed for GENE ontology in order to measure the capacity and query response times for the performance comparison between AllegroGraph and Oracle 12c triple stores.

  12. The foundational ontology library ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    Full Text Available A purpose of a foundational ontology is to solve interoperability issues among domain ontologies; such ontologies are also used for ontology-driven conceptual data modelling. Multiple foundational ontologies have been developed in recent years, and most of them...

  13. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
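    One way to picture a unified distance that blends network topology with annotation similarity is sketched below. It is not the measure defined in the paper, whose exact formula is not given in this record: the Jaccard-based similarities, the weighting parameter `alpha`, the protein names, and the placeholder term identifiers are all assumptions made for illustration.

```python
def jaccard_similarity(a, b):
    """Jaccard index of two collections; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_distance(neighbors, annotations, u, v, alpha=0.5):
    """Distance in [0, 1] mixing topological and annotation similarity."""
    # Topological similarity: overlap of closed neighborhoods (node plus neighbors).
    topo_sim = jaccard_similarity(neighbors[u] | {u}, neighbors[v] | {v})
    # Annotation similarity: overlap of the two proteins' term sets.
    go_sim = jaccard_similarity(annotations[u], annotations[v])
    return 1.0 - (alpha * topo_sim + (1.0 - alpha) * go_sim)


# Hypothetical interaction network and placeholder term identifiers.
neighbors = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}
annotations = {
    "P1": {"GO:1", "GO:2"},
    "P2": {"GO:1", "GO:2"},
    "P3": {"GO:2"},
    "P4": {"GO:9"},
}

d12 = combined_distance(neighbors, annotations, "P1", "P2")
d34 = combined_distance(neighbors, annotations, "P3", "P4")
print(d12, d34)
```

    Proteins that share both interaction partners and annotations end up close together, so a clustering step run on such distances would tend to group them into the same candidate complex.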

  14. Investigation of gene expression profiles in coronary heart disease and functional analysis of target gene

    Institute of Scientific and Technical Information of China (English)

    YIN HuiJun; MA Xiaoduan; JIANG YueRong; SHI DaZhuo; CHEN KeJi

    2009-01-01

    The research outlined here includes constitution of the differential gene expression profile by means of oligonucleotide gene microarray and functional analysis of the target gene for coronary heart disease (CHD). In a microarray screening experiment, the predominance of inflammation- and immune-related genes is presented in the expression profile of 107 differential genes based on the analysis of gene ontology and gene pathway. IL-8, an inflammatory factor, is identified as one of the genes that were markedly up-regulated in CHD. The plasma level of IL-8 is significantly raised in patients with CHD (n = 30) compared with healthy controls (n = 40), which underscores the clinical relevance of the in vitro finding. The further functional analysis shows that IL-8 affects platelet aggregation percentage, expression of CD62p and platelet aggregation morphology in 12 healthy volunteers to some extent. These findings suggest the relevance of inflammation and immune responses to CHD at the DNA level. Moreover, IL-8 may be involved in the pathogenesis of CHD through the pathway of platelet activation.

  15. Ontology for Genome Comparison and Genomic Rearrangements

    Directory of Open Access Journals (Sweden)

    Anil Wipat

    2006-04-01

    Full Text Available We present an ontology for describing genomes, genome comparisons, their evolution and biological function. This ontology will support the development of novel genome comparison algorithms and aid the community in discussing genomic evolution. It provides a framework for communication about comparative genomics, and a basis upon which further automated analysis can be built. The nomenclature defined by the ontology will foster clearer communication between biologists, and also standardize terms used by data publishers in the results of analysis programs. The overriding aim of this ontology is the facilitation of consistent annotation of genomes through computational methods, rather than human annotators. To this end, the ontology includes definitions that support computer analysis and automated transfer of annotations between genomes, rather than relying upon human mediation.

  16. Ontology-based, Tissue MicroArray oriented, image centered tissue bank

    Directory of Open Access Journals (Sweden)

    Viti Federica

    2008-04-01

    Full Text Available Abstract Background Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. Results In this work we propose a web-oriented Tissue MicroArray system that supports researchers in managing bio-samples and, through the use of ontologies, enables tissue sharing aimed at the design of Tissue MicroArray experiments and the evaluation of results. Indeed, our system provides ontological descriptions both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, because the system works on well-defined terms, it is possible to query web resources for literature articles to integrate both pathology and bioinformatics data. Conclusions Using this system, users associate an ontology-based description to each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue. Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies about correlation between the analyzed pathology and the most commonly related biological processes.

  17. Microarray analysis of gene expression profiles in the bovine mammary gland during lactation

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Mammary glands undergo functional and metabolic changes during virgin, lactation and dry periods. A total of 122 genes were identified as differentially expressed, including 79 up-regulated and 43 down-regulated genes during lactation compared with virgin and dry periods. Gene ontology analysis showed the functional classification of the up-regulated genes in lactation, including transport, biosynthetic process, signal transduction, catalytic activity, immune system process, cell death, and positive regulation of the developmental process. Microarray data clarified molecular events in bovine mammary gland lactation.

  18. Análisis de terminologías de salud para su utilización como ontologías computacionales en los sistemas de información clínicos Analysis of health terminologies for use as ontologies in healthcare information systems

    Directory of Open Access Journals (Sweden)

    Maria Teresa Romá-Ferri

    2008-10-01

    limitations imposed by standardized terms. The objective of this study was to establish the extent to which terminologies could be used for the design of ontologies, which could serve as an aid to resolve problems such as semantic interoperability and knowledge reusability in healthcare information systems. Methods: To determine the extent to which terminologies could be used as ontologies, six of the most important terminologies in clinical, epidemiologic, documentation and administrative-economic contexts were analyzed. The following characteristics were verified: conceptual coverage, hierarchical structure, conceptual granularity of the categories, conceptual relations, and the language used for conceptual representation. Results: MeSH, DeCS and UMLS ontologies were considered lightweight. The main differences among these ontologies concern conceptual specification, the types of relation and the restrictions among the associated concepts. SNOMED and GALEN ontologies have declaratory formalism, based on logical descriptions. These ontologies include explicit qualities and show greater restrictions among associated concepts and rule combinations and were consequently considered as heavyweight. Conclusions: Analysis of the declared representation of the terminologies shows the extent to which they could be reused as ontologies. Their degree of usability depends on whether the aim is for healthcare information systems to solve problems of semantic interoperability (lightweight ontologies) or to reuse the systems' knowledge as an aid to decision making (heavyweight ontologies) and for non-structured information retrieval, extraction, and classification.

  19. GIFtS: annotation landscape analysis with GeneCards

    Directory of Open Access Journals (Sweden)

    Dalah Irina

    2009-10-01

    Full Text Available Abstract Background Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. Results We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS value, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes.
The degree of accumulated knowledge for a

  20. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  2. dcGOR: an R package for analysing ontologies and protein domain annotations.

    Directory of Open Access Journals (Sweden)

    Hai Fang

    2014-10-01

    I introduce an open-source R package 'dcGOR' to provide the bioinformatics community with the ease to analyse ontologies and protein domain annotations, particularly those in the dcGO database. The dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structure to represent domains, ontologies, annotations, and all analytical outputs as well. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as inputs a list of protein domains of interest, the package is able to easily carry out in-depth analyses in terms of functional, phenotypic and diseased relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and RNAs (from Rfam) as well. The package is freely available at CRAN for easy installation, and also at GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.

  3. dcGOR: an R package for analysing ontologies and protein domain annotations.

    Science.gov (United States)

    Fang, Hai

    2014-10-01

    I introduce an open-source R package 'dcGOR' to provide the bioinformatics community with the ease to analyse ontologies and protein domain annotations, particularly those in the dcGO database. The dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structure to represent domains, ontologies, annotations, and all analytical outputs as well. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as inputs a list of protein domains of interest, the package is able to easily carry out in-depth analyses in terms of functional, phenotypic and diseased relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) and RNAs (from Rfam) as well. The package is freely available at CRAN for easy installation, and also at GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.

  4. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing meta data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and meta data taxonomies, should be based on ontologies.

  5. Identification of potential transcriptomic markers in developing ankylosing spondylitis: a meta-analysis of gene expression profiles.

    Science.gov (United States)

    Fang, Fang; Pan, Jian; Xu, Lixiao; Li, Gang; Wang, Jian

    2015-01-01

    The goal of this study was to identify potential transcriptomic markers in developing ankylosing spondylitis by a meta-analysis of multiple public microarray datasets. Using the INMEX (integrative meta-analysis of expression data) program, we performed the meta-analysis to identify consistently differentially expressed (DE) genes in ankylosing spondylitis and further performed functional interpretation (gene ontology analysis and pathway analysis) of the DE genes identified in the meta-analysis. Three microarray datasets (26 cases and 29 controls in total) were collected for meta-analysis. 905 consistently DE genes were identified in ankylosing spondylitis, among which 482 genes were upregulated and 423 genes were downregulated. The upregulated gene with the smallest combined rank product (RP) was GNG11 (combined RP=299.64). The downregulated gene with the smallest combined RP was S100P (combined RP=335.94). In the gene ontology (GO) analysis, the most significantly enriched GO term was "immune system process" (P=3.46×10^-26). The most significant pathway identified in the pathway analysis was antigen processing and presentation (P=8.40×10^-5). The consistently DE genes in ankylosing spondylitis and biological pathways associated with those DE genes identified provide valuable information for studying the pathophysiology of ankylosing spondylitis.

  6. Identification of Potential Transcriptomic Markers in Developing Ankylosing Spondylitis: A Meta-Analysis of Gene Expression Profiles

    Science.gov (United States)

    Fang, Fang; Pan, Jian; Xu, Lixiao; Li, Gang; Wang, Jian

    2015-01-01

    The goal of this study was to identify potential transcriptomic markers in developing ankylosing spondylitis by a meta-analysis of multiple public microarray datasets. Using the INMEX (integrative meta-analysis of expression data) program, we performed the meta-analysis to identify consistently differentially expressed (DE) genes in ankylosing spondylitis and further performed functional interpretation (gene ontology analysis and pathway analysis) of the DE genes identified in the meta-analysis. Three microarray datasets (26 cases and 29 controls in total) were collected for meta-analysis. 905 consistently DE genes were identified in ankylosing spondylitis, among which 482 genes were upregulated and 423 genes were downregulated. The upregulated gene with the smallest combined rank product (RP) was GNG11 (combined RP = 299.64). The downregulated gene with the smallest combined RP was S100P (combined RP = 335.94). In the gene ontology (GO) analysis, the most significantly enriched GO term was “immune system process” (P = 3.46 × 10^-26). The most significant pathway identified in the pathway analysis was antigen processing and presentation (P = 8.40 × 10^-5). The consistently DE genes in ankylosing spondylitis and biological pathways associated with those DE genes identified provide valuable information for studying the pathophysiology of ankylosing spondylitis. PMID:25688367
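
    The "combined rank product" reported above can be illustrated with the textbook rank-product statistic: the geometric mean of a gene's ranks across datasets, where a smaller value means the gene is more consistently near the top. The sketch below is an assumption-laden illustration, not the INMEX implementation (the abstract does not describe its exact procedure); the function name and the gene labels are hypothetical.

```python
from math import prod


def rank_products(ranks_by_dataset):
    """Geometric mean of each gene's rank across datasets.

    ranks_by_dataset: list of dicts mapping gene -> rank (1 = most changed).
    Only genes ranked in every dataset are scored.
    """
    shared = set.intersection(*(set(r) for r in ranks_by_dataset))
    k = len(ranks_by_dataset)
    return {g: prod(r[g] for r in ranks_by_dataset) ** (1.0 / k) for g in shared}


# Hypothetical ranks for three placeholder genes in two datasets.
example = rank_products([
    {"geneA": 1, "geneB": 2, "geneC": 3},
    {"geneA": 1, "geneB": 3, "geneC": 2},
])
best = min(example, key=example.get)  # gene with the smallest rank product
```

    In practice the significance of a rank product is usually assessed by permutation, which this sketch omits.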

  7. Statistical and Ontological Analysis of Adverse Events Associated with Monovalent and Combination Vaccines against Hepatitis A and B Diseases

    Science.gov (United States)

    Xie, Jiangan; Zhao, Lili; Zhou, Shangbo; He, Yongqun

    2016-01-01

    Vaccinations often induce various adverse events (AEs), and sometimes serious AEs (SAEs). While many vaccines are used in combination, the effects of vaccine-vaccine interactions (VVIs) on vaccine AEs are rarely studied. In this study, AE profiles induced by hepatitis A vaccine (Havrix), hepatitis B vaccine (Engerix-B), and hepatitis A and B combination vaccine (Twinrix) were studied using the VAERS data. From May 2001 to January 2015, VAERS recorded 941, 3,885, and 1,624 AE case reports where patients aged at least 18 years old were vaccinated with only Havrix, Engerix-B, and Twinrix, respectively. Using these data, our statistical analysis identified 46, 69, and 82 AEs significantly associated with Havrix, Engerix-B, and Twinrix, respectively. Based on the Ontology of Adverse Events (OAE) hierarchical classification, these AEs were enriched in the AEs related to behavioral and neurological conditions, immune system, and investigation results. Twenty-nine AEs were classified as SAEs and mainly related to immune conditions. Using a logistic regression model accompanied with MCMC sampling, 13 AEs (e.g., hepatosplenomegaly) were identified to result from VVI synergistic effects. Classifications of these 13 AEs using OAE and MedDRA hierarchies confirmed the advantages of the OAE-based method over MedDRA in AE term hierarchical analysis. PMID:27694888

  8. Evolution of biomedical ontologies and mappings: Overview of recent approaches.

    Science.gov (United States)

    Groß, Anika; Pruski, Cédric; Rahm, Erhard

    2016-01-01

    Biomedical ontologies are heavily used to annotate data, and different ontologies are often interlinked by ontology mappings. These ontology-based mappings and annotations are used in many applications and analysis tasks. Since biomedical ontologies are continuously updated, dependent artifacts can become outdated and need to undergo evolution as well. Hence there is a need for largely automated approaches to keep ontology-based mappings up-to-date in the presence of evolving ontologies. In this article, we survey current approaches and novel directions in the context of ontology and mapping evolution. We will discuss requirements for mapping adaptation and provide a comprehensive overview of existing approaches. We will further identify open challenges and outline ideas for future developments.

  9. A Posteriori Ontology Engineering for Data-Driven Science

    Energy Technology Data Exchange (ETDEWEB)

    Gessler, Damian Dg; Joslyn, Cliff A.; Verspoor, Karin M.

    2013-05-28

    Science—and biology in particular—has a rich tradition in categorical knowledge management. This continues today in the generation and use of formal ontologies. Unfortunately, the link between hard data and ontological content is predominately qualitative, not quantitative. The usual approach is to construct ontologies of qualitative concepts, and then annotate the data to the ontologies. This process has seen great value, yet it is laborious, and the success to which ontologies are managing and organizing the full information content of the data is uncertain. An alternative approach is the converse: use the data itself to quantitatively drive ontology creation. Under this model, one generates ontologies at the time they are needed, allowing them to change as more data influences both their topology and their concept space. We outline a combined approach to achieve this, taking advantage of two technologies, the mathematical approach of Formal Concept Analysis (FCA) and the semantic web technologies of the Web Ontology Language (OWL).

  10. ExAtlas: An interactive online tool for meta-analysis of gene expression data.

    Science.gov (United States)

    Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

    2015-12-01

    We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
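
    Of the meta-analysis options listed above, Fisher's method is the easiest to show in a few lines: it combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal stdlib-only illustration and makes no claim about how ExAtlas implements it; the function name is hypothetical, and the closed-form tail used here holds only because the degrees of freedom are even.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom: sf(x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


single = fisher_combined_p([0.05])       # one p-value is returned unchanged
pair = fisher_combined_p([0.5, 0.5])     # two uninformative p-values
```

    The method assumes the combined tests are independent; correlated datasets make the combined p-value too optimistic.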

  11. Students' Ontological Security and Agency in Science Education—An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-09-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from the societal structures described by Giddens was used to structure the analysis. The results illustrate how the participating students used both modalities for 'Legitimation' and 'Domination' to justify positions that accept or reject new technology. The analysis also showed how norms and knowledge can be used to justify opposing positions in relation to building trust in science and technology, or in democratic decisions expected to favour personal norms. Here, students accepted or rejected the authority of experts based on perceptions of the knowledge base that the authority was seen to be anchored in. Difficulty in discerning between material risks (reduced safety) and immaterial risks (loss of norms) was also found. These outcomes are used to draw attention to the educational challenges associated with students' using knowledge claims (Domination) to support norms (Legitimation) and how this is related to the development of a sense of agency in terms of sharing norms with experts or with laymen.

  12. A Simple Strategy to Start Domain Ontology from Scratch

    Directory of Open Access Journals (Sweden)

    Ivo Wolff Gersberg

    2014-01-01

    Aiming to use a domain ontology as an educational tool for novice students, and focusing on a fast and easy way to start a domain ontology from scratch, this approach sets semantics aside and identifies the contexts of concepts (terms) in order to build the ontology. Text mining, link analysis and graph analysis create an abstract rough sketch of the interactions between terms. This first rough sketch is presented to the expert, providing insights and prompting the expert to communicate knowledge through assertive sentences. Those assertive sentences support the creation of the ontology. A web prototype tool to visualize the ontology and retrieve book contents is also presented.

  13. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  14. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

  15. Geo-Ontologies Are Scale Dependent

    Science.gov (United States)

    Frank, A. U.

    2009-04-01

    Philosophers aim at a single ontology that describes "how the world is"; for information systems we aim only at ontologies that describe a conceptualization of reality (Guarino 1995; Gruber 2005). A conceptualization of the world implies a spatial and temporal scale: what are the phenomena, the objects and the speed of their change? Few articles (Reitsma et al. 2003) seem to address that an ontology is scale specific (but many articles indicate that ontologies are scale-free in another sense, namely in the link densities between concepts). The scale in the conceptualization can be linked to the observation process. The extent of the support of the physical observation instrument and the sampling theorem indicate what level of detail we find in a dataset. These rules apply to remote sensing and sensor networks alike. An ontology of observations must include scale or level of detail, and concepts derived from observations should carry this relation forward. A simple example: in a high-resolution remote sensing image, agricultural plots and the roads between them are shown; at lower resolution, only the plots and not the roads are visible. This gives two ontologies, one with plots and roads, the other with plots only. Note that a neighborhood relation in the two different ontologies also yields different results. References: Gruber, T. (2005). "TagOntology - a way to agree on the semantics of tagging data." Retrieved October 29, 2005, from http://tomgruber.org/writing/tagontology-tagcapm-talk.pdf. Guarino, N. (1995). "Formal Ontology, Conceptual Analysis and Knowledge Representation." International Journal of Human and Computer Studies. Special Issue on Formal Ontology, Conceptual Analysis and Knowledge Representation, edited by N. Guarino and R. Poli 43(5/6). Reitsma, F. and T. Bittner (2003). Process, Hierarchy, and Scale. Spatial Information Theory. Cognitive and Computational Foundations of Geographic Information Science. International Conference

  16. Standardized information for process hazard analysis based on ontology

    Institute of Scientific and Technical Information of China (English)

    吴重光; 许欣; 纳永良; 张卫华

    2012-01-01

    The principal objective of process hazard analysis is to identify hazard scenarios. Both the course of a team's brainstorming hazard evaluation and its conclusions can be expressed as hazard scenarios. An ontology of hazard scenarios gives an accurate description of standardized process hazard analysis information; an ontology is an explicit specification of a conceptualization. Following the design criteria for ontologies, a method for standardizing process hazard analysis information, called the scenario object model (SOM), was proposed. SOM can represent the content and structure of hazard analysis information and supports automatic computer reasoning and semi-quantitative calculation. Applying the SOM ontology effectively enabled computer-aided automatic hazard evaluation and the transfer, review and sharing of safety information.

  17. The pathway ontology - updates and applications.

    Science.gov (United States)

    Petri, Victoria; Jayaraman, Pushkala; Tutaj, Marek; Hayman, G Thomas; Smith, Jennifer R; De Pons, Jeff; Laulederkind, Stanley Jf; Lowry, Timothy F; Nigam, Rajni; Wang, Shur-Jen; Shimoyama, Mary; Dwinell, Melinda R; Munzenmaier, Diane H; Worthey, Elizabeth A; Jacob, Howard J

    2014-02-05

    The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. The two released pipelines - the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline - make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD "Immune and Inflammatory Disease Portal" at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the 'infectious disease pathway' parent term category. The 'drug pathway' node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by over 75%. Ongoing development of

  18. The pathway ontology – updates and applications

    Science.gov (United States)

    2014-01-01

    Background: The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. Results: The two released pipelines – the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline – make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD “Immune and Inflammatory Disease Portal” at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the ‘infectious disease pathway’ parent term category. The ‘drug pathway’ node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by

  19. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    Science.gov (United States)

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore molecular mechanisms of bladder cancer (BC), network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC.
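
    Selecting hub genes "according to analysis of degree", as described above, amounts to counting each node's distinct interaction partners and ranking nodes by that count. The sketch below is a minimal illustration of degree ranking on an undirected edge list; it is not the authors' pipeline, the function name is hypothetical, and the node labels G1 to G4 are placeholders rather than real interaction data.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank nodes of an undirected network by degree (highest first).

    edges: iterable of (node_a, node_b) pairs. Self-loops are ignored and
    duplicate or reversed pairs are counted once.
    """
    unique = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return degree.most_common()


# Placeholder network: G1 interacts with every other node.
ranking = degree_ranking([
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"), ("G3", "G2"),  # reversed duplicate, counted once
])
hub, hub_degree = ranking[0]
```

    Stress and betweenness, the other centralities named in the abstract, require shortest-path computations and are better left to a graph library.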

  20. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
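
    For reference, the baseline that LEGO is compared against can be written in a few lines: the one-sided Fisher's exact test for over-representation is the upper tail of a hypergeometric distribution, which treats every gene in a set as equivalent. The sketch below shows only that baseline, not LEGO's network-weighted statistic; the function and parameter names are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_universe: number of background genes
    n_in_set:   number of background genes annotated to the gene set
    n_selected: size of the query list (e.g. differentially expressed genes)
    n_overlap:  query genes that fall in the gene set
    """
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Toy numbers: 5 of 10 genes are in the set and all 5 selected genes hit it.
p_all_hits = ora_p_value(10, 5, 5, 5)
p_no_hits = ora_p_value(10, 5, 5, 0)
```

    Because many gene sets are tested at once, p-values from this test are normally corrected for multiple testing before interpretation.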

  1. Video Content Analysis Framework Based on Concept Ontology

    Institute of Scientific and Technical Information of China (English)

    张良; 周长胜

    2011-01-01

    To address the problem of structuring video for semantic analysis, a video content analysis framework based on concept ontology was created. Video is partitioned by concept ontology, and four typical kinds of video concept are identified: category concepts, object concepts, property concepts and emotional concepts. The relationships between these concepts are also discussed: instance-of, attribute-of, kind-of and part-of. Manually defined rules are combined with learning methods to identify and partition the video concept ontology. Structuring video around the concept ontology is consistent with human cognitive processes and the rules by which people come to understand things, and it makes it easier to map, organize and process the relationship between low-level feature vectors and high-level semantics.

  2. Conservativity Principle Violations for Ontology Alignment: Survey and Trends

    Directory of Open Access Journals (Sweden)

    Yahia Atig

    2016-07-01

    Ontology matching techniques are a solution to the problem of interoperability between ontologies. However, the generated mappings suffer from logical defects that limit their usefulness. In this paper we present a detailed analysis of the so-called conservativity principle: an alignment between ontologies should never generate new knowledge beyond what is obtained by reasoning over the ontologies alone. We also study two sub-problems, ontology change and satisfiability preservation, and compare related works and the ways they detect and repair violations of the conservativity principle. Finally, we present a set of open research issues.

  3. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological...... analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora....

  4. ErmineJ: Tool for functional analysis of gene expression data sets

    Directory of Open Access Journals (Sweden)

    Braynen William

    2005-11-01

    Background: It is common for the results of a microarray study to be analyzed in the context of biologically-motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported. Results: We introduce ErmineJ, a multiplatform user-friendly stand-alone software tool for the analysis of functionally-relevant sets of genes in the context of microarray gene expression data. ErmineJ implements multiple algorithms for gene set analysis, including over-representation and resampling-based methods that focus on gene scores or correlation of gene expression profiles. In addition to a graphical user interface, ErmineJ has a command line interface and an application programming interface that can be used to automate analyses. The graphical user interface includes tools for creating and modifying gene sets, visualizing the Gene Ontology as a table or tree, and visualizing gene expression data. ErmineJ comes with a complete user manual, and is open-source software licensed under the GNU Public License. Conclusion: The availability of multiple analysis algorithms, together with a rich feature set and simple graphical interface, should make ErmineJ a useful addition to the biologist's informatics toolbox. ErmineJ is available from http://microarray.cu.genome.org.
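
    The hypergeometric over-representation test named in the abstract above can be sketched in a few lines. This is only an illustration of the general technique under stated assumptions, not ErmineJ's implementation: the function name, its parameters and the toy numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value for gene set over-representation.

    population: number of genes in the background (hypothetical input)
    annotated:  background genes that belong to the gene set
    selected:   genes picked as "of interest" (e.g. differentially expressed)
    overlap:    selected genes that also belong to the gene set

    Returns P(X >= overlap) when `selected` genes are drawn without
    replacement from the background.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 3 in the set, 4 selected, 2 overlapping.
    print(hypergeom_enrichment_p(10, 3, 4, 2))
```

    A small p-value suggests that the gene set is over-represented among the selected genes; in practice one such test is run per gene set and the resulting p-values are then adjusted for multiple testing.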

  5. SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING

    Directory of Open Access Journals (Sweden)

    Siham AMROUCH

    2013-11-01

    In the last decade, ontologies have played a key technological role in information sharing and agent interoperability in different application domains. In the semantic web domain, ontologies are used to face the great challenge of representing the semantics of data, in order to bring the current web to its full power and hence achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To meet this requirement, ontology mapping is an unavoidable solution. Indeed, ontology mapping builds a meta layer that allows different applications and information systems to access and share their information, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect, based on an external lexical resource, WordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character is the main difference between our contribution and most of the existing semi-automatic ontology mapping algorithms, such as Chimaera, Prompt, Onion, Glue, etc. To enhance the performance of our algorithm, the mapping discovery stage is based on the combination of two sub-modules. The former analyzes the concepts' names and the latter analyzes their properties. Each of these two sub-modules is itself based on the combination of lexical and semantic similarity measures.

  6. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome

    OpenAIRE

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2015-01-01

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide and ≈7–10% of women in reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, efforts have not been taken to collate ...

  7. Gene set analysis for GWAS

    DEFF Research Database (Denmark)

    Debrabant, Birgit; Soerensen, Mette

    2014-01-01

    Abstract We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review corresponding null and alternative hypotheses. Especially, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic...... parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example....

  8. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

      This thesis entitled "Function analysis of unknown genes" presents the use of proteome analysis for the characterisation of yeast (Saccharomyces cerevisiae) genes and their products (proteins especially those of unknown function). This study illustrates that proteome analysis can be used...... to describe different aspects of molecular biology of the cell, to study changes that occur in the cell due to overexpression or deletion of a gene and to identify various protein modifications. The biological questions and the results of the described studies show the diversity of the information that can...... genes and proteins. It reports the first global proteome database collecting 36 yeast single gene deletion mutants and selecting over 650 differences between analysed mutants and the wild type strain. The obtained results show that two-dimensional gel electrophoresis and mass spectrometry based proteome...

  9. Exploring biomedical ontology mappings with graph theory methods.

    Science.gov (United States)

    Kocbek, Simon; Kim, Jin-Dong

    2017-01-01

    In the era of the semantic web, life science ontologies play an important role in tasks such as annotating biological objects, linking relevant data pieces, and verifying data consistency. Understanding ontology structures and overlapping ontologies is essential for tasks such as ontology reuse and development. We present an exploratory study where we examine structure and look for patterns in BioPortal, a comprehensive publicly available repository of life science ontologies. We report an analysis of biomedical ontology mapping data over time. We apply graph theory methods such as Modularity Analysis and Betweenness Centrality to analyse data gathered at five different time points. We identify communities, i.e., sets of overlapping ontologies, and define similar and closest communities. We demonstrate evolution of identified communities over time and identify core ontologies of the closest communities. We use BioPortal project and category data to measure community coherence. We also validate identified communities with their mutual mentions in scientific literature. By comparing mapping data gathered at five different time points, we identified similar and closest communities of overlapping ontologies, and demonstrated evolution of communities over time. Results showed that anatomy and health ontologies tend to form more isolated communities compared to other categories. We also showed that communities contain all or the majority of ontologies being used in narrower projects. In addition, we identified major changes in mapping data after migration to BioPortal Version 4.

  10. A Survey on how Description Logic Ontologies Benefit from Formal Concept Analysis

    CERN Document Server

    Sertkaya, Baris

    2011-01-01

    Although the notion of a concept as a collection of objects sharing certain properties, and the notion of a conceptual hierarchy are fundamental to both Formal Concept Analysis and Description Logics, the ways concepts are described and obtained differ significantly between these two research areas. Despite these differences, there have been several attempts to bridge the gap between these two formalisms, and attempts to apply methods from one field in the other. The present work aims to give an overview on the research done in combining Description Logics and Formal Concept Analysis.

  11. Cosmological Ontology and Epistemology

    CERN Document Server

    Page, Don N

    2014-01-01

    In cosmology, we would like to explain our observations and predict future observations from theories of the entire universe. Such cosmological theories make ontological assumptions of what entities exist and what their properties and relationships are. One must also make epistemological assumptions or metatheories of how one can test cosmological theories. Here I shall propose a Bayesian analysis in which the likelihood of a complete theory is given by the normalized measure it assigns to the observation used to test the theory. In this context, a discussion is given of the trade-off between prior probabilities and likelihoods, of the measure problem of cosmology, of the death of Born's rule, of the Boltzmann brain problem, of whether there is a better principle for prior probabilities than mathematical simplicity, and of an Optimal Argument for the Existence of God.

  12. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on social learning networks is still not widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  13. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

    Directory of Open Access Journals (Sweden)

    Chan Juancarlos

    2009-07-01

    Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results: We employ the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given
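
    The F-scores quoted in the abstract above are consistent with the usual balanced F-score, the harmonic mean of precision and recall. The sketch below assumes that definition (the abstract does not state the formula); the function name is hypothetical. Recomputing from the rounded percentages gives values within about 0.1 percentage point of the reported ones.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (both in 0..1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs taken from the abstract, as fractions.
    reported = {
        "curatable papers": (0.618, 0.791),
        "relevant sentences": (0.801, 0.303),
        "GO annotations": (0.973, 0.662),
    }
    for label, (p, r) in reported.items():
        print(f"{label}: F = {100 * f_score(p, r):.1f}%")
```

    Because the harmonic mean is pulled toward the smaller of the two values, the sentence-level F-score stays low despite the high precision: recall is only 30.3%.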

  14. An Ontological and Epistemological Analysis of the Presentation of the First Law of Thermodynamics in School and University Textbooks

    Science.gov (United States)

    Poblete, Joaquin Castillo; Rojas, Rocio Ogaz; Merino, Cristian; Quiroz, Waldo

    2016-01-01

    Considering the relevance of thermodynamics to the scientific discipline of chemistry and the curriculum of the Western school system, the philosophical system of Mario Bunge, particularly his ontology and epistemology, is used herein to analyze the presentation of the first law of thermodynamics in 15 school and university textbooks. The…

  15. Research on the complex network of the UNSPSC ontology

    Science.gov (United States)

    Xu, Yingying; Zou, Shengrong; Gu, Aihua; Wei, Li; Zhou, Ta

    The UNSPSC ontology is mainly applied as a classification system for e-business and government procurement of products and services worldwide, and it supports the logical structure of the classification of those products and services. In this paper, complex network techniques are applied to analyze the structure of the ontology. Each concept of the ontology corresponds to a node of the complex network, and each relationship between concepts corresponds to an edge. Existing analysis methods and performance indicators from complex network research are used to analyze the degree distribution and community structure of the ontology. The research will help to evaluate and classify the concepts of the ontology and to improve the efficiency of semantic matching.
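
    The mapping described above, from ontology concepts to nodes and from relationships to edges, leads directly to a degree distribution. The following is a minimal sketch under stated assumptions: the edge list is a made-up toy hierarchy rather than real UNSPSC data, and the function name is hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes that have that degree.

    edges: iterable of (concept_a, concept_b) pairs, one per relationship,
           treated as undirected.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


if __name__ == "__main__":
    # Hypothetical toy hierarchy: a root concept with two children,
    # one of which has a child of its own.
    toy_edges = [("root", "a"), ("root", "b"), ("a", "c")]
    for k, count in sorted(degree_distribution(toy_edges).items()):
        print(f"degree {k}: {count} node(s)")
```

    On a real ontology the same counts would typically be plotted on log-log axes to judge whether the degree distribution is heavy-tailed.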

  16. The Ontology of Disaster.

    Science.gov (United States)

    Thompson, Neil

    1995-01-01

    Explores some key existential or ontological concepts to show their applicability to the complex area of disaster impact as it relates to health and social welfare practice. Draws on existentialist philosophy, particularly that of Jean-Paul Sartre, and introduces some key ontological concepts to show how they specifically apply to the experience…

  17. Constructive Ontology Engineering

    Science.gov (United States)

    Sousan, William L.

    2010-01-01

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…

  18. Statistical mechanics of ontology based annotations

    CERN Document Server

    Hoyle, David C

    2016-01-01

    We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets, in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology, and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity; an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotate...

  1. COHeRE: Cross-Ontology Hierarchical Relation Examination for Ontology Quality Assurance.

    Science.gov (United States)

    Cui, Licong

    Biomedical ontologies play a vital role in healthcare information management, data integration, and decision support. Ontology quality assurance (OQA) is an indispensable part of the ontology engineering cycle. Most existing OQA methods are based on the knowledge provided within the targeted ontology. This paper proposes a novel cross-ontology analysis method, Cross-Ontology Hierarchical Relation Examination (COHeRE), to detect inconsistencies and possible errors in hierarchical relations across multiple ontologies. COHeRE leverages the Unified Medical Language System (UMLS) knowledge source and the MapReduce cloud computing technique for systematic, large-scale ontology quality assurance work. COHeRE consists of three main steps with the UMLS concepts and relations as the input. First, the relations claimed in source vocabularies are filtered and aggregated for each pair of concepts. Second, inconsistent relations are detected if a concept pair is related by different types of relations in different source vocabularies. Finally, the uncovered inconsistent relations are voted on according to their number of occurrences across different source vocabularies. The voting result together with the inconsistent relations serve as the output of COHeRE for possible ontological change. The highest votes provide an initial suggestion on how such inconsistencies might be fixed. In UMLS, 138,987 concept pairs were found to have inconsistent relationships across multiple source vocabularies. 40 inconsistent concept pairs involving hierarchical relationships were randomly selected and manually reviewed by a human expert. 95.8% of the inconsistent relations involved in these concept pairs indeed exist in their source vocabularies rather than being introduced by mistake in the UMLS integration process. For 73.7% of the concept pairs with a suggested relationship, the human expert agreed with the suggestion. The effectiveness of COHeRE indicates that UMLS provides a promising environment to enhance
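
    The three steps described above (aggregate per concept pair, detect conflicting relation types, vote by number of source vocabularies) can be sketched on a single machine as follows. This is an illustrative sketch only, not the authors' MapReduce implementation; the input tuple layout, the function name and the toy identifiers are all assumptions, and concept pairs are treated as ordered exactly as given.

```python
from collections import defaultdict


def find_inconsistent_relations(claims):
    """Return concept pairs related by more than one relation type.

    claims: iterable of (concept_1, concept_2, relation, source_vocabulary).
    Result: {(concept_1, concept_2): [(relation, votes), ...]} where votes is
    the number of distinct source vocabularies asserting that relation,
    sorted with the most-voted relation first.
    """
    # Step 1: aggregate the claimed relations for each concept pair.
    by_pair = defaultdict(lambda: defaultdict(set))
    for c1, c2, relation, source in claims:
        by_pair[(c1, c2)][relation].add(source)

    result = {}
    for pair, relations in by_pair.items():
        # Step 2: a pair is inconsistent if sources disagree on the relation.
        if len(relations) > 1:
            # Step 3: vote by counting the sources behind each relation.
            votes = [(rel, len(srcs)) for rel, srcs in relations.items()]
            result[pair] = sorted(votes, key=lambda rv: (-rv[1], rv[0]))
    return result


if __name__ == "__main__":
    # Hypothetical toy claims; identifiers and vocabulary names are made up.
    toy = [
        ("C1", "C2", "is_a", "VocabA"),
        ("C1", "C2", "is_a", "VocabB"),
        ("C1", "C2", "part_of", "VocabC"),
        ("C3", "C4", "is_a", "VocabA"),
    ]
    print(find_inconsistent_relations(toy))
```

    The top-voted relation for each pair plays the role of the initial suggestion for how the inconsistency might be resolved.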

  2. Students' Ontological Security and Agency in Science Education--An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-01-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from…

  3. Using Ontology Fingerprints to evaluate genome-wide association study results

    OpenAIRE

    Tsoi, Lam C.; Michael Boehnke; Klein, Richard L.; Jim Zheng, W.

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...

  4. Ayurveda research: Ontological challenges.

    Science.gov (United States)

    Nayak, Jayakrishna

    2012-01-01

    Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath. Examples include the 'ASIIA (A Science Initiative In Ayurveda)' projects of the Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge: the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  5. Ayurveda research: Ontological challenges

    Directory of Open Access Journals (Sweden)

    Jayakrishna Nayak

    2012-01-01

    Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath. Examples include the 'ASIIA (A Science Initiative In Ayurveda)' projects of the Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge: the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  6. Gene family level comparative analysis of gene expression in mammals validates the ortholog conjecture.

    Science.gov (United States)

    Rogozin, Igor B; Managadze, David; Shabalina, Svetlana A; Koonin, Eugene V

    2014-04-01

    The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlation coefficients, ranks, or Z-scores) between orthologs is substantially greater than that between between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs, but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.

  7. Chromatin analysis of occluded genes

    Science.gov (United States)

    Lee, Jae Hyun; Gaetz, Jedidiah; Bugarija, Branimir; Fernandes, Croydon J.; Snyder, Gregory E.; Bush, Eliot C.; Lahn, Bruce T.

    2009-01-01

    We recently described two opposing states of transcriptional competency. One is termed ‘competent’ whereby a gene is capable of responding to trans-acting transcription factors of the cell, such that it is active if appropriate transcriptional activators are present, though it can also be silent if activators are absent or repressors are present. The other is termed ‘occluded’ whereby a gene is silenced by cis-acting, chromatin-based mechanisms in a manner that blocks it from responding to trans-acting factors, such that it is silent even when activators are present in the cellular milieu. We proposed that gene occlusion is a mechanism by which differentiated cells stably maintain their phenotypic identities. Here, we describe chromatin analysis of occluded genes. We found that DNA methylation plays a causal role in maintaining occlusion for a subset of occluded genes. We further examined a variety of other chromatin marks typically associated with transcriptional silencing, including histone variants, covalent histone modifications and chromatin-associated proteins. Surprisingly, we found that although many of these marks are robustly linked to silent genes (which include both occluded genes and genes that are competent but silent), none is linked specifically to occluded genes. Although the observation does not rule out a possible causal role of these chromatin marks in occlusion, it does suggest that these marks might be a secondary effect rather than the primary cause of the silent state in many genes. PMID:19380460

  8. Practical ontologies for information professionals

    CERN Document Server

    AUTHOR|(CDS)2071712

    2016-01-01

    Practical Ontologies for Information Professionals provides an introduction to ontologies and their development, an essential tool for fighting back against information overload. The development of robust and widely used ontologies is an increasingly important tool in the fight against information overload. The publishing and sharing of explicit explanations for a wide variety of conceptualizations, in a machine readable format, has the power to both improve information retrieval and identify new knowledge. This new book provides an accessible introduction to the following:
    * What is an ontology? Defining the concept and why it is increasingly important to the information professional
    * Ontologies and the semantic web
    * Existing ontologies, such as SKOS, OWL, FOAF, schema.org, and the DBpedia Ontology
    * Adopting and building ontologies, showing how to avoid repetition of work and how to build a simple ontology with Protege
    * Interrogating semantic web ontologies
    * The future of ontologies and the role of the ...

  9. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms

    Directory of Open Access Journals (Sweden)

    Yang Xiang

    2015-01-01

    The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable to manual solutions in many applications. However, available works on ontology integration are largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identify optimal structures in this problem and propose optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly propose an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between the Gene Ontology and the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.

  10. Modular Ontology Techniques and their Applications in the Biomedical Domain.

    Science.gov (United States)

    Pathak, Jyotishman; Johnson, Thomas M; Chute, Christopher G

    2008-08-05

    In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing "modules" of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.

  11. Quality control for terms and definitions in ontologies and taxonomies

    Directory of Open Access Journals (Sweden)

    Rüegg Alexander

    2006-04-01

    Full Text Available Background: Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. Results: We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. Conclusion: Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators in drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.
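
    The abstract above does not describe how circular definitions are detected. As a rough illustration only, one simple heuristic (an assumption here, not the authors' method) flags a term when every content word of its name reappears in its own definition. The function name `is_circular`, the stop-word list, and the two sample entries are hypothetical and are not taken from GO.

```python
import re

# Hypothetical stop-word list; a real check would need a fuller one.
STOP_WORDS = {"of", "the", "a", "an", "in", "to", "and", "or", "by"}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_circular(term: str, definition: str) -> bool:
    """Return True if every content word of the term reappears in its definition."""
    term_words = tokenize(term) - STOP_WORDS
    # A term made only of stop words carries no evidence, so it is not flagged.
    return bool(term_words) and term_words <= tokenize(definition)


# Illustrative entries only (not actual GO definitions).
terms = {
    "protein binding": "Interacting selectively with any protein.",
    "cell growth": "The growth of a cell.",
}
flagged = [name for name, text in terms.items() if is_circular(name, text)]
print(flagged)
```

    A word-overlap check of this kind is deliberately crude: it misses paraphrased circularity and can flag legitimate definitions that must reuse the term's words, so its output would be a list of candidates for a curator to inspect.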

  12. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    Directory of Open Access Journals (Sweden)

    Robert Hoehndorf

    Full Text Available Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  13. Saliva Ontology: An ontology-based framework for a Salivaomics Knowledge Base

    Directory of Open Access Journals (Sweden)

    Smith Barry

    2010-06-01

    Full Text Available Background: The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO). Results: We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics domain and to saliva-related diagnostics following the principles of the OBO (Open Biomedical Ontologies) Foundry. Conclusions: The Saliva Ontology is an ongoing exploratory initiative. The ontology will be used to facilitate salivaomics data retrieval and integration across multiple fields of research together with data analysis and data mining. The ontology will be tested through its ability to serve the annotation ('tagging') of a representative corpus of salivaomics research literature that is to be incorporated into the SKB.

  14. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    Science.gov (United States)

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V

    2011-01-01

    Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  15. Ontological foundations for evolutionary economics: A Darwinian social ontology

    NARCIS (Netherlands)

    J.W. Stoelhorst

    2008-01-01

    The purpose of this paper is to further the project of generalized Darwinism by developing a social ontology on the basis of a combined commitment to ontological continuity and ontological commonality. Three issues that are central to the development of a social ontology are addressed: (1) the speci

  16. Multi-species Ontologies of the Craniofacial Musculoskeletal System

    Science.gov (United States)

    Mejino, Jose L.V.; Detwiler, Landon T.; Cox, Timothy C.; Brinkley, James F.

    2017-01-01

    We created the Ontology of Craniofacial Development and Malformation (OCDM) [1] to provide a unifying framework for organizing and integrating craniofacial data, ranging from genes to clinical phenotypes, across multiple species. Within this framework we focused on spatio-structural representation of anatomical entities related to craniofacial development and malformation, such as craniosynostosis and midface hypoplasia. Animal models are used to support human studies, so we built multi-species ontologies that would allow for cross-species correlation of anatomical information. For this purpose we first developed and enhanced the craniofacial component of the human musculoskeletal system in the Foundational Model of Anatomy Ontology (FMA) [2], and then imported this component, which we call the Craniofacial Human Ontology (CHO), into the OCDM. The CHO was then used as a template to create the anatomy for the mouse, the Craniofacial Mouse Ontology (CMO), as well as for the zebrafish, the Craniofacial Zebrafish Ontology (CZO).

  17. [ ] Toward an Ontology of Finitude

    Directory of Open Access Journals (Sweden)

    Julia Hölzl

    2011-09-01

    Full Text Available Hölzl palpates an ontology of fracture. Unlike original ontologies that are concerned with essence rather than being, the ontology proposed here does not believe in its originality. This project is concerned with becoming as such rather than with its Wesen. With the indefinite striving for remaining in itself. This ontology is a fissure, fissuring itself.

  18. Perspectives on ontology learning

    CERN Document Server

    Lehmann, J

    2014-01-01

    Perspectives on Ontology Learning brings together researchers and practitioners from different communities (natural language processing, machine learning, and the semantic web) in order to give an interdisciplinary overview of recent advances in ontology learning. Starting with a comprehensive introduction to the theoretical foundations of ontology learning methods, the edited volume presents the state of the art in automated knowledge acquisition and maintenance. It outlines future challenges in this area with a special focus on technologies suitable for pushing the boundaries beyond the c

  19. The sexual and ontology

    Directory of Open Access Journals (Sweden)

    Zupančič Alenka

    2014-01-01

    Full Text Available This paper explores some of the crucial ontological implications of the psychoanalytic theory of sexuality in its Freudo-Lacanian orientation. As irreducible to different sexual practices and contents, the concept of sexuality obtains conceptual weight that makes it particularly relevant for philosophical ontological thinking. Starting from the hypothesis that something about sexuality is constitutively unconscious - that is to say, existing only in the form of the unconscious - the paper points at the singular short-circuit of the epistemological and ontological level which is at work in psychoanalytic theory, and which cannot be neglected in philosophical examination of the relation between knowledge and being.

  20. Biclustering methods: biological relevance and application in gene expression analysis.

    Directory of Open Access Journals (Sweden)

    Ali Oghabian

    Full Text Available DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical approach for tackling this challenge is to use clustering (also known as one-way clustering) methods, where genes (or, respectively, samples) are grouped together based on the similarity of their expression profiles across the set of all samples (or, respectively, genes). An alternative approach is to develop biclustering methods to identify local patterns in the data. These methods extract subgroups of genes that are co-expressed across only a subset of samples and may feature important biological or medical implications. In this study we evaluate 13 biclustering and 2 clustering (k-means and hierarchical) methods. We use several approaches to compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures in our analysis: (1) we examine how well the considered (bi)clustering methods differentiate various sample types; (2) we evaluate how well the groups of genes discovered by the (bi)clustering methods are annotated with similar Gene Ontology categories; (3) we evaluate the capability of the methods to differentiate genes that are known to be specific to the particular sample types we study; and (4) we compare the running time of the algorithms. In the end, we conclude that as long as the samples are well defined and annotated, the contamination of the samples is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.
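
    Whether a discovered group of genes shares a Gene Ontology category more often than chance would predict is commonly assessed with a hypergeometric over-representation test. The abstract does not state which test the authors used, so the sketch below is an assumption offered for illustration; the function name `hypergeom_enrichment_p` and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           drawn: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO category
    drawn:      number of genes in the discovered group
    hits:       genes in the group that carry the GO category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probabilities of seeing `hits` or more annotated genes in the group.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 background genes, 4 annotated, a group of 3 with 2 annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, drawn=3, hits=2)
print(round(p_value, 4))
```

    With the toy numbers the favourable count is 6*6 + 4*1 = 40 out of 120 possible groups, giving a probability of one third. In practice one such test is run per GO category, and the resulting p-values then need a multiple-testing correction.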

  1. Towards automated biomedical ontology harmonization.

    Science.gov (United States)

    Uribe, Gustavo A; Lopez, Diego M; Blobel, Bernd

    2014-01-01

    The use of biomedical ontologies is increasing, especially in the context of health systems interoperability. Ontologies are key to understanding the semantics of the information exchanged. However, given the diversity of biomedical ontologies, it is essential to develop tools that support harmonization processes among them. Several algorithms and tools have been proposed by computer scientists to partially support ontology harmonization. However, these tools face several problems, especially in the biomedical domain, where ontologies are large and complex. In the harmonization process, matching is a basic task. This paper explains the different ontology harmonization processes, analyzes existing matching tools, and proposes a prototype of an ontology harmonization service. The results demonstrate that there are many open issues in the field of biomedical ontology harmonization, such as: overcoming structural discrepancies between ontologies; the lack of semantic algorithms to automate the process; the low matching efficiency of existing algorithms; and the use of domain and top-level ontologies in the matching process.

  2. Identification of candidate genes in osteoporosis by integrated microarray analysis

    Science.gov (United States)

    Li, J. J.; Wang, B. Q.; Yang, Y.; Li, D.

    2016-01-01

    Objectives: In order to screen the altered gene expression profile in peripheral blood mononuclear cells of patients with osteoporosis, we performed an integrated analysis of the online microarray studies of osteoporosis. Methods: We searched the Gene Expression Omnibus (GEO) database for microarray studies of peripheral blood mononuclear cells in patients with osteoporosis. Subsequently, we integrated gene expression data sets from multiple microarray studies to obtain differentially expressed genes (DEGs) between patients with osteoporosis and normal controls. Gene function analysis was performed to uncover the functions of identified DEGs. Results: A total of three microarray studies were selected for integrated analysis. In all, 1125 genes were found to be significantly differentially expressed between osteoporosis patients and normal controls, with 373 upregulated and 752 downregulated genes. Positive regulation of the cellular amino metabolic process (gene ontology (GO): 0033240, false discovery rate (FDR) = 1.00E+00) was significantly enriched under the GO category for biological processes, while for molecular functions, flavin adenine dinucleotide binding (GO: 0050660, FDR = 3.66E-01) and androgen receptor binding (GO: 0050681, FDR = 6.35E-01) were significantly enriched. DEGs were enriched in many osteoporosis-related signalling pathways, including those of mitogen-activated protein kinase (MAPK) and calcium. Protein-protein interaction (PPI) network analysis showed that the significant hub proteins contained ubiquitin specific peptidase 9, X-linked (Degree = 99), ubiquitin specific peptidase 19 (Degree = 57) and ubiquitin conjugating enzyme E2 B (Degree = 57). Conclusion: Analysis of gene function of identified differentially expressed genes may expand our understanding of fundamental mechanisms leading to osteoporosis. Moreover, significantly enriched pathways, such as MAPK and calcium, may be involved in osteoporosis through osteoblastic differentiation and
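
    The record above reports false discovery rates for enriched GO terms without saying how they were computed. A widely used procedure is the Benjamini-Hochberg adjustment; the sketch below assumes that procedure purely for illustration, and the function name `benjamini_hochberg` and the input p-values are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
print([round(q, 4) for q in fdr])
```

    A term would normally be reported as enriched only when its adjusted value falls below a chosen threshold such as 0.05, which is a useful reference point when reading FDR figures in enrichment tables.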

  3. Exercise-associated DNA methylation change in skeletal muscle and the importance of imprinted genes: a bioinformatics meta-analysis.

    Science.gov (United States)

    Brown, William M

    2015-12-01

    Epigenetics is the study of processes--beyond DNA sequence alteration--producing heritable characteristics. For example, DNA methylation modifies gene expression without altering the nucleotide sequence. A well-studied DNA methylation-based phenomenon is genomic imprinting (ie, genotype-independent parent-of-origin effects). We aimed to elucidate: (1) the effect of exercise on DNA methylation and (2) the role of imprinted genes in skeletal muscle gene networks (ie, gene group functional profiling analyses). Gene ontology (ie, gene product elucidation)/meta-analysis. 26 skeletal muscle and 86 imprinted genes were subjected to g:Profiler ontology analysis. Meta-analysis assessed exercise-associated DNA methylation change. g:Profiler found four muscle gene networks with imprinted loci. Meta-analysis identified 16 articles (387 genes/1580 individuals) associated with exercise. Age, method, sample size, sex and tissue variation could elevate effect size bias. Only skeletal muscle gene networks including imprinted genes were reported. Exercise-associated effect sizes were calculated by gene. Age, method, sample size, sex and tissue variation were moderators. Six imprinted loci (RB1, MEG3, UBE3A, PLAGL1, SGCE, INS) were important for muscle gene networks, while meta-analysis uncovered five exercise-associated imprinted loci (KCNQ1, MEG3, GRB10, L3MBTL1, PLAGL1). DNA methylation decreased with exercise (60% of loci). Exercise-associated DNA methylation change was stronger among older people (ie, age accounted for 30% of the variation). Among older people, genes exhibiting DNA methylation decreases were part of a microRNA-regulated gene network functioning to suppress cancer. Imprinted genes were identified in skeletal muscle gene networks and exercise-associated DNA methylation change. Exercise-associated DNA methylation modification could rewind the 'epigenetic clock' as we age. CRD42014009800. Published by the BMJ Publishing Group Limited. For permission to use (where

  4. Digital gene expression analysis of Microsporum canis exposed to berberine chloride.

    Directory of Open Access Journals (Sweden)

    Chen-Wen Xiao

    Full Text Available Berberine, a natural isoquinoline alkaloid found in many medicinal herbs, is active against a variety of microbial infections, including Microsporum canis (M. canis). However, the underlying mechanisms are poorly understood. To study the effect of berberine chloride on M. canis infection, a Digital Gene Expression (DGE) tag profile was constructed and a transcriptome analysis of the M. canis cellular responses upon berberine treatment was performed. The Illumina/HiSeq sequencing technique was used to generate the gene expression profile data, and enrichment analyses of Gene Ontology (GO) and Pathway function were then conducted based on the transcriptome data. The DGE results showed 8476945, 14256722, 7708575, 5669955, 6565513 and 9303468 tags, respectively, obtained from M. canis incubated with berberine or control DMSO. In total, 8,783 genes were mapped, and 1,890 genes showed significant changes between the two groups: 1,030 genes were up-regulated and 860 genes were down-regulated (P<0.05) in the berberine-treated group compared to the control group. In addition, twenty-three GO terms were identified by Gene Ontology functional enrichment analysis, such as calcium-transporting ATPase activity, 2-oxoglutarate metabolic process, valine catabolic process, peroxisome and unfolded protein binding. Pathway enrichment analysis indicated 6 significant signaling pathways, including steroid biosynthesis, steroid hormone biosynthesis, Parkinson's disease, 2,4-Dichlorobenzoate degradation, and tropane, piperidine and isoquinoline alkaloid biosynthesis. Among these, eleven selected genes were further verified by qRT-PCR. Our findings provide a comprehensive view of the gene expression profile of M. canis upon berberine treatment, and shed light on its complicated effects on M. canis.

  5. The Usability-Error Ontology

    DEFF Research Database (Denmark)

    2013-01-01

    ... in patients coming to harm. Often the root cause analysis of these adverse events can be traced back to Usability Errors in the Health Information Technology (HIT) or its interaction with users. Interoperability of the documentation of HIT related Usability Errors in a consistent fashion can improve our ability to do systematic reviews and meta-analyses. In an effort to support improved and more interoperable data capture regarding Usability Errors, we have created the Usability Error Ontology (UEO) as a classification method for representing knowledge regarding Usability Errors. We expect the UEO will grow over time to support an increasing number of HIT system types. In this manuscript, we present this Ontology of Usability Error Types and specifically address Computerized Physician Order Entry (CPOE), Electronic Health Records (EHR) and Revenue Cycle HIT systems.

  6. A Method for Evaluating and Standardizing Ontologies

    Science.gov (United States)

    Seyed, Ali Patrice

    2012-01-01

    The Open Biomedical Ontology (OBO) Foundry initiative is a collaborative effort for developing interoperable, science-based ontologies. The Basic Formal Ontology (BFO) serves as the upper ontology for the domain-level ontologies of OBO. BFO is an upper ontology of types as conceived by defenders of realism. Among the ontologies developed for OBO…

  8. Ontologies for Bioinformatics

    Directory of Open Access Journals (Sweden)

    Agnieszka Leszczynski

    2008-01-01

    Full Text Available The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing) between divergent storage and semantic protocols is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.

  9. Mechanisms in biomedical ontology

    National Research Council Canada - National Science Library

    Röhl, Johannes

    2012-01-01

    .... Taking some hints from an "ontology of devices" I suggest as a general approach for this task the introduction of functional kinds and functional parts by which the particular relations between a mechanism and its components can be captured.

  10. Manufacturing ontology through templates

    Directory of Open Access Journals (Sweden)

    Diciuc Vlad

    2017-01-01

    Full Text Available The manufacturing industry holds a large volume of high-value know-how, much of it held by key persons in the company. Passing on this know-how is the basis of manufacturing ontology. Among other methods, such as advanced filtering and algorithm-based decision making, one way of handling the manufacturing ontology is via templates. This paper tackles this approach and highlights its advantages, concluding with some recommendations.

  11. Ontology alignment with OLA

    OpenAIRE

    Euzenat, Jérôme; Loup, David; Touzani, Mohamed; Valtchev, Petko

    2004-01-01

    Using ontologies is the standard way to achieve interoperability of heterogeneous systems within the Semantic Web. However, as the ontologies underlying two systems are not necessarily compatible, they may in turn need to be aligned. Similarity-based approaches to alignment seem to be both powerful and flexible enough to match the expressive power of languages like OWL. We present an alignment tool that follows the similarity-based paradigm, called OLA. ...

  12. Ontology Usage at ZFIN

    CERN Document Server

    Howe, Doug

    2010-01-01

    The Zebrafish Model Organism Database (ZFIN) provides a Web resource of zebrafish genomic, genetic, developmental, and phenotypic data. Four different ontologies are currently used to annotate data to the most specific term available facilitating a better comparison between inter-species data. In addition, ontologies are used to help users find and cluster data more quickly without the need of knowing the exact technical name for a term.

  13. Applications of ontology design patterns in biomedical ontologies.

    Science.gov (United States)

    Mortensen, Jonathan M; Horridge, Matthew; Musen, Mark A; Noy, Natalya F

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited.

  14. Applications of Ontology Design Patterns in Biomedical Ontologies

    Science.gov (United States)

    Mortensen, Jonathan M.; Horridge, Matthew; Musen, Mark A.; Noy, Natalya F.

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited. PMID:23304337

  15. Transcriptome analysis reveals key differentially expressed genes involved in wheat grain development

    Institute of Scientific and Technical Information of China (English)

    Yonglong Yu; Dong Zhu; Chaoying Ma; Hui Cao; Yaping Wang; Yanhao Xu; Wenying Zhang; Yueming Yan

    2016-01-01

    Wheat seed development is an important physiological process of seed maturation and directly affects wheat yield and quality. In this study, we performed dynamic transcriptome microarray analysis of an elite Chinese bread wheat cultivar (Jimai 20) during grain development using the GeneChip Wheat Genome Array. Grain morphology and scanning electron microscope observations showed that the period of 11–15 days post-anthesis (DPA) was a key stage for the synthesis and accumulation of seed starch. Genome-wide transcriptional profiling and significance analysis of microarrays revealed that the period from 11 to 15 DPA was more important than the 15–20 DPA stage for the synthesis and accumulation of nutritive reserves. Series test of cluster analysis of differential genes revealed five statistically significant gene expression profiles. Gene ontology annotation and enrichment analysis gave further information about differentially expressed genes