WorldWideScience

Sample records for gene ontology functional

  1. Defining functional distances over Gene Ontology

    Directory of Open Access Journals (Sweden)

    del Pozo Angela

    2008-01-01

    Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. the 'gene ontology', GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to comparing GO terms considered linkage in terms of the ontology, weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model Df which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.
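
    The abstract does not give the formula for Df, so the following is only an illustrative sketch of the general idea: a distance between GO terms derived from the sets of Interpro entries in which they occur. It swaps in the Jaccard distance, which is known to satisfy the metric axioms; it is not the authors' Df. The term identifiers, entry identifiers and function names are hypothetical.

```python
from itertools import product


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A & B| / |A | B| between two finite sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy data: GO term -> Interpro entries annotated with that term.
term_to_entries = {
    "GO:A": {"IPR001", "IPR002", "IPR003"},
    "GO:B": {"IPR002", "IPR003", "IPR004"},
    "GO:C": {"IPR005"},
}


def distance_matrix(mapping: dict) -> dict:
    """All pairwise distances, keyed by (term, term)."""
    terms = sorted(mapping)
    return {
        (s, t): jaccard_distance(mapping[s], mapping[t])
        for s, t in product(terms, repeat=2)
    }


distances = distance_matrix(term_to_entries)
```

    A matrix like `distances` could then be passed to any hierarchical clustering routine to obtain a tree of terms, in the spirit of the 'Functional Tree' the abstract mentions.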

  2. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    Science.gov (United States)

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions that encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. In these experiments, HPHash again significantly improves the prediction performance. The code for HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Gene Ontology

    Directory of Open Access Journals (Sweden)

    Gaston K. Mazandu

    2012-01-01

    The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.

  4. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories: transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  5. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. It is critical for tasks ranging from identifying the nearest well-annotated homologue of a protein of interest, to predicting where misannotation has occurred, to knowing how confident you can be in the annotations assigned to those proteins. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  6. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
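
    As a rough illustration of the mutual-information feature mentioned in this abstract, the sketch below computes the mutual information between two binary events ("sequence matches the motif" and "sequence is annotated with the GO term") from a 2x2 contingency table of sequence counts. The abstract does not specify how the authors estimate this quantity, so the counting scheme, the function name and its parameters are assumptions made for the example.

```python
import math


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Mutual information (bits) of two binary events from a 2x2 count table.

    n11: sequences that match the motif and carry the GO term
    n10: sequences that match the motif only
    n01: sequences that carry the GO term only
    n00: sequences with neither
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("contingency table is empty")
    # Each cell: (joint count, motif marginal, term marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, motif_marginal, term_marginal in cells:
        if joint > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (joint / total) * math.log2(
                joint * total / (motif_marginal * term_marginal)
            )
    return mi
```

    A value near zero suggests the motif and the term occur independently, while larger values suggest an association that a downstream classifier could use as one of its features.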

  7. A new measure for functional similarity of gene products based on Gene Ontology

    Directory of Open Access Journals (Sweden)

    Lengauer Thomas

    2006-06-01

    Background: Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role. Results: We present a new method for comparing sets of GO terms and for assessing the functional similarity of gene products. The method relies on two semantic similarity measures: simRel and funSim. One measure (simRel) is applied in the comparison of the biological processes found in different groups of organisms. The other measure (funSim) is used to find functionally related gene products within the same or between different genomes. Results indicate that the method, in addition to being in good agreement with established sequence similarity approaches, also provides a means for the identification of functionally related proteins independent of evolutionary relationships. The method is also applied to estimating functional similarity between all proteins in Saccharomyces cerevisiae and to visualizing the molecular function space of yeast in a map of the functional space. A similar approach is used to visualize the functional relationships between protein families. Conclusion: The approach enables the comparison of the underlying molecular biology of different taxonomic groups and provides a new comparative genomics tool identifying functionally related gene products independent of homology. The proposed map of the functional space provides a new global view on the functional relationships between gene products or protein families.

  8. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that function annotation of the human genome at the molecular level and the phenotype level is very important for systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs is annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene is mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the GeneRIF records for human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. High consistency of the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  9. A multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors for functional gene analysis.

    Science.gov (United States)

    Weber, Kristoffer; Bartsch, Udo; Stocking, Carol; Fehse, Boris

    2008-04-01

    Functional gene analysis requires the possibility of overexpression, as well as downregulation of one, or ideally several, potentially interacting genes. Lentiviral vectors are well suited for this purpose as they ensure stable expression of complementary DNAs (cDNAs), as well as short-hairpin RNAs (shRNAs), and can efficiently transduce a wide spectrum of cell targets when packaged within the coat proteins of other viruses. Here we introduce a multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors designed according to the "building blocks" principle. Using a wide spectrum of different fluorescent markers, including drug-selectable enhanced green fluorescent protein (eGFP)- and dTomato-blasticidin-S resistance fusion proteins, LeGO vectors allow simultaneous analysis of multiple genes and shRNAs of interest within single, easily identifiable cells. Furthermore, each functional module is flanked by unique cloning sites, ensuring flexibility and individual optimization. The efficacy of these vectors for analyzing multiple genes in a single cell was demonstrated in several different cell types, including hematopoietic, endothelial, and neural stem and progenitor cells, as well as hepatocytes. LeGO vectors thus represent a valuable tool for investigating gene networks using conditional ectopic expression and knock-down approaches simultaneously.

  10. Gene Ontology Consortium: going forward.

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

    Science.gov (United States)

    Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J

    2016-02-01

    Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from current high-throughput genome-wide applications, but also of allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Python for Linux and distributed as free software under the GNU General Public Licence. Contact: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  12. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    Science.gov (United States)

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  13. Exploring autophagy with Gene Ontology

    Science.gov (United States)

    2018-01-01

    ABSTRACT Autophagy is a fundamental cellular process that is well conserved among eukaryotes. It is one of the strategies that cells use to catabolize substances in a controlled way. Autophagy is used for recycling cellular components, responding to cellular stresses and ridding cells of foreign material. Perturbations in autophagy have been implicated in a number of pathological conditions such as neurodegeneration, cardiac disease and cancer. The growing knowledge about autophagic mechanisms needs to be collected in a computable and shareable format to allow its use in data representation and interpretation. The Gene Ontology (GO) is a freely available resource that describes how and where gene products function in biological systems. It consists of 3 interrelated structured vocabularies that outline what gene products do at the biochemical level, where they act in a cell and the overall biological objectives to which their actions contribute. It also consists of ‘annotations’ that associate gene products with the terms. Here we describe how we represent autophagy in GO, how we create and define terms relevant to autophagy researchers and how we interrelate those terms to generate a coherent view of the process, therefore allowing an interoperable description of its biological aspects. We also describe how annotation of gene products with GO terms improves data analysis and interpretation, hence bringing a significant benefit to this field of study. PMID:29455577

  14. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
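
    To make the idea of precomputed, annotation-based information content concrete, here is a minimal sketch on a made-up five-term hierarchy. It computes IC as the negative log of a term's annotation frequency (counting annotations to descendants as well) and then the Resnik similarity, one widely used IC-based measure, defined as the IC of the most informative common ancestor. This is not DaGO-Fun code; the term names, counts and function names are hypothetical.

```python
import math

# Hypothetical toy hierarchy: term -> set of direct parents.
parents = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical number of gene products annotated directly to each term.
direct_counts = {
    "root": 0, "binding": 2, "catalysis": 4, "dna_binding": 1, "rna_binding": 1,
}


def ancestors(term: str) -> set:
    """The term itself plus every term reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> dict:
    """IC(t) = log2(N / n_t), where n_t counts annotations to t or below."""
    cumulative = dict.fromkeys(parents, 0)
    for term, count in direct_counts.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    total = cumulative["root"]
    return {t: math.log2(total / n) for t, n in cumulative.items() if n > 0}


def resnik(term_a: str, term_b: str, ic: dict) -> float:
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)


ic = information_content()
```

    Computing `ic` once and reusing it for every `resnik` call mirrors the precomputation strategy described in the abstract, which is what keeps per-query cost low.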

  15. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community. PMID:24093723

  16. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report.

    Directory of Open Access Journals (Sweden)

    Paul D Thomas

    A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: (1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and (2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the

  17. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms.

    Science.gov (United States)

    Falda, Marco; Toppo, Stefano; Pescarolo, Alessandro; Lavezzo, Enrico; Di Camillo, Barbara; Facchinetti, Andrea; Cilia, Elisa; Velasco, Riccardo; Fontana, Paolo

    2012-03-28

    Predicting protein function has become increasingly demanding in the era of next generation sequencing technology. Assigning a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools that are easy to use and able to provide automatic and reliable annotations at a genomic scale are urgently needed. In this scenario, the Gene Ontology has provided the means to standardize the annotation classification with a structured vocabulary which can be easily exploited by computational methods. Argot2 is a web-based function prediction tool able to annotate nucleic or protein sequences from small datasets up to entire genomes. It accepts as input a list of sequences in FASTA format, which are processed using BLAST and HMMER searches against the UniProtKB and Pfam databases, respectively; these sequences are then annotated with GO terms retrieved from the UniProtKB-GOA database, and the terms are weighted using the e-values from BLAST and HMMER. The weighted GO terms are processed according to both their semantic similarity relations described by the Gene Ontology and their associated score. The algorithm is based on the original idea developed in a previous tool called Argot. The entire engine has been completely rewritten to improve both accuracy and computational efficiency, thus allowing for the annotation of complete genomes. The revised algorithm has already been employed and successfully tested during in-house genome projects of grape and apple, and has proven to have high precision and recall in all our benchmark conditions. It has also been successfully compared with Blast2GO, one of the methods most commonly employed for sequence annotation. The server is freely accessible at http://www.medcomp.medicina.unipd.it/Argot2.
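
    The e-value weighting step described above can be pictured with a small sketch. The abstract does not state the weighting formula, so the sketch assumes a plausible scheme: each hit contributes -log10(e-value) to every GO term attached to it, and contributions are summed per term. The function name, the input layout and the clamping floor are all hypothetical and should not be read as Argot2's actual implementation.

```python
import math
from collections import defaultdict


def weight_go_terms(hits, min_evalue: float = 1e-300) -> dict:
    """Sum an assumed -log10(e-value) weight per GO term.

    hits: iterable of (evalue, go_terms) pairs, one per BLAST or HMMER hit.
    min_evalue: floor applied so that an e-value of 0 does not break log10.
    """
    weights = defaultdict(float)
    for evalue, go_terms in hits:
        # Hits with e-value > 1 would score below zero; clamp them to 0.
        score = max(0.0, -math.log10(max(evalue, min_evalue)))
        for term in set(go_terms):  # count each term once per hit
            weights[term] += score
    return dict(weights)


# Hypothetical hits for one query sequence.
example_hits = [
    (1e-10, ["GO:X", "GO:Y"]),
    (1e-5, ["GO:X"]),
    (5.0, ["GO:Z"]),
]
example_weights = weight_go_terms(example_hits)
```

    Under this assumed scheme, terms supported by several strong hits accumulate the largest weights, which is the kind of signal the subsequent semantic-similarity step could then group and filter.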

  18. Fast gene ontology based clustering for microarray experiments.

    Science.gov (United States)

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
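
    The "statistical testing" of GO annotations mentioned at the start of this abstract is not specified further. A one-sided hypergeometric test is a common choice for this kind of over-representation analysis, and the sketch below shows it using only the standard library. It is offered as background for the reader, not as the test used by the authors; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     genes in the list under study
    hits:       genes in the list that are annotated with the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    upper = min(annotated, sample)
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(max(hits, 0), upper + 1)
    )
    return tail / comb(population, sample)
```

    Because one such test is typically run per GO term, the resulting p-values usually need a multiple-testing correction, which is one reason these analyses can still report a very large number of significant terms.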

  19. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-10-01

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 (www.comparativego.com). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.

  20. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    Science.gov (United States)

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  1. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Full Text Available Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed whether we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  2. Fast Gene Ontology based clustering for microarray experiments

    Directory of Open Access Journals (Sweden)

    Ovaska Kristian

    2008-11-01

    Full Text Available Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Conclusion Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  3. [Key effect genes responding to nerve injury identified by gene ontology and computer pattern recognition].

    Science.gov (United States)

    Pan, Qian; Peng, Jin; Zhou, Xue; Yang, Hao; Zhang, Wei

    2012-07-01

    In order to screen important genes from the large gene microarray data sets obtained after nerve injury, we combined the gene ontology (GO) method with computer pattern recognition technology to find key genes responding to nerve injury, and then verified one of the screened genes. Data mining and gene ontology analysis of the gene chip data set GSE26350 were carried out in MATLAB. Cd44 was selected from the screened key gene molecular spectrum by comparing the genes' GO terms and their positions on the principal component score map. Function interference was used to disrupt the normal binding of Cd44 to one of its ligands, chondroitin sulfate C (CSC), in order to observe neurite extension. Gene ontology analysis showed that the first genes on the score map (marked by red *) were mainly distributed in molecular function GO terms such as molecular transducer activity, receptor activity and protein binding. Cd44 is one of six effector protein genes, and drew our attention because of its functional diversity. After adding different reagents to the medium to interfere with the normal binding of CSC and Cd44, varying degrees of relief of CSC's inhibition of neurite extension were observed. CSC can inhibit neurite extension by binding Cd44 on the neuron membrane. This verifies that important genes in given physiological processes can be identified by gene ontology analysis of gene chip data.

  4. Protein Annotation from Protein Interaction Networks and Gene Ontology

    OpenAIRE

    Nguyen, Cao D.; Gardiner, Katheleen J.; Cios, Krzysztof J.

    2011-01-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precis...

  5. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Full Text Available Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
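
    The vector representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the term identifiers and information content (IC) values are hypothetical, and because the abstract does not say how the Jaccard index is applied to real-valued vectors, the sketch assumes the common weighted generalization (sum of element-wise minima over sum of element-wise maxima).

```python
import math

def ic_vector(annotated_terms, ic, term_order):
    """Vector holding IC(t) where the gene product is annotated to t, else 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in term_order]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def weighted_jaccard(u, v):
    """Assumed generalization of the Jaccard index to non-negative vectors."""
    den = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / den if den else 0.0

# Hypothetical terms and IC values, for illustration only.
ic = {"T1": 1.0, "T2": 2.0, "T3": 3.0, "T4": 4.0}
terms = sorted(ic)
gene_a = ic_vector({"T1", "T2", "T3"}, ic, terms)
gene_b = ic_vector({"T2", "T3", "T4"}, ic, terms)

print(pearson(gene_a, gene_b), cosine(gene_a, gene_b), weighted_jaccard(gene_a, gene_b))
```

    In this toy case the two gene products share two of four terms. The three measures need not agree: here the cosine and weighted Jaccard scores are positive while the Pearson correlation is slightly negative, so the choice of measure matters.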

  6. OAHG: an integrated resource for annotating human genes with multi-level ontologies.

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-10-05

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which could be helpful for further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ² = 0.2428, p < 2.2e-16).

  7. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    Science.gov (United States)

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl. PMID:23895341

  9. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    Science.gov (United States)

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. Among the different approaches to analysis, association rules (AR) provide useful knowledge by discovering biologically relevant, previously unknown associations between GO terms. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. We here adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a deep performance evaluation of GO-WAR by mining publicly available GO annotated datasets, showing how GO-WAR outperforms current state-of-the-art approaches.
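
    For readers unfamiliar with association rules over annotations, the sketch below computes the two standard rule statistics, support and confidence, for a cross-ontology rule. It is a plain, unweighted illustration and does not reproduce GO-WAR, whose weighting scheme is not described in this abstract; the gene names and term labels (prefixed BP, MF, CC) are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of annotated gene products whose term set contains every item."""
    itemset = set(itemset)
    hits = sum(1 for terms in transactions if itemset <= terms)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support_a

# Hypothetical annotation sets: one "transaction" per gene product.
annotations = {
    "gene1": {"BP:x", "MF:y"},
    "gene2": {"BP:x", "MF:y", "CC:z"},
    "gene3": {"BP:x"},
    "gene4": {"MF:y", "CC:z"},
}
transactions = list(annotations.values())

# Cross-ontology rule: biological process term BP:x implies molecular function term MF:y.
rule_support = support(transactions, {"BP:x", "MF:y"})
rule_confidence = confidence(transactions, {"BP:x"}, {"MF:y"})
print(rule_support, rule_confidence)
```

    Here two of the four gene products carry both terms, and two of the three annotated with BP:x also carry MF:y. A weighted scheme would replace the simple counts with per-term weights.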

  10. Improving Interpretation of Cardiac Phenotypes and Enhancing Discovery With Expanded Knowledge in the Gene Ontology.

    Science.gov (United States)

    Lovering, Ruth C; Roncaglia, Paola; Howe, Douglas G; Laulederkind, Stanley J F; Khodiyar, Varsha K; Berardini, Tanya Z; Tweedie, Susan; Foulger, Rebecca E; Osumi-Sutherland, David; Campbell, Nancy H; Huntley, Rachael P; Talmud, Philippa J; Blake, Judith A; Breckenridge, Ross; Riley, Paul R; Lambiase, Pier D; Elliott, Perry M; Clapp, Lucie; Tinker, Andrew; Hill, David P

    2018-02-01

    A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects. © 2018 The Authors.

  11. A methodology to migrate the gene ontology to a description logic environment using DAML+OIL.

    Science.gov (United States)

    Wroe, C J; Stevens, R; Goble, C A; Ashburner, M

    2003-01-01

    The Gene Ontology Next Generation Project (GONG) is developing a staged methodology to evolve the current representation of the Gene Ontology into DAML+OIL in order to take advantage of the richer formal expressiveness and the reasoning capabilities of the underlying description logic. Each stage provides a step level increase in formal explicit semantic content with a view to supporting validation, extension and multiple classification of the Gene Ontology. The paper introduces DAML+OIL and demonstrates the activity within each stage of the methodology and the functionality gained.

  12. Protein annotation from protein interaction networks and Gene Ontology.

    Science.gov (United States)

    Nguyen, Cao D; Gardiner, Katheleen J; Cios, Krzysztof J

    2011-10-01

    We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively. Copyright © 2011 Elsevier Inc. All rights reserved.
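
    The precision and recall pairs reported above can be compared on a single scale with the F1 score (the harmonic mean of precision and recall). The abstract does not report F1, so the values computed below are derived here for illustration only, under the assumption that each pair was measured on the same evaluation set.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# (precision, recall) pairs as quoted in the abstract.
reported = {
    "proposed method": (0.51, 0.60),
    "Majority": (0.45, 0.26),
    "chi-square statistics": (0.24, 0.61),
}

f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, score in f1_by_method.items():
    print(f"{name}: F1 = {score:.3f}")
```

    On these figures the proposed method has the highest derived F1 (about 0.55, against roughly 0.33 and 0.34). The chi-square baseline matches it on recall but at much lower precision.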

  13. Towards refactoring the Molecular Function Ontology with a UML profile for function modeling.

    Science.gov (United States)

    Burek, Patryk; Loebe, Frank; Herre, Heinrich

    2017-10-04

    Gene Ontology (GO) is the largest resource for cataloging gene products. This resource grows steadily and, naturally, this growth raises issues regarding the structure of the ontology. Moreover, modeling and refactoring large ontologies such as GO is generally far from being simple, as a whole as well as when focusing on certain aspects or fragments. It seems that human-friendly graphical modeling languages such as the Unified Modeling Language (UML) could be helpful in connection with these tasks. We investigate the use of UML for making the structural organization of the Molecular Function Ontology (MFO), a sub-ontology of GO, more explicit. More precisely, we present a UML dialect, called the Function Modeling Language (FueL), which is suited for capturing functions in an ontologically founded way. FueL is equipped, among other features, with language elements that arise from studying patterns of subsumption between functions. We show how to use this UML dialect for capturing the structure of molecular functions. Furthermore, we propose and discuss some refactoring options concerning fragments of MFO. FueL enables the systematic, graphical representation of functions and their interrelations, including making information explicit that is currently either implicit in MFO or is mainly captured in textual descriptions. Moreover, the considered subsumption patterns lend themselves to the methodical analysis of refactoring options with respect to MFO. On this basis we argue that the approach can increase the comprehensibility of the structure of MFO for humans and can support communication, for example, during revision and further development.

  14. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  15. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on Model-View-Controller architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  16. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Full Text Available Abstract Background Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which cannot estimate the individual discriminative abilities of the three aspects of gene ontology. Results In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time & space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for

  17. Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Tratz, Stephen C.; Gregory, Michelle L.

    2006-06-08

    With the rising influence of the Gene Ontology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associated with them. So far, these approaches have solely relied on the knowledge encoded in the Gene Ontology and the gene annotations associated with the Gene Ontology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  18. Bayesian assignment of gene ontology terms to gene expression experiments.

    Science.gov (United States)

    Sykacek, P

    2012-09-15

    Gene expression assays allow for genome scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implication of the resulting gene lists. This led to an increased interest in methods for inferring high-level biological information. A common approach for representing high level information is by inferring gene ontology (GO) terms which may be attributed to the expression data experiment. This article proposes a probabilistic model for GO term inference. Modelling assumes that gene annotations to GO terms are available and gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches for expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that advantages of the proposed probabilistic GO term inference over statistical test-based approaches are in particular evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high level assignments like pathways. Source code under GPL license is available from the author. peter.sykacek@boku.ac.at.

  19. Bayesian assignment of gene ontology terms to gene expression experiments

    Science.gov (United States)

    Sykacek, P.

    2012-01-01

    Motivation: Gene expression assays allow for genome scale analyses of molecular biological mechanisms. State-of-the-art data analysis provides lists of involved genes, either by calculating significance levels of mRNA abundance or by Bayesian assessments of gene activity. A common problem of such approaches is the difficulty of interpreting the biological implication of the resulting gene lists. This led to an increased interest in methods for inferring high-level biological information. A common approach for representing high level information is by inferring gene ontology (GO) terms which may be attributed to the expression data experiment. Results: This article proposes a probabilistic model for GO term inference. Modelling assumes that gene annotations to GO terms are available and gene involvement in an experiment is represented by posterior probabilities over gene-specific indicator variables. Such probability measures result from many Bayesian approaches for expression data analysis. The proposed model combines these indicator probabilities in a probabilistic fashion and provides a probabilistic GO term assignment as a result. Experiments on synthetic and microarray data suggest that advantages of the proposed probabilistic GO term inference over statistical test-based approaches are in particular evident for sparsely annotated GO terms and in situations of large uncertainty about gene activity. Provided that appropriate annotations exist, the proposed approach is easily applied to inferring other high level assignments like pathways. Availability: Source code under GPL license is available from the author. Contact: peter.sykacek@boku.ac.at PMID:22962488

  20. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Abstract Background Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  1. The representation of heart development in the gene ontology.

    Science.gov (United States)

    Khodiyar, Varsha K; Hill, David P; Howe, Doug; Berardini, Tanya Z; Tweedie, Susan; Talmud, Philippa J; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C

    2011-06-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. Copyright © 2011

  2. The Representation of Heart Development in the Gene Ontology

    Science.gov (United States)

    Khodiyar, Varsha K.; Hill, David P.; Howe, Doug; Berardini, Tanya Z.; Tweedie, Susan; Talmud, Philippa J.; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C.

    2012-01-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. PMID:21419760

  3. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  4. GO(vis), a gene ontology visualization tool based on multi-dimensional values.

    Science.gov (United States)

    Ning, Zi; Jiang, Zhenran

    2010-05-01

    Most gene product similarity measurements concentrate on the information content of Gene Ontology (GO) terms or use a path-based similarity between GO terms, which may ignore other important information contained in the structure of the ontology. In our study, we integrate different GO similarity measure approaches to analyze the functional relationship of genes and gene products with a new triangle-based visualization tool called GO(Vis). The purpose of this tool is to demonstrate the effect of three important information factors when measuring the similarity between gene products. One advantage of this tool is that its importance ratio can be adjusted to meet different measuring requirements according to the biological knowledge of each factor. The experimental results demonstrate that GO(Vis) can display diagrams of the functional relationship for gene products effectively.

  5. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Full Text Available Abstract Background Magnaporthe oryzae is the causal agent of rice blast, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 (http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html). However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using standardized vocabulary. We report an overview of the GO annotation for Version 5 of M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
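
    The reciprocal best hits step named in this abstract can be illustrated with a minimal sketch. It assumes alignment results are already available as (query, subject, score) tuples; the identifiers and scores below are hypothetical, and the authors' actual stringency thresholds and manual review are not modelled.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
a_vs_b = [
    ("moA", "refX", 250.0),
    ("moA", "refY", 90.0),
    ("moB", "refY", 120.0),
    ("moC", "refX", 200.0),
]
b_vs_a = [
    ("refX", "moA", 245.0),
    ("refX", "moC", 180.0),
    ("refY", "moB", 118.0),
]
rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

    In this toy input, moC is dropped because its best hit, refX, points back to moA rather than to moC.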

  6. Muscle Research and Gene Ontology: New standards for improved data integration.

    Science.gov (United States)

    Feltrin, Erika; Campanaro, Stefano; Diehl, Alexander D; Ehler, Elisabeth; Faulkner, Georgine; Fordham, Jennifer; Gardin, Chiara; Harris, Midori; Hill, David; Knoell, Ralph; Laveder, Paolo; Mittempergher, Lorenza; Nori, Alessandra; Reggiani, Carlo; Sorrentino, Vincenzo; Volpe, Pompeo; Zara, Ivano; Valle, Giorgio; Deegan, Jennifer

    2009-01-29

    The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community. The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature. The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with these new terms. Please visit the Muscle Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology.

  7. Muscle Research and Gene Ontology: New standards for improved data integration

    Directory of Open Access Journals (Sweden)

    Nori Alessandra

    2009-01-01

    Full Text Available Abstract Background The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community. Results The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature. Conclusion The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with these new terms. Please visit the Muscle Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology.

  8. Gene Ontology Terms and Automated Annotation for Energy-Related Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mukhopadhyay, Biswarup [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States); Tyler, Brett M. [Oregon State Univ., Corvallis, OR (United States); Setubal, Joao [Univ. of Sao Paulo (Brazil); Murali, T. M. [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)

    2017-11-03

    Gene Ontology (GO) is one of the more widely used functional ontologies for describing gene functions at various levels. The project developed 660 GO terms for describing energy-related microbial processes and filled the known gaps in this area of the GO system, and then used these terms to describe functions of 179 genes to showcase the utilities of the new resources. It hosted a series of workshops and made presentations at key meetings to inform and train scientific community members on these terms and to receive inputs from them for the GO term generation efforts. The project has developed a website for storing and displaying the resources (http://www.mengo.biochem.vt.edu/). The outcome of the project was further disseminated through peer-reviewed publications and poster and seminar presentations.

  9. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Directory of Open Access Journals (Sweden)

    Rupert W Overall

    Full Text Available BACKGROUND: Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. METHODOLOGY/PRINCIPAL FINDINGS: We have created a resource with three components: 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. CONCLUSIONS/SIGNIFICANCE: The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already

  10. Protein-Protein Interaction Network and Gene Ontology

    Science.gov (United States)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    Advances in computer technology make it possible to access large amounts and various kinds of biological data via the internet, such as DNA sequences, proteomics data and information discovered about them. It is expected that combining these data could help researchers find further knowledge about them. The roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when various kinds of data are examined and analyzed manually, an effective visualization system is essential. One instance of such integrated visualization is the combination of protein-protein interaction (PPI) data and Gene Ontology (GO), which could help enhance the analysis of PPI networks. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, in which GO and PPI graphs are visualized side-by-side, and that supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common-ancestor finding.

  11. GOseek: a gene ontology search engine using enhanced keywords.

    Science.gov (United States)

    Taha, Kamal

    2013-01-01

    We propose in this paper a biological search engine called GOseek, which overcomes the limitation of current gene similarity tools. Given a set of genes, GOseek returns the most significant genes that are semantically related to the given genes. These returned genes are usually annotated to one of the Lowest Common Ancestors (LCA) of the Gene Ontology (GO) terms annotating the given genes. Most genes have several annotation GO terms. Therefore, there may be more than one LCA for the GO terms annotating the given genes. The LCA annotating the genes that are most semantically related to the given gene is the one that receives the most aggregate semantic contribution from the GO terms annotating the given genes. To identify this LCA, GOseek quantifies the contribution of the GO terms annotating the given genes to the semantics of their LCAs. That is, it encodes the semantic contribution into a numeric format. GOseek uses microarray experiment data to rank result genes based on their significance. We evaluated GOseek experimentally and compared it with a comparable gene prediction tool. Results showed marked improvement over the tool.
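
    Because the GO is a directed acyclic graph, a set of terms can have more than one Lowest Common Ancestor, as the abstract notes. The sketch below shows one way to enumerate them on a toy graph; the term names and the child-to-parents dictionary are hypothetical, and GOseek's semantic-contribution scoring for choosing among the LCAs is not reproduced here.

```python
def ancestors(term, parents):
    """Return `term` plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def lowest_common_ancestors(terms, parents):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = set.intersection(*(ancestors(t, parents) for t in terms))
    not_lowest = set()
    for candidate in common:
        not_lowest |= ancestors(candidate, parents) - {candidate}
    return common - not_lowest


# Toy DAG (child -> parents); term names are made up for illustration.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
    "d": ["a", "b"],
}
lcas = lowest_common_ancestors(["c", "d"], parents)
print(sorted(lcas))
```

    Here both "a" and "b" qualify, which is the situation in which a tool would need an additional criterion to pick one LCA.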

  12. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks. © The Author(s) 2014. Published by Oxford University Press.

  13. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

    Science.gov (United States)

    Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

    2016-03-11

    Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

  14. Determining the semantic similarities among Gene Ontology terms.

    Science.gov (United States)

    Taha, Kamal

    2013-05-01

    We present in this paper novel techniques that determine the semantic relationships among Gene Ontology (GO) terms. We implemented these techniques in a prototype system called GoSE, which resides between user application and GO database. Given a set S of GO terms, GoSE would return another set S' of GO terms, where each term in S' is semantically related to each term in S. Most current research is focused on determining the semantic similarities among GO terms based solely on their IDs and proximity to one another in the GO graph structure, while overlooking the contexts of the terms, which may lead to erroneous results. The context of a GO term T is the set of other terms, whose existence in the GO graph structure is dependent on T. We propose novel techniques that determine the contexts of terms based on the concept of existence dependency. We present a stack-based sort-merge algorithm employing these techniques for determining the semantic similarities among GO terms. We evaluated GoSE experimentally and compared it with three existing methods. The results of measuring the semantic similarities among genes in KEGG and Pfam pathways retrieved from the DBGET and Sanger Pfam databases, respectively, have shown that our method outperforms the other three methods in recall and precision.

  15. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
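
    For orientation, the classical support and confidence measures that association rule mining builds on can be sketched as follows. This is not the MOSupport or MOConfidence definition from the paper, which the abstract does not spell out; the term labels and gene annotation sets are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every term in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Each transaction is the set of terms annotated to one hypothetical gene,
# drawn from more than one sub-ontology (BP, MF, CC prefixes are illustrative).
annotations = [
    {"BP:transport", "MF:transporter", "CC:membrane"},
    {"BP:transport", "CC:membrane"},
    {"BP:transport", "MF:transporter"},
    {"BP:signaling", "CC:membrane"},
]
rule_support = support({"BP:transport", "CC:membrane"}, annotations)
rule_confidence = confidence({"BP:transport"}, {"CC:membrane"}, annotations)
print(rule_support, rule_confidence)
```

    A rule such as BP:transport -> CC:membrane links terms from two sub-ontologies, which is the kind of cross-ontology relationship the abstract describes mining.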

  16. The MGED Ontology: A Framework for Describing Functional Genomics Experiments

    OpenAIRE

    Stoeckert, Christian J.; Parkinson, Helen

    2003-01-01

    The Microarray Gene Expression Data (MGED) society was formed with an initial focus on experiments involving microarray technology. Despite the diversity of applications, there are common concepts used and a common need to capture experimental information in a standardized manner. In building the MGED ontology, it was recognized that it would be impractical to cover all the different types of experiments on all the different types of organisms by listing and defining all the types of organism...

  17. Codon bias and gene ontology in holometabolous and hemimetabolous insects.

    Science.gov (United States)

    Carlini, David B; Makowski, Matthew

    2015-12-01

    The relationship between preferred codon use (PCU), developmental mode, and gene ontology (GO) was investigated in a sample of nine insect species with sequenced genomes. These species were selected to represent two distinct modes of insect development, holometabolism and hemimetabolism, with an aim toward determining whether the differences in developmental timing concomitant with developmental mode would be mirrored by differences in PCU in their developmental genes. We hypothesized that the developmental genes of holometabolous insects should be under greater selective pressure for efficient translation, manifest as increased PCU, than those of hemimetabolous insects because holometabolism requires abundant protein expression over shorter time intervals than hemimetabolism, where proteins are required more uniformly in time. Preferred codon sets were defined for each species, from which the frequency of PCU for each gene was obtained. Although there were substantial differences in the genomic base composition of holometabolous and hemimetabolous insects, both groups exhibited a general preference for GC-ending codons, with the former group having higher PCU averaged across all genes. For each species, the biological process GO term for each gene was assigned that of its Drosophila homolog(s), and PCU was calculated for each GO term category. The top two GO term categories for PCU enrichment in the holometabolous insects were anatomical structure development and cell differentiation. The increased PCU in the developmental genes of holometabolous insects may reflect a general strategy to maximize the protein production of genes expressed in bursts over short time periods, e.g., heat shock proteins. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 686-698, 2015. © 2015 Wiley Periodicals, Inc.
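
    A per-gene PCU frequency could be computed along the following lines. This is a minimal sketch that assumes PCU is the fraction of a gene's codons falling in the species' preferred set; the abstract does not give the exact formula, and the preferred set and coding sequence below are hypothetical.

```python
def preferred_codon_use(cds, preferred):
    """Fraction of complete codons in `cds` that belong to the `preferred` set."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence has no complete codon")
    return sum(codon in preferred for codon in codons) / len(codons)


# Hypothetical preferred set and toy coding sequence.
preferred = {"CTG", "GCC", "AAG", "GAC"}
pcu = preferred_codon_use("ATGCTGGCCAAAGACTAA", preferred)
print(pcu)
```

    Averaging this value over the genes assigned to one GO term would give a per-category figure comparable across categories.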

  18. Genetic Resources for Advanced Biofuel Production Described with the Gene Ontology

    Directory of Open Access Journals (Sweden)

    Trudy Torto-Alalibo

    2014-10-01

    Full Text Available Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes has led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial Energy Gene Ontology (MENGO: http://www.mengo.biochem.vt.edu) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  19. Networks in biological systems: An investigation of the Gene Ontology as an evolving network

    International Nuclear Information System (INIS)

    Coronnello, C; Tumminello, M; Micciche, S; Mantegna, R.N.

    2009-01-01

    Many biological systems can be described as networks where different elements interact, in order to perform biological processes. We introduce a network associated with the Gene Ontology. Specifically, we construct a correlation-based network where the vertices are the terms of the Gene Ontology and the link between each two terms is weighted on the basis of the number of genes that they have in common. We analyze a filtered network obtained from the correlation-based network and we characterize its evolution over different releases of the Gene Ontology.
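
    The network construction described here, with terms as vertices and links weighted by shared genes, can be sketched as below. The sketch uses the raw count of shared genes as the weight, which is an assumption; the abstract says only that the weight is based on that count. Term identifiers and gene sets are hypothetical.

```python
from itertools import combinations


def shared_gene_network(term_to_genes):
    """Weight each pair of terms by the number of genes annotated to both."""
    edges = {}
    for a, b in combinations(sorted(term_to_genes), 2):
        shared = len(term_to_genes[a] & term_to_genes[b])
        if shared:  # keep only pairs with at least one gene in common
            edges[(a, b)] = shared
    return edges


# Hypothetical term identifiers and gene sets.
term_to_genes = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g2", "g3", "g4"},
    "T3": {"g5"},
}
network = shared_gene_network(term_to_genes)
print(network)
```

    Rebuilding this dictionary for successive ontology releases and comparing the edge sets is one simple way to look at how such a network evolves.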

  20. Systematically characterizing and prioritizing chemosensitivity related gene based on Gene Ontology and protein interaction network

    Directory of Open Access Journals (Sweden)

    Chen Xin

    2012-10-01

    Full Text Available Abstract Background The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs’ GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable

  1. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    Science.gov (United States)

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  2. Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application.

    Science.gov (United States)

    Roubelakis, Maria G; Zotos, Pantelis; Papachristoudis, Georgios; Michalopoulos, Ioannis; Pappa, Kalliopi I; Anagnou, Nicholas P; Kossida, Sophia

    2009-06-16

    microRNAs (miRNAs) are single-stranded RNA molecules of about 20-23 nucleotides in length found in a wide variety of organisms. miRNAs regulate gene expression by interacting with target mRNAs at specific sites in order to induce cleavage of the message or inhibit translation. Predicting or verifying mRNA targets of specific miRNAs is a difficult process of great importance. GOmir is a novel stand-alone application consisting of two separate tools: JTarget and TAGGO. JTarget integrates miRNA target prediction and functional analysis by combining the predicted target genes from the TargetScan, miRanda, RNAhybrid and PicTar computational tools with the experimentally supported targets from TarBase, and by providing a full gene description and functional analysis for each target gene. The TAGGO application, on the other hand, is designed to automatically group gene ontology annotations, taking advantage of the Gene Ontology (GO), in order to extract the main attributes of sets of proteins. GOmir represents a new tool incorporating two separate Java applications integrated into one stand-alone Java application. GOmir (by using up to five different databases) introduces miRNA predicted targets accompanied by (a) full gene description, (b) functional analysis and (c) detailed gene ontology clustering. Additionally, a reverse search initiated by a potential target can also be conducted. GOmir can be freely downloaded from BRFAA.

  3. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  5. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    Science.gov (United States)

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
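
    To make the dynamic time warping component mentioned above more concrete, the following is a minimal sketch of the classic DTW distance between two expression time series. It does not reproduce the authors' combined DTW-GO measure or their KNN imputation; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration only.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Illustrative sketch only: uses |x - y| as the local cost and allows
    the three standard moves (match, insertion, deletion).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (stretch b)
                cost[i][j - 1],      # b[j-1] is matched again (stretch a)
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]
```

    Because a time-shifted or stretched profile can still align at zero cost, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, a distance of this kind could plausibly be used to pick neighbouring genes when imputing a missing time point, which is the role the abstract assigns to DTW.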

  6. Integration of the Gene Ontology into an object-oriented architecture

    Directory of Open Access Journals (Sweden)

    Zheng W Jim

    2005-05-01

    Background: To standardize gene product descriptions, a formal vocabulary defined as the Gene Ontology (GO) has been developed. GO terms have been categorized into biological processes, molecular functions, and cellular components. However, there is no single representation that integrates all the terms into one cohesive model. Furthermore, GO definitions have little information explaining the underlying architecture that forms these terms, such as the dynamic and static events occurring in a process. In contrast, object-oriented models have been developed to show dynamic and static events. A portion of the TGF-beta signaling pathway, which is involved in numerous cellular events including cancer, differentiation and development, was used to demonstrate the feasibility of integrating the Gene Ontology into an object-oriented model. Results: Using object-oriented models we have captured the static and dynamic events that occur during a representative GO process, "transforming growth factor-beta (TGF-beta) receptor complex assembly" (GO:0007181). Conclusion: We demonstrate that the utility of GO terms can be enhanced by object-oriented technology, and that the GO terms can be integrated into an object-oriented model by serving as a basis for the generation of object functions and attributes.

  7. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in biological research in recent years. Empirical methods for candidate gene prioritization, which help to explore the potential pathway between genetic determinants and complex diseases, are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The abundant availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized cervix related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. The presence of drugs for these candidates was also detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which supports the view that they may serve as potential drug targets for cervical cancer.

  8. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Background: The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results: In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion: Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  9. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    Science.gov (United States)

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses.

  10. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  11. Collapse of the wave function: models, ontology, origin, and implications

    CERN Document Server

    2018-01-01

    This is the first single volume about the collapse theories of quantum mechanics, which is becoming a very active field of research in both physics and philosophy. In standard quantum mechanics, it is postulated that when the wave function of a quantum system is measured, it no longer follows the Schrödinger equation, but instantaneously and randomly collapses to one of the wave functions that correspond to definite measurement results. However, why and how a definite measurement result appears is unknown. Collapse theories, in which the collapse of the wave function is spontaneous and dynamical, are a promising solution to this problem. Chapters written by distinguished physicists and philosophers of physics discuss the origin and implications of wave-function collapse, the controversies around collapse models and their ontologies, and new arguments for the reality of wave function collapse. This is an invaluable resource for students and researchers interested in the philosophy of physics and foundations of ...

  12. The Proteasix Ontology.

    Science.gov (United States)

    Arguello Casteleiro, Mercedes; Klein, Julie; Stevens, Robert

    2016-06-04

    The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool, an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides). The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.

  13. Ontological function annotation of long non-coding RNAs through hierarchical multi-label classification.

    Science.gov (United States)

    Zhang, Jingpu; Zhang, Zuping; Wang, Zixiang; Liu, Yuting; Deng, Lei

    2018-05-15

    Long non-coding RNAs (lncRNAs) are an enormous collection of functional non-coding RNAs. Over the past decades, a large number of novel lncRNA genes have been identified. However, most lncRNAs remain functionally uncharacterized at present. Computational approaches provide new insight into the potential functional implications of lncRNAs. Considering that each lncRNA may have multiple functions and a function may be further specialized into sub-functions, here we describe NeuraNetL2GO, a computational ontological function prediction approach for lncRNAs using a hierarchical multi-label classification strategy based on multiple neural networks. The neural networks are incrementally trained level by level, each performing the prediction of gene ontology (GO) terms belonging to a given level. In NeuraNetL2GO, we use topological features of the lncRNA similarity network as the input of the neural networks and employ the output results to annotate the lncRNAs. We show that NeuraNetL2GO achieves the best performance and the overall advantage in maximum F-measure and coverage on the manually annotated lncRNA2GO-55 dataset compared to other state-of-the-art methods. The source code and data are available at http://denglab.org/NeuraNetL2GO/.

  14. Length bias correction in gene ontology enrichment analysis using logistic regression.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
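
    The confounding argument above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the toy gene counts, the choice of the differential expression (DE) call as the response and GO membership as a predictor (the abstract does not say which variable plays which role), and the plain gradient-ascent fitter `fit_logistic`, which stands in for whatever fitting routine the authors used.

```python
import math


def fit_logistic(rows, labels, lr=1.0, n_iter=5000):
    """Fit logistic regression by batch gradient ascent on the mean
    log-likelihood. Returns [intercept, slope_1, slope_2, ...]."""
    n = len(rows)
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, y in zip(X, labels):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j, v in enumerate(x):
                grad[j] += (y - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def make_genes():
    """Hypothetical toy data as (go_member, length_z, de_call) per gene.

    DE frequency depends only on length (75% for long, 25% for short
    transcripts), but the GO category contains mostly long transcripts.
    """
    # cells: (go_member, standardized length, number of genes, number DE)
    cells = [(1, 1.0, 8, 6), (1, -1.0, 4, 1), (0, 1.0, 4, 3), (0, -1.0, 8, 2)]
    genes = []
    for go, length, n_genes, n_de in cells:
        genes += [(go, length, 1 if i < n_de else 0) for i in range(n_genes)]
    return genes


genes = make_genes()
y = [de for _, _, de in genes]
# Naive model: DE ~ GO membership (length ignored)
naive = fit_logistic([[go] for go, _, _ in genes], y)
# Adjusted model: DE ~ GO membership + transcript length
adjusted = fit_logistic([[go, length] for go, length, _ in genes], y)
```

    In this constructed example the naive GO coefficient `naive[1]` is clearly positive, whereas the adjusted GO coefficient `adjusted[1]` is essentially zero and the length coefficient `adjusted[2]` absorbs the effect. That is the pattern the abstract describes when transcript length correlates with both GO membership and DE significance.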

  15. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Background: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  16. Methodology for the inference of gene function from phenotype data.

    Science.gov (United States)

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and

  17. University of Texas Southwestern Medical Center (UTSW): Functional Signature Ontology Tool: Triplicate Measurements of Reporter Gene Expression in Response to Individual Genetic and Chemical Perturbations in HCT116 Cells | Office of Cancer Genomics

    Science.gov (United States)

    The goal of this project is to use an eight-gene expression profile to define functional signatures for small molecules and natural products with heretofore undefined mechanism of action. Two genes in the eight-gene set are used as internal controls and do not vary across gene expression array data collected from the public domain. The remaining six genes are found to vary independently across a large collection of publicly available gene expression array datasets.

  19. SoFoCles: feature filtering for microarray classification based on gene ontology.

    Science.gov (United States)

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  20. GO-Bayes: Gene Ontology-based overrepresentation analysis using a Bayesian approach.

    Science.gov (United States)

    Zhang, Song; Cao, Jing; Kong, Y Megan; Scheuermann, Richard H

    2010-04-01

    A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy. We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.
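
    For reference, the standard per-term ORA that GO-Bayes is contrasted with can be sketched as a one-sided hypergeometric tail probability. This is only the baseline test applied to one GO term in isolation, not the Bayesian model; the function name `ora_p_value` and its parameter names are assumptions made for illustration.

```python
from math import comb


def ora_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for overrepresentation.

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    selected:   genes in the identified group (e.g. differentially expressed)
    overlap:    selected genes that carry the GO term
    Returns P(X >= overlap) when `selected` genes are drawn without
    replacement from the background.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    Each term is scored independently here, so closely related parent and child terms receive separate p-values with no sharing of evidence. That independence is the limitation the abstract says the Bayesian approach addresses.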

  1. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background: Vaccine literature indexing is poorly performed in PubMed due to the limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results: The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) when tested on a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were

  2. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications.

    Science.gov (United States)

    Whetzel, Patricia L; Noy, Natalya F; Shah, Nigam H; Alexander, Paul R; Nyulas, Csongor; Tudorache, Tania; Musen, Mark A

    2011-07-01

    The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

  3. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. The analysis of these

  4. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of the GO classes are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of

  5. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
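    The gene list enrichment analysis mentioned above can be illustrated with a generic over-representation test. The following is a minimal sketch only: it uses an upper-tail hypergeometric test, which is a common choice for this kind of analysis and is not confirmed here as the statistic PANTHER itself applies. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_annotated, n_list, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genome    -- number of genes in the reference set
    n_annotated -- reference genes carrying the category of interest
    n_list      -- size of the uploaded gene list
    n_overlap   -- genes in the list that carry the category
    (All names are illustrative assumptions, not PANTHER parameters.)
    """
    max_k = min(n_annotated, n_list)
    total = comb(n_genome, n_list)
    # Sum the probabilities of observing n_overlap or more annotated genes.
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_list - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes, 150 in the category,
    # a list of 300 genes of which 12 fall in the category.
    print(hypergeom_enrichment_p(20000, 150, 300, 12))
```

    A small p-value suggests the category appears in the list more often than expected by chance; in practice a multiple-testing correction would be applied across categories.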

  6. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. To support more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  7. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology

    OpenAIRE

    Caniza, Horacio; Romero, Alfonso E.; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-01-01

    Summary: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve...

  8. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    Science.gov (United States)

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. A large volume of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. Mononucleotide SSR motifs showed the maximum frequency, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs code for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382
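    The SSR mining step described above can be sketched with regular expressions. This is an illustration only: the abstract does not name the tool or the repeat thresholds that were used, so the `MIN_REPEATS` values and the function name `find_ssrs` are assumptions made for the example.

```python
import re

# Minimum number of tandem repeats per motif length (1 = mono ... 6 = hexa).
# These thresholds are illustrative assumptions, not values from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "ATAT"), since the shorter motif already reports them.
            if any(
                motif == motif[:d] * (motif_len // d)
                for d in range(1, motif_len)
                if motif_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    est = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"  # toy sequence
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
```

    Counting the hits per motif length over all contigs and singletons would give frequency distributions of the kind reported in the abstract.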

  9. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Science.gov (United States)

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental approaches to this problem have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators for protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive the semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy to combine both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is achieved by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    Full Text Available Abstract Background Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, comes from huge false positive predictions in computational approaches. Reduction of false positive predictions and enhancing true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated. Results Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over the four datasets) in yeast and worm, respectively. Based on eight top-ranking keywords and co-localization of interacting proteins a set of two knowledge rules were deduced and applied to remove false positive protein pairs. The 'strength', a measure of improvement provided by the rules was defined based on the signal-to-noise ratio and implemented to measure the applicability of knowledge rules applying to the predicted PPI datasets. Depending on the employed PPI-predicting methods, the strength varies between two and ten-fold of randomly removing protein pairs from the datasets. Conclusion Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially

  11. Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

    Science.gov (United States)

    Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A

    2016-04-01

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement. Copyright © 2016 Elsevier Inc. All rights reserved.
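    The area under the receiver operating characteristic curve quoted above can be computed directly from worker scores and expert labels. The sketch below uses the pairwise (Mann-Whitney) formulation of the AUC; the data are invented for illustration and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    labels -- truthy for relationships experts marked as errors
    scores -- crowd confidence that each relationship is an error
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    expert_says_error = [1, 1, 0, 0]        # hypothetical expert labels
    crowd_score = [0.9, 0.4, 0.5, 0.1]      # hypothetical crowd scores
    print(roc_auc(expert_says_error, crowd_score))
```

    An AUC of 0.5 corresponds to chance-level ranking, which puts the reported range of 0.44 to 0.73 in context.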

  12. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Full Text Available Cliques (maximal complete subnets) in protein-protein interaction (PPI) networks are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPIs compensate for data deficiencies in biological experiments. However, clique-based prediction methods depend only on the topology of the network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based prediction and gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.
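    One way to picture clique-based prediction with GO filtering is shown below. This is a simplified sketch under stated assumptions, not the authors' algorithm: it predicts an interaction when a protein is linked to all members of a known clique except one, and keeps the prediction only if the two proteins share at least one GO term. The function name, the sharing rule, and the toy identifiers are all hypothetical.

```python
def predict_by_clique_extension(adj, cliques, go_terms):
    """Predict missing edges that would extend a clique by one node.

    adj      -- dict mapping each protein to the set of its neighbours
    cliques  -- iterable of cliques (each an iterable of proteins)
    go_terms -- dict mapping each protein to a set of GO term IDs
    Returns a set of frozenset({u, v}) predicted interactions.
    """
    predictions = set()
    for clique in cliques:
        members = set(clique)
        for v in adj:
            if v in members:
                continue
            # Clique members that v is not yet connected to.
            missing = members - adj[v]
            if len(missing) != 1:
                continue
            u = next(iter(missing))
            # GO filter (assumed rule): require at least one shared term.
            if go_terms.get(u, set()) & go_terms.get(v, set()):
                predictions.add(frozenset((u, v)))
    return predictions


if __name__ == "__main__":
    adj = {
        "A": {"B", "C", "D", "E"},
        "B": {"A", "C", "D", "E"},
        "C": {"A", "B"},
        "D": {"A", "B"},
        "E": {"A", "B"},
    }
    go = {"C": {"GO:x"}, "D": {"GO:x"}, "E": {"GO:y"}}  # placeholder IDs
    print(predict_by_clique_extension(adj, [("A", "B", "C")], go))
```

    Adding the accepted edges back to the network and recomputing cliques would give an iterative variant of the same idea.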

  13. Annotating activation/inhibition relationships to protein-protein interactions using gene ontology relations.

    Science.gov (United States)

    Yim, Soorin; Yu, Hasun; Jang, Dongjin; Lee, Doheon

    2018-04-11

    Signaling pathways can be reconstructed by identifying 'effect types' (i.e. activation/inhibition) of protein-protein interactions (PPIs). Effect types are composed of 'directions' (i.e. upstream/downstream) and 'signs' (i.e. positive/negative), thereby requiring directions as well as signs of PPIs to predict signaling events from PPI networks. Here, we propose a computational method for systemically annotating effect types to PPIs using relations between functional information of proteins. We used regulates, positively regulates, and negatively regulates relations in Gene Ontology (GO) to predict directions and signs of PPIs. These relations indicate both directions and signs between GO terms so that we can project directions and signs between relevant GO terms to PPIs. Independent test results showed that our method is effective for predicting both directions and signs of PPIs. Moreover, our method outperformed a previous GO-based method that did not consider the relations between GO terms. We annotated effect types to human PPIs and validated several highly confident effect types against literature. The annotated human PPIs are available in Additional file 2 to aid signaling pathway reconstruction and network biology research. We annotated effect types to PPIs by using regulates, positively regulates, and negatively regulates relations in GO. We demonstrated that those relations are effective for predicting not only signs, but also directions of PPIs. The usefulness of those relations suggests their potential applications to other types of interactions such as protein-DNA interactions.
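    The projection of GO relations onto PPIs described above can be sketched as a simple voting scheme. The code below is an illustrative assumption, not the published method: for each interacting pair it looks for a GO relation whose source term annotates one protein and whose target term annotates the other, and reports the most frequently supported direction and sign. Term IDs, protein names, and the tie-breaking behaviour are hypothetical.

```python
# Sign carried by each relation; plain "regulates" gives direction only.
SIGN = {"regulates": 0, "positively_regulates": 1, "negatively_regulates": -1}
LABEL = {0: "regulation (sign unknown)", 1: "activation", -1: "inhibition"}


def annotate_effect(ppi, annotations, relations):
    """Return (upstream, downstream, effect) for a PPI, or None.

    ppi         -- tuple of two protein identifiers
    annotations -- dict mapping protein -> set of GO term IDs
    relations   -- iterable of (source_term, relation, target_term)
    """
    p, q = ppi
    votes = {}
    for src, rel, dst in relations:
        if rel not in SIGN:
            continue
        # Try both orientations of the pair.
        for a, b in ((p, q), (q, p)):
            if src in annotations.get(a, ()) and dst in annotations.get(b, ()):
                key = (a, b, SIGN[rel])
                votes[key] = votes.get(key, 0) + 1
    if not votes:
        return None
    # Most supported (direction, sign); ties resolve to the first seen.
    upstream, downstream, sign = max(votes, key=votes.get)
    return upstream, downstream, LABEL[sign]


if __name__ == "__main__":
    annotations = {"P1": {"GO:a"}, "P2": {"GO:b"}}          # placeholder IDs
    relations = [("GO:a", "negatively_regulates", "GO:b")]
    print(annotate_effect(("P2", "P1"), annotations, relations))
```

    Here the direction follows the source-to-target orientation of the GO relation, and the sign follows the relation type, which mirrors the abstract's point that these relations carry both pieces of information.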

  14. GGDonto ontology as a knowledge-base for genetic diseases and disorders of glycan metabolism and their causative genes.

    Science.gov (United States)

    Solovieva, Elena; Shikanai, Toshihide; Fujita, Noriaki; Narimatsu, Hisashi

    2018-04-18

    Inherited mutations in glyco-related genes can affect the biosynthesis and degradation of glycans and result in severe genetic diseases and disorders. The Glyco-Disease Genes Database (GDGDB), which provides information about these diseases and disorders as well as their causative genes, was developed by the Research Center for Medical Glycoscience (RCMG) and released in April 2010. GDGDB currently provides information on about 80 genetic diseases and disorders caused by single-gene mutations in glyco-related genes. Many biomedical resources provide information about genetic disorders and genes involved in their pathogenesis, but resources focused on genetic disorders known to be related to glycan metabolism are lacking. With the aim of providing more comprehensive knowledge on genetic diseases and disorders of glycan biosynthesis and degradation, we enriched the content of the GDGDB database and improved the methods for data representation. We developed the Genetic Glyco-Diseases Ontology (GGDonto) and an RDF/SPARQL-based user interface using Semantic Web technologies. In particular, we represented the GGDonto content using Semantic Web languages, such as RDF, RDFS, SKOS, and OWL, and created an interactive user interface based on SPARQL queries. This user interface provides features to browse the hierarchy of the ontology, view detailed information on diseases and related genes, and find relevant background information. Moreover, it provides the ability to filter and search information by faceted and keyword searches. Focused on the molecular etiology, pathogenesis, and clinical manifestations of genetic diseases and disorders of glycan metabolism and developed as a knowledge-base for this scientific field, GGDonto provides comprehensive information on various topics, including links to aid the integration with other scientific resources. The availability and accessibility of this knowledge will help users better understand how genetic defects impact the

  15. The role of ontologies in biological and biomedical research: a functional perspective

    KAUST Repository

    Hoehndorf, Robert

    2015-04-10

    Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.

  16. The role of ontologies in biological and biomedical research: a functional perspective

    KAUST Repository

    Hoehndorf, Robert; Schofield, P. N.; Gkoutos, G. V.

    2015-01-01

    Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.

  17. The meaning of the wave function in search of the ontology of quantum mechanics

    CERN Document Server

    Gao, Shan

    2017-01-01

    At the heart of quantum mechanics lies the wave function, a powerful but mysterious mathematical object which has been a hot topic of debate from its earliest stages. Covering much of the recent debate and providing a comprehensive and critical review of competing approaches, this ambitious text provides new, decisive proof of the reality of the wave function. Aiming to make sense of the wave function in quantum mechanics and to find the ontological content of the theory, this book explores new ontological interpretations of the wave function in terms of random discontinuous motion of particles. Finally, the book investigates whether the suggested quantum ontology is complete in solving the measurement problem and if it should be revised in the relativistic domain. A timely addition to the literature on the foundations of quantum mechanics, this book is of value to students and researchers with an interest in the philosophy of physics. Presents a concise introduction to quantum mechanics, including the c...

  18. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Full Text Available Abstract Background It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions Pathway

  19. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology

  20. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    Energy Technology Data Exchange (ETDEWEB)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G.; Law, R. David (e-mail: dlaw@lakeheadu.ca) [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]

    2012-10-15

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  1. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    International Nuclear Information System (INIS)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G.; Law, R. David

    2012-01-01

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  2. Closing the loop: from paper to protein annotation using supervised Gene Ontology classification.

    Science.gov (United States)

    Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

    2014-01-01

    Curating gene function from the literature with Gene Ontology (GO) concepts is a particularly time-consuming task in genomics, and help from bioinformatics is in high demand to keep up with the flow of publications. In 2004, the first BioCreative challenge already included a task of automatic GO concept assignment from full text. At that time, results were judged far from reaching the performance required by real curation workflows. In particular, supervised approaches produced the most disappointing results because of a lack of training data. Ten years later, the available curation data have grown massively. In 2013, the BioCreative IV GO task revisited the automatic GO assignment task. For this issue, we investigated the power of our supervised classifier, GOCat. GOCat computes similarities between an input text and already curated instances contained in a knowledge base to infer GO concepts. Subtask A consisted of selecting GO evidence sentences for a relevant gene in a full text. For this, we designed a state-of-the-art supervised statistical approach, using a naïve Bayes classifier and the official training set, and obtained fair results. Subtask B consisted of predicting GO concepts from the previous output. For this, we applied GOCat and reached leading results, up to 65% for hierarchical recall in the top 20 outputted concepts. Contrary to previous competitions, machine learning this time outperformed standard dictionary-based approaches. Thanks to BioCreative IV, we were able to design a complete workflow for curation: given a gene name and a full text, this system is able to select evidence sentences for curation and to deliver highly relevant GO concepts. The observed performance is sufficient for use in a real semiautomatic curation workflow. GOCat is available at http://eagl.unige.ch/GOCat/ and http://eagl.unige.ch/GOCat4FT/.

  3. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    Science.gov (United States)

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  4. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data.

    Science.gov (United States)

    Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan

    2016-01-01

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

  5. A formal ontological perspective on the behaviors and functions of technical artifacts

    NARCIS (Netherlands)

    Borgo, S.; Carrara, M.; Garbacz, P.; Vermaas, P.E.

    2008-01-01

    In this paper we present a formal characterization of the engineering concepts of behavior and function of technical artifacts. We capture the meanings that engineers attach to these concepts by formalizing, within the formal ontology DOLCE, the five meanings of artifact behavior and the two

  6. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and

  7. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks from protein-protein interaction data and gene ontology, which combine the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which takes into account the topological structure of the protein-protein interaction network as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine structural closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
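
    As a rough illustration of how network topology and GO annotation similarity might be folded into a single distance, the sketch below mixes two Jaccard indices with a weight. The Jaccard measure, the weight `alpha`, the function names, and all protein and GO identifiers are assumptions made for illustration; the abstract does not state the authors' actual measure.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unified_distance(p, q, neighbours, annotations, alpha=0.5):
    """Distance in [0, 1] mixing topological and GO annotation similarity.

    alpha is a hypothetical weight: 1.0 uses topology only, 0.0 uses GO only.
    """
    # Closed neighbourhoods (the node plus its neighbours), so that two
    # directly interacting proteins always share at least each other.
    topo = jaccard(neighbours[p] | {p}, neighbours[q] | {q})
    go = jaccard(annotations[p], annotations[q])
    return 1.0 - (alpha * topo + (1.0 - alpha) * go)


# Toy interaction network and annotations; all identifiers are made up.
neighbours = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P5"},
    "P5": {"P4"},
}
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:A"},
    "P4": {"GO:C"},
    "P5": {"GO:C", "GO:D"},
}

d_close = unified_distance("P1", "P2", neighbours, annotations)
d_mid = unified_distance("P1", "P3", neighbours, annotations)
d_far = unified_distance("P1", "P4", neighbours, annotations)
```

    A clustering step could then group proteins whose pairwise distance falls below a chosen threshold; that step is left out here because the abstract gives no detail on it.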

  8. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in their mRNA expression levels in L-lysine-producing C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport, namely NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dethiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885, were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, which we determined using both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results will help improve our understanding of C. glutamicum for industrial amino acid production.

  9. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining the subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
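
    The feature construction described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the HybridGO-Loc implementation: the similarity vector here takes, for each vocabulary term, the best similarity to any retrieved term, fusion is plain concatenation, and the similarity table, function names and GO identifiers are all hypothetical.

```python
from collections import Counter


def frequency_vector(terms, vocabulary):
    """Count how often each vocabulary GO term occurs among the retrieved terms."""
    counts = Counter(terms)
    return [counts[t] for t in vocabulary]


def similarity_vector(terms, vocabulary, sim):
    """For each vocabulary term, the best similarity to any retrieved term."""
    present = set(terms)
    return [max((sim(v, t) for t in present), default=0.0) for v in vocabulary]


def fusion_vector(terms, vocabulary, sim):
    """Concatenate the frequency vector and the semantic similarity vector."""
    return frequency_vector(terms, vocabulary) + similarity_vector(terms, vocabulary, sim)


# Hypothetical pairwise similarities; a real system would derive them from
# the GO graph with a semantic similarity measure.
_TABLE = {
    frozenset(("GO:1", "GO:2")): 0.6,
    frozenset(("GO:1", "GO:3")): 0.1,
    frozenset(("GO:2", "GO:3")): 0.2,
}


def toy_sim(a, b):
    """Symmetric lookup; identical terms have similarity 1.0."""
    return 1.0 if a == b else _TABLE.get(frozenset((a, b)), 0.0)


vocabulary = ["GO:1", "GO:2", "GO:3"]
retrieved = ["GO:1", "GO:1", "GO:3"]  # e.g. terms gathered from BLAST homologs
fused = fusion_vector(retrieved, vocabulary, toy_sim)
print(fused)  # [2, 0, 1, 1.0, 0.6, 1.0]
```

    Note that GO:2 never occurs among the retrieved terms, so its frequency is 0, yet it still receives a non-zero similarity score; this is the kind of inter-term information a pure occurrence count would miss.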

  10. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer.

    Directory of Open Access Journals (Sweden)

    Malin Lando

    2009-11-01

    Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities for both early and advanced stage cervical cancers.

  11. Development of FuGO: An Ontology for Functional Genomics Investigations

    Science.gov (United States)

    Whetzel, Patricia L.; Brinkman, Ryan R.; Causton, Helen C.; Fan, Liju; Field, Dawn; Fostel, Jennifer; Fragoso, Gilberto; Gray, Tanya; Heiskanen, Mervi; Hernandez-Boussard, Tina; Morrison, Norman; Parkinson, Helen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Schober, Daniel; Smith, Barry; Stevens, Robert; Stoeckert, Christian J.; Taylor, Chris; White, Joe; Wood, Andrew

    2009-01-01

    The development of the Functional Genomics Investigation Ontology (FuGO) is a collaborative, international effort that will provide a resource for annotating functional genomics investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. FuGO will contain both terms that are universal to all functional genomics investigations and those that are domain specific. In this way, the ontology will serve as the “semantic glue” to provide a common understanding of data from across these disparate data sources. In addition, FuGO will reference out to existing mature ontologies to avoid the need to duplicate these resources, and will do so in such a way as to enable their ease of use in annotation. This project is in the early stages of development; the paper will describe efforts to initiate the project, the scope and organization of the project, the work accomplished to date, and the challenges encountered, as well as future plans. PMID:16901226

  12. The functional landscape of mouse gene expression

    Directory of Open Access Journals (Sweden)

    Zhang Wen

    2004-12-01

    Background: Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function. Results: We generated microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide arrays. We show that quantitative transcriptional co-expression is a powerful predictor of gene function. Hundreds of functional categories, as defined by Gene Ontology 'Biological Processes', are associated with characteristic expression patterns across all tissues, including categories that bear no overt relationship to the tissue of origin. In contrast, simple tissue-specific restriction of expression is a poor predictor of which genes are in which functional categories. As an example, the highly conserved mouse gene PWP1 is widely expressed across different tissues but is co-expressed with many RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1 is required for rRNA biogenesis. Conclusions: We conclude that 'functional genomics' strategies based on quantitative transcriptional co-expression will be as fruitful in mammals as they have been in simpler organisms, and that transcriptional control of mammalian physiology is more modular than is generally appreciated. Our data and analyses provide a public resource for mammalian functional genomics.
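
    The idea of predicting function from co-expression can be illustrated with a simple guilt-by-association sketch: an unannotated gene inherits the category of the annotated gene whose expression profile across tissues correlates best with its own. The use of Pearson correlation with a nearest-neighbour rule, the gene names, and the expression values are assumptions chosen for illustration; the abstract does not describe the authors' actual classifier.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def predict_category(query, profiles, categories):
    """Return the category of the annotated gene most correlated with the query."""
    best = max(categories, key=lambda g: pearson(profiles[query], profiles[g]))
    return categories[best]


# Toy expression levels across four tissues; names and values are made up.
profiles = {
    "geneX": [1.0, 2.0, 3.0, 4.0],
    "rnaA": [2.0, 4.0, 6.0, 8.5],
    "metB": [4.0, 3.0, 2.0, 1.0],
}
categories = {"rnaA": "RNA processing", "metB": "metabolism"}

predicted = predict_category("geneX", profiles, categories)
print(predicted)  # RNA processing
```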

  13. Epistemic Function and Ontology of Analog and Digital Images

    Directory of Open Access Journals (Sweden)

    Aleksandra Łukaszewicz Alcaraz

    2016-01-01

    The important epistemic function of photographic images is their active role in construction and reconstruction of our beliefs concerning the world and human identity, since we often consider photographs as presenting reality or even the Real itself. Because photography can convince people of how different social and ethnic groups and even they themselves look, documentary projects and the dissemination of photographic practices supported the transition from disciplinary society to the present-day society of control. While both analog and digital images are formed from the same basic materia, the ways in which this matter appears are distinctive. In the case of analog photography, we deal with physical and chemical matter, whereas with digital images we face electronic matter. Because digital photography allows endless modification of the image, we can no longer believe in the truthfulness of digital images.

  14. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

    Science.gov (United States)

    Diehl, Alexander D; Meehan, Terrence F; Bradford, Yvonne M; Brush, Matthew H; Dahdul, Wasila M; Dougall, David S; He, Yongqun; Osumi-Sutherland, David; Ruttenberg, Alan; Sarntivijai, Sirarat; Van Slyke, Ceri E; Vasilevsky, Nicole A; Haendel, Melissa A; Blake, Judith A; Mungall, Christopher J

    2016-07-04

    The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the

  15. An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    Science.gov (United States)

    Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P

    2008-10-01

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
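
    The hub-gene question mentioned above amounts to counting, for each gene, the distinct pathways its products take part in. The paper answers it with SPARQL over the integrated knowledge base; the sketch below only mimics the counting logic in plain Python over (gene, pathway) pairs. The function name, the `min_pathways` threshold, and the identifiers are hypothetical.

```python
from collections import defaultdict


def hub_genes(participation, min_pathways=2):
    """Return (gene, pathway_count) pairs for genes in at least min_pathways pathways.

    participation is an iterable of (gene, pathway) pairs; duplicates are ignored.
    Results are sorted by descending count, then by gene name.
    """
    pathways_by_gene = defaultdict(set)
    for gene, pathway in participation:
        pathways_by_gene[gene].add(pathway)
    hubs = [(g, len(p)) for g, p in pathways_by_gene.items() if len(p) >= min_pathways]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Toy participation records; identifiers are made up.
participation = [
    ("geneA", "pw1"), ("geneA", "pw2"), ("geneA", "pw3"),
    ("geneB", "pw1"),
    ("geneC", "pw2"), ("geneC", "pw3"),
    ("geneA", "pw1"),  # duplicate record, counted once
]

hubs = hub_genes(participation)
print(hubs)  # [('geneA', 3), ('geneC', 2)]
```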

  16. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat

    2017-09-27

    Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.

  17. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics

    Science.gov (United States)

    Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth

    2018-01-01

    Abstract The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578

  18. The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    2014-08-01

    Full Text Available With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to the different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. The results indicate that an efficient integration of GO annotations eliminates up to 27.08% and 22.32% of the redundancy in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified missing annotations for some orthologs, and annotation mismatches between the InterPro2GO and manual pipelines in these two proteomes, which therefore require further curation. This simplifies and facilitates the tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.

  19. Lentiviral gene ontology (LeGO) vectors equipped with novel drug-selectable fluorescent proteins: new building blocks for cell marking and multi-gene analysis.

    Science.gov (United States)

    Weber, K; Mock, U; Petrowitz, B; Bartsch, U; Fehse, B

    2010-04-01

    Vector-encoded fluorescent proteins (FPs) facilitate unambiguous identification or sorting of gene-modified cells by fluorescence-activated cell sorting (FACS). Exploiting this feature, we have recently developed lentiviral gene ontology (LeGO) vectors (www.LentiGO-Vectors.de) for multi-gene analysis in different target cells. In this study, we extend the LeGO principle by introducing 10 different drug-selectable FPs created by fusing one of five selection markers (protecting against blasticidin, hygromycin, neomycin, puromycin and zeocin) and one of five FP genes (Cerulean, eGFP, Venus, dTomato and mCherry). All tested fusion proteins allowed both fluorescence-mediated detection and drug-mediated selection of LeGO-transduced cells. Newly generated codon-optimized hygromycin- and neomycin-resistance genes showed improved expression as compared with their ancestors. New LeGO constructs were produced at titers >10^6 per ml (for non-concentrated supernatants). We show efficient combinatorial marking and selection of various cells, including mesenchymal stem cells, simultaneously transduced with different LeGO constructs. Inclusion of the cytomegalovirus early enhancer/chicken beta-actin promoter into LeGO vectors facilitated robust transgene expression in and selection of neural stem cells and their differentiated progeny. We suppose that the new drug-selectable markers, which combine the advantages of FACS and drug selection, are well suited for numerous applications and vector systems. Their inclusion into LeGO vectors opens new possibilities for (stem) cell tracking and functional multi-gene analysis.

  20. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data

    Directory of Open Access Journals (Sweden)

    Domont Gilberto B

    2009-02-01

    Full Text Available Abstract Background Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.

  1. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.

    Science.gov (United States)

    Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C

    2009-02-24

    Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
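
    The combination of fold changes with GO over-representation statistics can be sketched as a two-step procedure: select proteins whose fold change passes a cutoff, then ask whether a GO term is over-represented among them. The abstract does not say which statistic GOEx uses; the one-sided hypergeometric test below is a common choice and is an assumption here, as are the protein identifiers, fold-change values and the twofold cutoff.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric tail P(X >= hits): the chance of drawing at
    least `hits` annotated items in `sample` draws without replacement from
    `population` items, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


# Hypothetical fold changes (treated / control) for six identified proteins.
fold_changes = {"P1": 2.5, "P2": 0.3, "P3": 1.1, "P4": 3.0, "P5": 0.9, "P6": 1.0}
# Hypothetical set of proteins annotated with the GO term of interest.
go_term_members = {"P1", "P2", "P4"}


def is_changed(fold_change, factor=2.0):
    """True when the protein is up- or down-regulated by at least `factor`."""
    return fold_change >= factor or fold_change <= 1.0 / factor


selected = {p for p, fc in fold_changes.items() if is_changed(fc)}
p_value = enrichment_p_value(
    population=len(fold_changes),
    annotated=len(go_term_members),
    sample=len(selected),
    hits=len(selected & go_term_members),
)
print(sorted(selected), p_value)
```

    In practice many GO terms are tested at once, so the resulting p-values would normally be corrected for multiple testing before interpretation.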

  2. Reveal genes functionally associated with ACADS by a network study.

    Science.gov (United States)

    Chen, Yulong; Su, Zhiguang

    2015-09-15

    Establishing a systematic network is aimed at finding essential human gene-gene/gene-disease pathways by means of network inter-connecting patterns and functional annotation analysis. In the present study, we have analyzed functional gene interactions of the short-chain acyl-coenzyme A dehydrogenase gene (ACADS). ACADS plays a vital role in free fatty acid β-oxidation and regulates energy homeostasis. Modules of highly inter-connected genes in the disease-specific ACADS network are derived by integrating gene function and protein interaction data. Among the 8 genes in the ACADS web retrieved from both STRING and GeneMANIA, ACADS is effectively conjoined with 4 genes: HADHA, HADHB, ECHS1 and ACAT1. The functional analysis is done via ontological briefing and candidate disease identification. We observed that the most efficiently interlinked genes connected with ACADS are HADHA, HADHB, ECHS1 and ACAT1. Interestingly, the ontological aspect of genes in the ACADS network reveals that ACADS, HADHA and HADHB play equally vital roles in fatty acid metabolism. The gene ACAT1, together with ACADS, is involved in ketone metabolism. Our computational gene web analysis also predicts potential candidate diseases, indicating the involvement of ACADS, HADHA, HADHB, ECHS1 and ACAT1 not only in lipid metabolism but also in infant death syndrome, skeletal myopathy, acute hepatic encephalopathy, Reye-like syndrome, episodic ketosis, and metabolic acidosis. The current study presents a comprehensible layout of the ACADS network, its functional strategies and the candidate disease approach associated with the ACADS network. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    Science.gov (United States)

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in the proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings are significant for future work on protein comparison, and will help to better understand the semantic similarity between proteins.

  4. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al. to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation we identify an excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.

  5. A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

    Science.gov (United States)

    Taşan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P

    2012-02-01

    The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.

  6. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait ... to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) categories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We ... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated ...

  7. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.

    Science.gov (United States)

    Cheng, Liang; Jiang, Yue; Ju, Hong; Sun, Jie; Peng, Jiajie; Zhou, Meng; Hu, Yang

    2018-01-19

    Since the establishment of the first biomedical ontology, Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Nowadays, over 300 ontologies have been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because of the advantage of identifying novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Though similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method took advantage of a gene functional interaction network (GFIN) to explore such inter-ontology similarities of terms. However, it only used gene interactions and failed to make full use of the connectivity among gene nodes of the network. In addition, all existing methods are designed specifically for GO, and their performance on the wider ontology community remains unknown. We propose a method, InfAcrOnt, to infer similarities between terms across ontologies utilizing the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network with a random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on the human and yeast benchmark datasets, respectively, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results with prior knowledge on pair-wise DO-HPO terms and pair-wise DO-GO terms show high correlations. The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies on the benchmark set.
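
    The idea of modeling information flow through a term-gene-gene network can be illustrated with a generic random walk with restart. The abstract does not give InfAcrOnt's exact formulation, so the update rule, the restart probability and the toy network below are assumptions chosen only to show how a walk started at one ontology term assigns higher scores to terms that are closer in the network.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to `seed` with probability `restart` and otherwise moves
    to a uniformly chosen neighbour of its current node."""
    nodes = sorted(adjacency)
    prob = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its remaining mass back to the seed.
                nxt[seed] += (1.0 - restart) * prob[n]
                continue
            share = (1.0 - restart) * prob[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical undirected network: three terms from different ontologies
# attached to a small chain of interacting genes.
adjacency = {
    "term_A": ["gene1"],
    "gene1": ["term_A", "gene2"],
    "gene2": ["gene1", "term_B", "gene3"],
    "term_B": ["gene2"],
    "gene3": ["gene2", "term_C"],
    "term_C": ["gene3"],
}

scores = random_walk_with_restart(adjacency, seed="term_A")
print(round(scores["term_B"], 4), round(scores["term_C"], 4))
```

    Here `term_B` ends up with a higher score than `term_C` because it sits one step closer to the seed; a cross-ontology similarity could then be derived from such scores, although how InfAcrOnt does this is not stated in the abstract.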

  8. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

    Science.gov (United States)

    Caniza, Horacio; Romero, Alfonso E; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-08-01

    We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. Contact: alberto@cs.rhul.ac.uk. GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines. © The Author 2014. Published by Oxford University Press.

  9. Using Gene Ontology to describe the role of the neurexin-neuroligin-SHANK complex in human, mouse and rat and its relevance to autism.

    Science.gov (United States)

    Patel, Sejal; Roncaglia, Paola; Lovering, Ruth C

    2015-06-06

    People with an autistic spectrum disorder (ASD) display a variety of characteristic behavioral traits, including impaired social interaction, communication difficulties and repetitive behavior. This complex neurodevelopmental disorder is known to be associated with a combination of genetic and environmental factors. Neurexins and neuroligins play a key role in synaptogenesis, and neurexin-neuroligin adhesion is one of several processes that have been implicated in autism spectrum disorders. In this report we describe the manual annotation of a selection of gene products known to be associated with autism and/or the neurexin-neuroligin-SHANK complex and demonstrate how a focused annotation approach leads to the creation of more descriptive Gene Ontology (GO) terms, as well as an increase in both the number of gene product annotations and their granularity, thus improving the data available in the GO database. The manual annotations we describe will affect the functional analysis of a variety of future autism-relevant datasets. Comprehensive gene annotation is an essential aspect of genomic and proteomic studies, as the quality of gene annotations incorporated into statistical analysis tools affects the effective interpretation of data obtained through genome wide association studies, next generation sequencing, proteomic and transcriptomic datasets.

  10. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate representation such as digit, code, signal, vector, tree, graph network, and so on. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified dynamic time warping method and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance, but also enriches the tools and the knowledge library of computational biology by extending the domain from character/string to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction.

  11. Quantum ontologies

    International Nuclear Information System (INIS)

    Stapp, H.P.

    1988-12-01

    Quantum ontologies are conceptions of the constitution of the universe that are compatible with quantum theory. The ontological orientation is contrasted to the pragmatic orientation of science, and reasons are given for considering quantum ontologies both within science, and in broader contexts. The principal quantum ontologies are described and evaluated. Invited paper at conference: Bell's Theorem, Quantum Theory, and Conceptions of the Universe, George Mason University, October 20-21, 1988. 16 refs

  12. Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes.

    Science.gov (United States)

    Osato, Naoki

    2018-01-19

    Transcriptional target genes show functional enrichment of genes. However, how many and how significantly transcriptional target genes include functional enrichments are still unclear. To address these issues, I predicted human transcriptional target genes using open chromatin regions, ChIP-seq data and DNA binding sequences of transcription factors in databases, and examined functional enrichment and gene expression level of putative transcriptional target genes. Gene Ontology annotations showed four times larger numbers of functional enrichments in putative transcriptional target genes than gene expression information alone, independent of transcriptional target genes. To compare the number of functional enrichments of putative transcriptional target genes between cells or search conditions, I normalized the number of functional enrichment by calculating its ratios in the total number of transcriptional target genes. With this analysis, native putative transcriptional target genes showed the largest normalized number of functional enrichments, compared with target genes including 5-60% of randomly selected genes. The normalized number of functional enrichments was changed according to the criteria of enhancer-promoter interactions such as distance from transcriptional start sites and orientation of CTCF-binding sites. Forward-reverse orientation of CTCF-binding sites showed significantly higher normalized number of functional enrichments than the other orientations. Journal papers showed that the top five frequent functional enrichments were related to the cellular functions in the three cell types. The median expression level of transcriptional target genes changed according to the criteria of enhancer-promoter assignments (i.e. interactions) and was correlated with the changes of the normalized number of functional enrichments of transcriptional target genes. Human putative transcriptional target genes showed significant functional enrichments. Functional

  13. Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

    Science.gov (United States)

    Masino, Aaron J; Dechene, Elizabeth T; Dulik, Matthew C; Wilkens, Alisha; Spinner, Nancy B; Krantz, Ian D; Pennington, Jeffrey W; Robinson, Peter N; White, Peter S

    2014-07-21

    Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term's information content. Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3. Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for
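
    The ranking step described in this record (ordering genes by information-content-based similarity between their phenotype terms and the patient's terms) can be sketched as follows. The abstract does not name the exact measure, so a Resnik-style similarity (information content of the most informative common ancestor) with best-match averaging is assumed here; the term hierarchy, gene names and annotations are hypothetical and far smaller than the HPO.

```python
import math

# Hypothetical toy phenotype hierarchy: term -> list of direct parents.
parents = {
    "root": [],
    "hearing": ["root"],
    "sensorineural_hearing_loss": ["hearing"],
    "eye": ["root"],
    "cataract": ["eye"],
}

# Hypothetical gene -> phenotype term annotations.
gene_annotations = {
    "GENE_X": {"sensorineural_hearing_loss"},
    "GENE_Y": {"cataract"},
    "GENE_Z": {"hearing"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated with t or a descendant)."""
    counts = {}
    for terms in gene_annotations.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for term in covered:
            counts[term] = counts.get(term, 0) + 1
    total = len(gene_annotations)
    return {term: -math.log(n / total) for term, n in counts.items()}


def rank_genes(patient_terms):
    """Rank genes by the mean best-match Resnik similarity to the patient."""
    ic = information_content()

    def resnik(a, b):
        common = ancestors(a) & ancestors(b)
        return max(ic.get(t, 0.0) for t in common)

    scores = {
        gene: sum(max(resnik(p, g) for g in terms) for p in patient_terms)
        / len(patient_terms)
        for gene, terms in gene_annotations.items()
    }
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


ranking = rank_genes({"sensorineural_hearing_loss"})
print(ranking)
```

    Replacing the patient term with the less specific `hearing` makes GENE_X and GENE_Z indistinguishable, which is a small-scale version of the imprecision effect reported in the record.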

  14. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  15. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk for developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this "witness" model of stress and RNA-Seq technology, the current study aims to study the genetic effects of psychological stress. After witnessing the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  16. Function analysis of unknown genes

    DEFF Research Database (Denmark)

    Rogowska-Wrzesinska, A.

    2002-01-01

    This thesis entitled "Function analysis of unknown genes" presents the use of proteome analysis for the characterisation of yeast (Saccharomyces cerevisiae) genes and their products (proteins, especially those of unknown function). This study illustrates that proteome analysis can be used ... to describe different aspects of molecular biology of the cell, to study changes that occur in the cell due to overexpression or deletion of a gene and to identify various protein modifications. The biological questions and the results of the described studies show the diversity of the information that can ... genes and proteins. It reports the first global proteome database collecting 36 yeast single gene deletion mutants and selecting over 650 differences between analysed mutants and the wild type strain. The obtained results show that two-dimensional gel electrophoresis and mass spectrometry based proteome ...

  17. Large-scale gene function analysis with the PANTHER classification system.

    Science.gov (United States)

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  18. fabp4 is central to eight obesity associated genes: a functional gene network-based polymorphic study.

    Science.gov (United States)

    Bag, Susmita; Ramaiah, Sudha; Anbarasu, Anand

    2015-01-07

    Network study on genes and proteins offers functional basics of the complexity of gene and protein, and its interacting partners. The gene fatty acid-binding protein 4 (fabp4) is found to be highly expressed in adipose tissue, and is one of the most abundant proteins in mature adipocytes. Our investigations on functional modules of fabp4 provide useful information on the functional genes interacting with fabp4, their biochemical properties and their regulatory functions. The present study shows that there is a set of eight candidate genes: acp1, ext2, insr, lipe, ostf1, sncg, usp15, and vim that are strongly and functionally linked with fabp4. Gene ontological analysis of network modules of fabp4 provides an explicit idea on the functional aspect of fabp4 and its interacting nodes. The hierarchical mapping on gene ontology indicates gene specific processes and functions as well as their compartmentalization in tissues. The fabp4 along with its interacting genes are involved in lipid metabolic activity and are integrated in multi-cellular processes of tissues and organs. They also have important protein/enzyme binding activity. Our study elucidated disease-associated nsSNP prediction for fabp4 and it is interesting to note that there are four rsIDs (rs1051231, rs3204631, rs140925685 and rs141169989) with disease allelic variation (T104P, T126P, G27D and G90V respectively). On the whole, our gene network analysis presents a clear insight about the interactions and functions associated with fabp4 gene network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. PDON: Parkinson's disease ontology for representation and modeling of the Parkinson's disease knowledge domain.

    Science.gov (United States)

    Younesi, Erfan; Malhotra, Ashutosh; Gündel, Michaela; Scordis, Phil; Kodamullil, Alpha Tom; Page, Matt; Müller, Bernd; Springstubbe, Stephan; Wüllner, Ullrich; Scheller, Dieter; Hofmann-Apitius, Martin

    2015-09-22

    Despite the unprecedented and increasing amount of data, relatively little progress has been made in molecular characterization of mechanisms underlying Parkinson's disease. In the area of Parkinson's research, there is a pressing need to integrate various pieces of information into a meaningful context of presumed disease mechanism(s). Disease ontologies provide a novel means for organizing, integrating, and standardizing the knowledge domains specific to disease in a compact, formalized and computer-readable form and serve as a reference for knowledge exchange or systems modeling of disease mechanism. The Parkinson's disease ontology was built according to the life cycle of ontology building. Structural, functional, and expert evaluation of the ontology was performed to ensure the quality and usability of the ontology. A novelty metric has been introduced to measure the gain of new knowledge using the ontology. Finally, a cause-and-effect model was built around PINK1 and two gene expression studies from the Gene Expression Omnibus database were re-annotated to demonstrate the usability of the ontology. The Parkinson's disease ontology with a subclass-based taxonomic hierarchy covers the broad spectrum of major biomedical concepts from molecular to clinical features of the disease, and also reflects different views on disease features held by molecular biologists, clinicians and drug developers. The current version of the ontology contains 632 concepts, which are organized under nine views. The structural evaluation showed the balanced dispersion of concept classes throughout the ontology. The functional evaluation demonstrated that the ontology-driven literature search could gain novel knowledge not present in the reference Parkinson's knowledge map. The ontology was able to answer specific questions related to Parkinson's when evaluated by experts. 
Finally, the added value of the Parkinson's disease ontology is demonstrated by ontology-driven modeling of PINK1

  20. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence

    Science.gov (United States)

    Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.

    2008-01-01

    Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
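
    The hub-gene query described in the Results can be pictured as a simple aggregation: count, for each gene, the distinct pathways its products participate in. The sketch below does this in Python over plain (gene, pathway) pairs rather than in SPARQL over the integrated knowledge base; the function name, the threshold and the toy identifiers are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def hub_genes(participation, min_pathways=2):
    """Rank genes by the number of distinct pathways they take part in.

    participation -- iterable of (gene, pathway) pairs
    min_pathways  -- hypothetical cut-off for calling a gene a hub
    """
    pathways_by_gene = defaultdict(set)
    for gene, pathway in participation:
        # A set ignores repeated (gene, pathway) statements.
        pathways_by_gene[gene].add(pathway)
    ranked = sorted(
        pathways_by_gene.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return [(gene, len(paths)) for gene, paths in ranked
            if len(paths) >= min_pathways]


# Toy data with made-up identifiers.
example = [
    ("geneA", "pathway1"), ("geneA", "pathway2"), ("geneA", "pathway3"),
    ("geneB", "pathway1"),
    ("geneC", "pathway2"), ("geneC", "pathway3"),
    ("geneA", "pathway1"),  # duplicate statement, counted once
]
hubs = hub_genes(example)
```

    In a triple store the same grouping would typically be expressed with an aggregate query; the Python form is used here only to keep the example self-contained.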

  1. Ontological Planning

    Directory of Open Access Journals (Sweden)

    Ahmet Alkan

    2017-12-01

    • Is it possible to redefine ontology within the hierarchical structure of planning? We are going to seek answers to some of these questions within the limited scope of this paper and we are going to offer the rest for discussion by just asking them. In light of these assessments, one of the limited objectives of the paper is to draw attention, on the basis of ontological knowledge relying on the wholeness of the universe, to the question of whether the ontological realities of man, energy and movements of thinking, as important factors affecting mankind, can provide macro data for planning on a universal level.

  2. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of human genome. PMID:19594883
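
    The recall and precision figures quoted above follow the usual set-based definitions. A minimal sketch, assuming annotations are represented as (gene, disease) pairs and compared against a reference collection; the function name and the toy pairs are hypothetical and the numbers do not reproduce the paper's results.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted pairs against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy annotations with made-up identifiers.
predicted_pairs = {("g1", "d1"), ("g2", "d2"), ("g3", "d3")}
reference_pairs = {("g1", "d1"), ("g2", "d2"), ("g4", "d4"), ("g5", "d5")}
precision, recall = precision_recall(predicted_pairs, reference_pairs)
```

    High precision with low recall, as reported for OMIM, corresponds to a predicted set that is almost entirely correct but covers only a small part of the reference.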

  3. Identification of protein features encoded by alternative exons using Exon Ontology.

    Science.gov (United States)

    Tranchevent, Léon-Charles; Aubé, Fabien; Dulaurier, Louis; Benoit-Pilven, Clara; Rey, Amandine; Poret, Arnaud; Chautard, Emilie; Mortada, Hussein; Desmet, François-Olivier; Chakrama, Fatima Zahra; Moreno-Garcia, Maira Alejandra; Goillot, Evelyne; Janczarski, Stéphane; Mortreux, Franck; Bourgeois, Cyril F; Auboeuf, Didier

    2017-06-01

    Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named "Exon Ontology," based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information. © 2017 Tranchevent et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    Science.gov (United States)

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Martinez-Lapsicina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

    In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

  5. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
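
    The abstract states that confidence scores from the separate components can be combined through logistic regression. The sketch below shows only the form of such a combination, a weighted sum passed through the logistic function; the weights and bias are hypothetical placeholders, not values fitted or reported by the authors, and the three inputs stand in loosely for sequence-, structure- and network-based scores.

```python
import math


def combine_scores(scores, weights, bias):
    """Combine per-component confidence scores with a logistic model.

    scores  -- one confidence score per component for a given GO term
    weights -- one coefficient per component (hypothetical, not fitted here)
    bias    -- intercept term (hypothetical)
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up scores and coefficients, for illustration only.
combined = combine_scores([0.8, 0.3, 0.5], [2.0, 1.0, 1.5], -2.0)
```

    In a real setting the coefficients would be estimated from training proteins with known annotations, which this sketch does not attempt.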

  6. The Use of Gene Ontology Term and KEGG Pathway Enrichment for Analysis of Drug Half-Life.

    Directory of Open Access Journals (Sweden)

    Yu-Hang Zhang

    Full Text Available A drug's biological half-life is defined as the time required for the human body to metabolize or eliminate 50% of the initial drug dosage. Correctly measuring the half-life of a given drug is helpful for the safe and accurate usage of the drug. In this study, we investigated which gene ontology (GO) terms and biological pathways were highly related to the determination of drug half-life. The investigated drugs, with known half-lives, were analyzed based on their enrichment scores for associated GO terms and KEGG pathways. These scores indicate which GO terms or KEGG pathways the drug targets. The feature selection method, minimum redundancy maximum relevance, was used to analyze these GO terms and KEGG pathways and to identify important GO terms and pathways, such as sodium-independent organic anion transmembrane transporter activity (GO:0015347), monoamine transmembrane transporter activity (GO:0008504), negative regulation of synaptic transmission (GO:0050805), neuroactive ligand-receptor interaction (hsa04080), serotonergic synapse (hsa04726), and linoleic acid metabolism (hsa00591), among others. This analysis confirmed our results and may show evidence for a new method in studying drug half-lives and building effective computational methods for the prediction of drug half-lives.

  7. The Gene Ontology Differs in Bursa of Fabricius Between Two Breeds of Ducks Post Hatching by Enriching the Differentially Expressed Genes

    Directory of Open Access Journals (Sweden)

    H Liu

    Full Text Available ABSTRACT The bursa of Fabricius (BF) is the central humoral immune organ unique to birds. The present study investigated the possible difference on a molecular level between two duck breeds. The digital gene expression profiling (DGE) technology was used to enrich the differentially expressed genes (DEGs) in BF between the Jianchang and Nonghua-P strains of ducks. DGE data identified 195 DEGs in the bursa. Gene Ontology (GO) analysis suggested that DEGs were mainly enriched in the metabolic pathways and ribosome components. Pathways analysis identified the spliceosome, RNA transport, RNA degradation process, Jak-STAT signaling pathway, TNF signaling pathway and B cell receptor signaling pathway. The results indicated that the main difference in the BF between the two duck strains was in the capabilities of protein formation and B cell development. These data have revealed the main divergence in the BF on a molecular level between genetically different duck breeds and may help to perform molecular breeding programs in poultry in the future.

  8. Gene analogue finder: a GRID solution for finding functionally analogous gene products

    Directory of Open Access Journals (Sweden)

    Licciulli Flavio

    2007-09-01

    Full Text Available Abstract Background To date more than 2.1 million gene products from more than 100000 different species have been described specifying their function, the processes they are involved in and their cellular localization using a very well defined and structured vocabulary, the gene ontology (GO). Such vast, well defined knowledge opens the possibility of comparing gene products at the level of functionality, finding gene products which have a similar function or are involved in similar biological processes without relying on the conventional sequence similarity approach. Comparisons within such a large space of knowledge are highly data and computing intensive. For this reason this project was based upon the use of the computational GRID, a technology offering large computing and storage resources. Results We have developed a tool, GENe AnaloGue FINdEr (ENGINE), that parallelizes the search process and distributes the calculation and data over the computational GRID, splitting the process into many sub-processes and joining the calculation and the data on the same machine, thereby completing the whole search in about 3 days instead of occupying one single machine for more than 5 CPU years. The results of the functional comparison contain potential functional analogues for more than 79000 gene products from the most important species. 46% of the analyzed gene products are well enough described for such an analysis to identify functional analogues, such as well-known members of the same gene family, or gene products with similar functions which would never have been associated by standard methods. Conclusion ENGINE has produced a list of potential functionally analogous relations between gene products within and between species using, in place of the sequence, the gene description of the GO, thus demonstrating the potential of the GO. However, the current limiting factor is the quality of the associations of many gene products from non

  9. [Using (1)H-nuclear magnetic resonance metabolomics and gene ontology to establish pathological staging model for esophageal cancer patients].

    Science.gov (United States)

    Chen, X; Wang, K; Chen, W; Jiang, H; Deng, P C; Li, Z J; Peng, J; Zhou, Z Y; Yang, H; Huang, G X; Zeng, J

    2016-07-01

    (ethanol amine, hydroxy-propionic acid, homocysteine and estriol) were eventually selected. Gene ontology analysis showed that 54 enzymes and genes regulated the 4 key metabolic markers. A quantitative prediction model of esophageal cancer staging based on the esophageal cancer NMR spectrum was established. Cross-validation results showed that the predicted effect was good (root mean square error=5.3, R(2)=0.47, P=0.036). The systems biology approaches based on metabolomics and enzyme-gene regulatory network analysis can be used to quantify the metabolic network disturbance of patients with advanced esophageal cancer, and to predict preoperative clinical staging of esophageal cancer patients by plasma NMR metabolomics.

  10. Delineation and interpretation of gene networks towards their effect in cellular physiology- a reverse engineering approach for the identification of critical molecular players, through the use of ontologies.

    Science.gov (United States)

    Moutselos, K; Maglogiannis, I; Chatziioannou, A

    2010-01-01

    Exploiting ontologies provides clues regarding the involvement of certain molecular processes in the cellular phenotypic manifestation. However, identifying individual molecular actors (genes, proteins, etc.) for targeted biological validation in a generic, prioritized fashion, based on objective measures of their effects on cellular physiology, remains a challenge. In this work, a new meta-analysis algorithm is proposed for the holistic interpretation of the information captured in -omic experiments. It is showcased on a dynamic transcriptomic DNA microarray dataset, which examines the effect of mastic oil treatment in Lewis lung carcinoma cells. Through the use of the Gene Ontology this algorithm relates genes to specific cellular pathways and vice versa in order to further reverse engineer the critical role of specific genes, starting from the results of various statistical enrichment analyses. The algorithm is able to discriminate candidate hub-genes, implying critical biochemical cross-talk. Moreover, performance measures of the algorithm are derived, when evaluated with respect to the differential expression gene list of the dataset.

  11. SUGOI: automated ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2015-04-01

    Full Text Available A foundational ontology can solve interoperability issues among the domain ontologies aligned to it. However, several foundational ontologies have been developed, hence such interoperability issues exist among domain ontologies. The novel SUGOI tool...

  12. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.

  13. Inferring ontology graph structures using OWL reasoning.

    Science.gov (United States)

    Rodríguez-García, Miguel Ángel; Hoehndorf, Robert

    2018-01-05

    Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
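
    The idea of edges that are "implied but not asserted" can be shown on a toy scale. The sketch below assumes an ontology reduced to two kinds of axioms, "A SubClassOf B" and "A SubClassOf r some C", and derives every edge a class inherits from its superclasses. It is a simplification for illustration, not the Onto2Graph implementation, which relies on an automated OWL reasoner; the function name and the class names are hypothetical.

```python
def infer_edges(subclass_of, existential):
    """Derive relation edges implied by subclass inheritance.

    subclass_of -- set of (child, parent) pairs
    existential -- set of (cls, relation, filler) triples, read as
                   "cls SubClassOf relation some filler"
    """
    parents = {}
    for child, parent in subclass_of:
        parents.setdefault(child, set()).add(parent)

    def ancestors(cls):
        # Depth-first walk collecting all transitive superclasses.
        seen, stack = set(), [cls]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    classes = {c for pair in subclass_of for c in pair}
    classes |= {c for c, _, _ in existential}
    classes |= {f for _, _, f in existential}

    edges = set()
    for cls in classes:
        lineage = ancestors(cls) | {cls}
        for source, relation, filler in existential:
            # A class inherits every existential restriction of its ancestors.
            if source in lineage:
                edges.add((cls, relation, filler))
    return edges


# Toy ontology: Mitochondrion is a subclass of Organelle, and every
# Organelle is part of some Cell.
asserted_subclass = {("Mitochondrion", "Organelle")}
asserted_existential = {("Organelle", "part_of", "Cell")}
graph_edges = infer_edges(asserted_subclass, asserted_existential)
```

    Here the edge from Mitochondrion to Cell is never asserted directly but follows from the two axioms, which is the kind of relation a purely syntactic conversion would miss.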

  14. The Ontology for Biomedical Investigations.

    Science.gov (United States)

    Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias; Brush, Matthew H; Bug, Bill; Chibucos, Marcus C; Clancy, Kevin; Courtot, Mélanie; Derom, Dirk; Dumontier, Michel; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Gibson, Frank; Gonzalez-Beltran, Alejandra; Haendel, Melissa A; He, Yongqun; Heiskanen, Mervi; Hernandez-Boussard, Tina; Jensen, Mark; Lin, Yu; Lister, Allyson L; Lord, Phillip; Malone, James; Manduchi, Elisabetta; McGee, Monnie; Morrison, Norman; Overton, James A; Parkinson, Helen; Peters, Bjoern; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H; Schober, Daniel; Smith, Barry; Soldatova, Larisa N; Stoeckert, Christian J; Taylor, Chris F; Torniai, Carlo; Turner, Jessica A; Vita, Randi; Whetzel, Patricia L; Zheng, Jie

    2016-01-01

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed

  15. Functional characterization of endogenous siRNA target genes in Caenorhabditis elegans

    Directory of Open Access Journals (Sweden)

    Heikkinen Liisa

    2008-06-01

    Full Text Available Abstract Background Small interfering RNA (siRNA) molecules mediate sequence specific silencing in RNA interference (RNAi), a gene regulatory phenomenon observed in almost all organisms. Large scale sequencing of small RNA libraries obtained from C. elegans has revealed that a broad spectrum of siRNAs is endogenously transcribed from genomic sequences. The biological role and molecular diversity of C. elegans endogenous siRNA (endo-siRNA) molecules, nonetheless, remain poorly understood. In order to gain insight into their biological function, we annotated two large libraries of endo-siRNA sequences, identified their cognate targets, and performed gene ontology analysis to identify enriched functional categories. Results Systematic trends in categorization of target genes according to the specific length of siRNA sequences were observed: 18- to 22-mer siRNAs were associated with genes required for embryonic development; 23-mers were associated uniquely with post-embryonic development; 24–26-mers were associated with phosphorus metabolism or protein modification. Moreover, we observe that some argonaute related genes associate with siRNAs with multiple reads. Sequence frequency graphs suggest that different lengths of siRNAs share similarities in overall sequence structure: the 5' end begins with G, while the body predominates with U and C. Conclusion These results suggest that the lengths of endogenous siRNA molecules are consequential to their biological functions since the gene ontology categories for their cognate mRNA targets vary depending upon their lengths.

  16. Protein complex prediction in large ontology attributed protein-protein interaction networks.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo

    2013-01-01

    Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.

  17. Cross-organism learning method to discover new gene functionalities.

    Science.gov (United States)

    Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro

    2016-04-01

    Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones

  18. Automatic assignment of prokaryotic genes to functional categories using literature profiling.

    Directory of Open Access Journals (Sweden)

    Raul Torrieri

    Full Text Available In recent years, there has been an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack financial support to review annotations. A few of these gene annotations are based on a combination of bioinformatics evidence; in most cases, however, annotations are based solely on sequence similarity to a previously known gene, which was most probably annotated in the same way. As a result, a large number of predicted genes remain unassigned to any functional category despite the fact that there is enough evidence in the literature to predict their function. We developed a classifier trained with term-frequency vectors automatically disclosed from text corpora of an ensemble of genes representative of each functional category of the J. Craig Venter Institute Comprehensive Microbial Resource (JCVI-CMR) ontology. The classifier achieved up to 84% precision with 68% recall (for confidence≥0.4), F-measure 0.76 (recall and precision equally weighted), in an independent set of 2,220 genes, from 13 bacterial species, previously classified by JCVI-CMR into unambiguous categories of its ontology. Finally, the classifier assigned (confidence≥0.7) to functional categories a total of 5,235 out of the ∼24 thousand genes previously in categories "Unknown function" or "Unclassified" for which there is literature in MEDLINE. Two biologists reviewed the literature of 100 of these genes, randomly picked, and assigned them to the same functional categories predicted by the automatic classifier. Our results confirmed the hypothesis that it is possible to confidently assign genes of a real world repository to functional categories, based exclusively on the automatic profiling of its associated literature. The LitProf--Gene Classifier web server is accessible at: www.cebio.org/litprofGC.

  19. An ontological system based on MODIS images to assess ecosystem functioning of Natura 2000 habitats: A case study for Quercus pyrenaica forests

    Science.gov (United States)

    Pérez-Luque, A. J.; Pérez-Pérez, R.; Bonet-García, F. J.; Magaña, P. J.

    2015-05-01

    The implementation of the Natura 2000 network requires methods to assess the conservation status of habitats. This paper shows a methodological approach that combines the use of (satellite) Earth observation with ontologies to monitor Natura 2000 habitats and assess their functioning. We have created an ontological system called Savia that can describe both the ecosystem functioning and the behaviour of abiotic factors in a Natura 2000 habitat. This system is able to automatically download images from MODIS products, create indicators and compute temporal trends for them. We have developed an ontology that takes into account the different concepts and relations about indicators and temporal trends, and the spatio-temporal components of the datasets. All the information generated from datasets and MODIS images, is stored into a knowledge base according to the ontology. Users can formulate complex questions using a SPARQL end-point. This system has been tested and validated in a case study that uses Quercus pyrenaica Willd. forests as a target habitat in Sierra Nevada (Spain), a Natura 2000 site. We assess ecosystem functioning using NDVI. The selected abiotic factor is snow cover. Savia provides useful data regarding these two variables and reflects relationships between them.
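
    The abstract above names NDVI as the ecosystem-functioning indicator and mentions computing temporal trends, but it does not say how Savia calculates either. As a minimal sketch, the standard NDVI definition (NIR minus red, divided by NIR plus red) can be paired with an ordinary least-squares slope as one possible trend measure. The function names `ndvi` and `linear_trend`, the reflectance values, and the choice of a linear slope are illustrative assumptions, not Savia's implementation.

```python
from statistics import mean


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        # 0/0 is undefined; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (nir - red) / total


def linear_trend(values: list[float]) -> float:
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical yearly (NIR, red) reflectance pairs for one pixel.
yearly_bands = [(0.40, 0.10), (0.42, 0.10), (0.45, 0.09)]
yearly_ndvi = [ndvi(nir, red) for nir, red in yearly_bands]
slope = linear_trend(yearly_ndvi)  # positive slope suggests greening over time
```

    A rank-based trend test may be preferable for noisy satellite series; the linear slope is used here only because it is short and easy to check by hand.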

  20. Ontology evolution in physics

    OpenAIRE

    Chan, Michael

    2013-01-01

    With the advent of reasoning problems in dynamic environments, there is an increasing need for automated reasoning systems to automatically adapt to unexpected changes in representations. In particular, the automation of the evolution of their ontologies needs to be enhanced without substantially sacrificing expressivity in the underlying representation. Revision of beliefs is not enough, as adding to or removing from beliefs does not change the underlying formal language. Gene...

  1. Comparison of lists of genes based on functional profiles

    Directory of Open Access Journals (Sweden)

    Salicrú Miquel

    2011-10-01

    Full Text Available Abstract Background How to compare studies on the basis of their biological significance is a problem of central importance in high-throughput genomics. Many methods for performing such comparisons are based on the information in databases of functional annotation, such as those that form the Gene Ontology (GO). Typically, they consist of analyzing gene annotation frequencies in some pre-specified GO classes, in a class-by-class way, followed by p-value adjustment for multiple testing. Enrichment analysis, where a list of genes is compared against a wider universe of genes, is the most common example. Results A new global testing procedure and a method incorporating it are presented. Instead of testing separately for each GO class, a single global test for all classes under consideration is performed. The test is based on the distance between the functional profiles, defined as the joint frequencies of annotation in a given set of GO classes. These classes may be chosen at one or more GO levels. The new global test is more powerful and accurate with respect to type I errors than the usual class-by-class approach. When applied to some real datasets, the results suggest that the method may also provide useful information that complements the tests performed using a class-by-class approach if gene counts are sparse in some classes. An R library, goProfiles, implements these methods and is available from Bioconductor, http://bioconductor.org/packages/release/bioc/html/goProfiles.html. Conclusions The method provides an inferential basis for deciding whether two lists are functionally different. For global comparisons it is preferable to the global chi-square test of homogeneity. Furthermore, it may provide additional information if used in conjunction with class-by-class methods.
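
    To make the idea of comparing gene lists through functional profiles concrete, the sketch below builds a simplified profile (the fraction of genes in a list annotated to each chosen GO class) and measures the squared Euclidean distance between two profiles. This is an illustration only: it uses per-class rather than joint frequencies, it does not implement the global test, and the distance choice, the function names and the gene and GO identifiers are all assumptions rather than the goProfiles API.

```python
from collections import Counter


def functional_profile(annotations: dict[str, set[str]],
                       go_classes: list[str]) -> list[float]:
    """Fraction of genes in the list annotated to each selected GO class."""
    n_genes = len(annotations)
    if n_genes == 0:
        return [0.0 for _ in go_classes]
    wanted = set(go_classes)
    counts = Counter(
        go_class
        for terms in annotations.values()
        for go_class in terms
        if go_class in wanted
    )
    return [counts[go_class] / n_genes for go_class in go_classes]


def squared_euclidean(p: list[float], q: list[float]) -> float:
    """Squared Euclidean distance between two equally long profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Hypothetical gene lists with placeholder GO class identifiers.
classes = ["GO:A", "GO:B"]
list_a = {"gene1": {"GO:A", "GO:B"}, "gene2": {"GO:A"}}
list_b = {"gene3": {"GO:B"}, "gene4": {"GO:B"}}

profile_a = functional_profile(list_a, classes)  # [1.0, 0.5]
profile_b = functional_profile(list_b, classes)  # [0.0, 1.0]
distance = squared_euclidean(profile_a, profile_b)  # 1.25
```

    A distance of zero means the two lists have identical per-class annotation frequencies; deciding whether a non-zero distance is statistically significant is the job of the global test described in the abstract.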

  2. FunGene: the functional gene pipeline and repository.

    Science.gov (United States)

    Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

    2013-01-01

    Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.

  3. FunGene: the Functional Gene Pipeline and Repository

    Directory of Open Access Journals (Sweden)

    Jordan A. Fish

    2013-10-01

    Full Text Available Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.

  4. Genes2FANs: connecting genes through functional association networks

    Science.gov (United States)

    2012-01-01

    Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting diseases genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in

  5. Building ontologies with basic formal ontology

    CERN Document Server

    Arp, Robert; Spear, Andrew D.

    2015-01-01

    In the era of "big data," science is increasingly information driven, and the potential for computers to store, manage, and integrate massive amounts of data has given rise to such new disciplinary fields as biomedical informatics. Applied ontology offers a strategy for the organization of scientific information in computer-tractable form, drawing on concepts not only from computer and information science but also from linguistics, logic, and philosophy. This book provides an introduction to the field of applied ontology that is of particular relevance to biomedicine, covering theoretical components of ontologies, best practices for ontology design, and examples of biomedical ontologies in use. After defining an ontology as a representation of the types of entities in a given domain, the book distinguishes between different kinds of ontologies and taxonomies, and shows how applied ontology draws on more traditional ideas from metaphysics. It presents the core features of the Basic Formal Ontology (BFO), now u...

  6. GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition.

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2013-04-21

    Prediction of protein subcellular localization is an important yet challenging problem. Recently, several computational methods based on Gene Ontology (GO) have been proposed to tackle this problem and have demonstrated superiority over methods based on other features. Existing GO-based methods, however, do not fully use the GO information. This paper proposes an efficient GO method called GOASVM that exploits the information from the GO term frequencies and distant homologs to represent a protein in the general form of Chou's pseudo-amino acid composition. The method first selects a subset of relevant GO terms to form a GO vector space. Then for each protein, the method uses the accession number (AC) of the protein or the ACs of its homologs to find the number of occurrences of the selected GO terms in the Gene Ontology annotation (GOA) database as a means to construct GO vectors for support vector machines (SVMs) classification. With the advantages of GO term frequencies and a new strategy to incorporate useful homologous information, GOASVM can achieve a prediction accuracy of 72.2% on a new independent test set comprising novel proteins that were added to Swiss-Prot six years later than the creation date of the training set. GOASVM and Supplementary materials are available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/GOASVM.html. Copyright © 2013 Elsevier Ltd. All rights reserved.
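
    The vector-construction step described above (counting how often each selected GO term occurs among the annotation records of a protein or its homologs) can be sketched as follows. The in-memory list of (accession, GO term) pairs is a hypothetical stand-in for the GOA database, the accessions and GO identifiers are placeholders, and the SVM classification step is omitted; none of this is taken from the GOASVM code.

```python
from collections import Counter


def go_term_frequency_vector(accessions: list[str],
                             goa_records: list[tuple[str, str]],
                             selected_terms: list[str]) -> list[int]:
    """Count occurrences of each selected GO term for the given accessions.

    goa_records is a list of (accession, go_term) pairs; repeated pairs are
    counted each time, so the result is a term-frequency vector rather than
    a 0/1 presence vector.
    """
    wanted = set(accessions)
    counts = Counter(term for accession, term in goa_records
                     if accession in wanted)
    return [counts[term] for term in selected_terms]


# Hypothetical annotation records and GO vector space.
records = [
    ("P00001", "GO:X"),
    ("P00001", "GO:X"),
    ("P00001", "GO:Y"),
    ("P00002", "GO:Z"),
]
vector_space = ["GO:X", "GO:Y", "GO:Z"]

# Vector for a protein queried by its own accession number.
own_vector = go_term_frequency_vector(["P00001"], records, vector_space)

# Vector for a protein with no own records, queried via a homolog's accession.
homolog_vector = go_term_frequency_vector(["P00002"], records, vector_space)
```

    Each resulting list has one entry per selected GO term, so vectors for different proteins share the same dimensions and could be passed to any standard classifier.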

  7. Ontology authoring with Forza

    CSIR Research Space (South Africa)

    Keet, CM

    2014-11-01

    Full Text Available Generic, reusable ontology elements, such as a foundational ontology's categories and part-whole relations, are essential for good and interoperable knowledge representation. Ontology developers, which include domain experts and novices, face...

  8. Ontological Surprises

    DEFF Research Database (Denmark)

    Leahu, Lucian

    2016-01-01

    a hybrid approach where machine learning algorithms are used to identify objects as well as connections between them; finally, it argues for remaining open to ontological surprises in machine learning as they may enable the crafting of different relations with and through technologies.......This paper investigates how we might rethink design as the technological crafting of human-machine relations in the context of a machine learning technique called neural networks. It analyzes Google’s Inceptionism project, which uses neural networks for image recognition. The surprising output...

  9. EWS and FUS bind a subset of transcribed genes encoding proteins enriched in RNA regulatory functions

    DEFF Research Database (Denmark)

    Luo, Yonglun; Friis, Jenny Blechingberg; Fernandes, Ana Miguel

    2015-01-01

    at different levels. Gene Ontology analyses showed that FUS and EWS target genes preferentially encode proteins involved in regulatory processes at the RNA level. Conclusions The presented results yield new insights into gene interactions of EWS and FUS and have identified a set of FUS and EWS target genes...... involved in pathways at the RNA regulatory level with potential to mediate normal and disease-associated functions of the FUS and EWS proteins.......Background FUS (TLS) and EWS (EWSR1) belong to the FET-protein family of RNA and DNA binding proteins. FUS and EWS are structurally and functionally related and participate in transcriptional regulation and RNA processing. FUS and EWS are identified in translocation generated cancer fusion proteins...

  10. Identifying arsenic trioxide (ATO) functions in leukemia cells by using time series gene expression profiles.

    Science.gov (United States)

    Yang, Hong; Lin, Shan; Cui, Jingru

    2014-02-10

    Arsenic trioxide (ATO) is presently the most active single agent in the treatment of acute promyelocytic leukemia (APL). In order to explore the molecular mechanism of ATO in leukemia cells over time, we adopted a bioinformatics strategy to analyze expression changing patterns and changes in transcription regulation modules of time series genes filtered from the Gene Expression Omnibus database (GSE24946). In total, we screened out 1847 time series genes for subsequent analysis. The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis of these genes showed that oxidative phosphorylation and ribosome were the top 2 significantly enriched pathways. STEM software was employed to compare changing patterns of gene expression with 50 assigned expression patterns. We screened out 7 significantly enriched patterns and 4 tendency charts of time series genes. The Gene Ontology results showed that functions of time series genes were mainly distributed in profiles 41, 40, 39 and 38. Seven genes with positive regulation of cell adhesion function were enriched in profile 40, and showed the same increase-then-decrease pattern as profile 40. The transcription module analysis showed that they were mainly involved in the oxidative phosphorylation pathway and the ribosome pathway. Overall, our data summarized the gene expression changes in ATO treated K562-r cell lines over time and suggested that time series genes mainly regulated cell adhesion. Furthermore, our results may provide a theoretical basis in molecular biology for treating acute promyelocytic leukemia. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. Towards precise classification of cancers based on robust gene functional expression profiles

    Directory of Open Access Journals (Sweden)

    Zhu Jing

    2005-03-01

    Full Text Available Abstract Background Development of robust and efficient methods for analyzing and interpreting high dimension gene expression profiles continues to be a focus in computational biology. The accumulated experimental evidence supports the assumption that genes express and perform their functions in modular fashions in cells. Therefore, there is an open space for development of timely and relevant computational algorithms that use robust functional expression profiles towards precise classification of complex human diseases at the modular level. Results Inspired by the insight that genes act as a module to carry out a highly integrated cellular function, we define a low dimension functional expression profile for data reduction. After annotating each individual gene to functional categories defined in a proper gene function classification system, such as the Gene Ontology applied in this study, we identify those functional categories enriched with differentially expressed genes. For each functional category or functional module, we compute a summary measure (s) for the raw expression values of the annotated genes to capture the overall activity level of the module. In this way, we can treat the gene expressions within a functional module as an integrative data point to replace the multiple values of individual genes. We compare the classification performance of decision trees based on functional expression profiles with the conventional gene expression profiles using four publicly available datasets, which indicates that precise classification of tumour types and improved interpretation can be achieved with the reduced functional expression profiles. Conclusion This modular approach is demonstrated to be a powerful alternative approach to analyzing high dimension microarray data and is robust to high measurement noise and intrinsic biological variance inherent in microarray data. Furthermore, efficient integration with current biological knowledge
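
    The data-reduction step in this abstract (replacing the expression values of all genes in a functional module with one summary measure) can be illustrated with a short sketch. The abstract does not specify the summary measure, so the arithmetic mean is used here as an assumption; the function name, gene names and module names are hypothetical.

```python
from statistics import mean


def functional_expression_profile(expression: dict[str, float],
                                  modules: dict[str, list[str]]) -> dict[str, float]:
    """Summarize one sample's gene expression as one value per module.

    expression maps gene -> expression value for a single sample.
    modules maps module name -> genes annotated to that module.
    Modules with no measured genes are left out of the profile.
    """
    profile = {}
    for module, genes in modules.items():
        values = [expression[gene] for gene in genes if gene in expression]
        if values:
            profile[module] = mean(values)  # assumed summary measure
    return profile


# Hypothetical sample with three measured genes and three modules.
sample = {"geneA": 1.0, "geneB": 3.0, "geneC": 10.0}
gene_modules = {
    "module1": ["geneA", "geneB"],
    "module2": ["geneC", "geneUnmeasured"],
    "module3": ["geneUnmeasured"],
}
reduced = functional_expression_profile(sample, gene_modules)
```

    The reduced profile has one feature per module instead of one per gene, which is the form a decision tree classifier would then consume in the approach described above.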

  12. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

    Science.gov (United States)

    He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

    2018-02-20

    Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as

  13. Building a biomedical ontology recommender web service

    Directory of Open Access Journals (Sweden)

    Jonquet Clement

    2010-06-01

    Full Text Available Abstract Background Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. Methods We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first one is coverage, or the ontologies that provide most terms covering the input text. The second is connectivity, or the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. Results We compare and contrast our Recommender by an exhaustive functional comparison to previously published efforts. We evaluate and discuss the results of several recommendation heuristics in the context of three real world use cases. The best recommendation heuristics, rated ‘very relevant’ by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.

  14. Anatomy Ontology Matching Using Markov Logic Networks

    Directory of Open Access Journals (Sweden)

    Chunhua Li

    2016-01-01

    Full Text Available The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relationships between ontologies describing different species. Ontology matching is one approach to finding semantic correspondences between entities of different ontologies. Markov logic networks, which unify probabilistic graphical models and first-order logic, provide an excellent framework for ontology matching. We combine several different matching strategies through first-order logic formulas according to the structure of anatomy ontologies. Experiments on the adult mouse anatomy and the human anatomy have demonstrated the effectiveness of the proposed approach in terms of the quality of the resulting alignment.

  15. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk

    OpenAIRE

    Cheng, Liang; Jiang, Yue; Ju, Hong; Sun, Jie; Peng, Jiajie; Zhou, Meng; Hu, Yang

    2018-01-01

    Background Since the establishment of the first biomedical ontology Gene Ontology (GO), the number of biomedical ontology has increased dramatically. Nowadays over 300 ontologies have been built including extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because of the advantage of identifying novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Though similarities between terms within each o...

  16. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2016-11-11

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies
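
    A coexpression network of the kind discussed above links genes whose abundance profiles rise and fall together across samples. The sketch below scores gene pairs with the Pearson correlation and keeps pairs above a cutoff. The abstract does not state which correlation measure or threshold the study used, so both are assumptions here, as are the function names and the toy profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)


def coexpression_edges(profiles: dict[str, list[float]],
                       threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for gene1, gene2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene1], profiles[gene2])
        if r >= threshold:
            edges.append((gene1, gene2, r))
    return edges


# Hypothetical abundance profiles across four samples.
toy_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
network = coexpression_edges(toy_profiles)
```

    Running the same function once on mRNA profiles and once on protein profiles for the same genes would give two edge lists that can be compared, which is the spirit of the analysis in the abstract.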

  17. Didactical Ontologies

    Directory of Open Access Journals (Sweden)

    Steffen Mencke, Reiner Dumke

    2008-03-01

    Full Text Available Ontologies are a fundamental concept of the Semantic Web envisioned by Tim Berners-Lee [1]. Together with explicit representation of the semantics of data for machine-accessibility such domain theories are the basis for intelligent next generation applications for the web and other areas of interest [2]. Their application for special aspects within the domain of e-learning is often proposed to support the increasing complexity ([3], [4], [5], [6]). So they can provide a better support for course generation or learning scenario description [7]. By the modeling of didactics-related expertise and their provision for the creators of courses many improvements like reuse, rapid development and of course increased learning performance become possible due to the separation from other aspects of e-learning platforms as already proposed in [8].

  18. Gene expression profiling in susceptible interaction of grapevine with its fungal pathogen Eutypa lata: Extending MapMan ontology for grapevine

    Directory of Open Access Journals (Sweden)

    Usadel Björn

    2009-08-01

    Full Text Available Abstract Background Whole genome transcriptomics analysis is a very powerful approach because it gives an overview of the activity of genes in certain cells or tissue types. However, biological interpretation of such results can be rather tedious. MapMan is a software tool that displays large datasets (e.g. gene expression data) onto diagrams of metabolic pathways or other processes and thus enables easier interpretation of results. The grapevine (Vitis vinifera) genome sequence has recently become available bringing a new dimension into associated research. Two microarray platforms were designed based on the TIGR Gene Index database and used in several physiological studies. Results To enable easy and effective visualization of those and further experiments, annotation of Vitis vinifera Gene Index (VvGI) version 5 to MapMan ontology was set up. Due to specificities of grape physiology, we have created new pictorial representations focusing on three selected pathways: carotenoid pathway, terpenoid pathway and phenylpropanoid pathway, the products of these pathways being important for wine aroma, flavour and colour, as well as plant defence against pathogens. This new tool was validated on Affymetrix microarrays data obtained during berry ripening and it allowed the discovery of new aspects in process regulation. We here also present results on transcriptional profiling of grape plantlets after exposure to the fungal pathogen Eutypa lata using Operon microarrays including visualization of results with MapMan. The data show that the genes induced in infected plants encode pathogenesis related proteins and enzymes of the flavonoid metabolism, which are well known as being responsive to fungal infection.
Conclusion The extension of MapMan ontology to grapevine together with the newly constructed pictorial representations for carotenoid, terpenoid and phenylpropanoid metabolism provide an alternative approach to the analysis of grapevine gene expression

  19. Markov Chain Ontology Analysis (MCOA).

    Science.gov (United States)

    Frost, H Robert; McCray, Alexa T

    2012-02-03

    Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.
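    The eigenvector values mentioned in this record correspond, for a finite ergodic Markov chain, to its stationary distribution. The sketch below shows only that generic computation by power iteration; it does not reproduce MCOA's adjusted transition matrix or its generative model. The three-state matrix and the function name are assumptions chosen for illustration.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary distribution pi with pi = pi * P.

    P is a row-stochastic transition matrix (each row sums to 1) of an
    ergodic chain, given as a list of lists.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy chain, invented for illustration: state 1 is reachable from both
# neighbors, so it should receive the largest long-run weight.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

pi = stationary_distribution(P)
print([round(v, 6) for v in pi])
```

    For this toy chain the weights converge to 0.25, 0.5 and 0.25, so the middle state ranks highest. In an ontology setting the states would be classes and instances, and the ranking would be read as relative importance.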

  20. EWS and FUS bind a subset of transcribed genes encoding proteins enriched in RNA regulatory functions.

    Science.gov (United States)

    Luo, Yonglun; Blechingberg, Jenny; Fernandes, Ana Miguel; Li, Shengting; Fryland, Tue; Børglum, Anders D; Bolund, Lars; Nielsen, Anders Lade

    2015-11-14

    FUS (TLS) and EWS (EWSR1) belong to the FET-protein family of RNA and DNA binding proteins. FUS and EWS are structurally and functionally related and participate in transcriptional regulation and RNA processing. FUS and EWS are identified in translocation generated cancer fusion proteins and involved in the human neurological diseases amyotrophic lateral sclerosis and fronto-temporal lobar degeneration. To determine the gene regulatory functions of FUS and EWS at the level of chromatin, we have performed chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq). Our results show that FUS and EWS bind to a subset of actively transcribed genes, that binding is often downstream of the poly(A) signal, and that binding overlaps with RNA polymerase II. Functional examinations of selected target genes identified that FUS and EWS can regulate gene expression at different levels. Gene Ontology analyses showed that FUS and EWS target genes preferentially encode proteins involved in regulatory processes at the RNA level. The presented results yield new insights into gene interactions of EWS and FUS and have identified a set of FUS and EWS target genes involved in pathways at the RNA regulatory level with potential to mediate normal and disease-associated functions of the FUS and EWS proteins.

  1. New Genes and Functional Innovation in Mammals.

    Science.gov (United States)

    Luis Villanueva-Cañas, José; Ruiz-Orera, Jorge; Agea, M Isabel; Gallo, Maria; Andreu, David; Albà, M Mar

    2017-07-01

    The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. OntologyWidget – a reusable, embeddable widget for easily locating ontology terms

    Directory of Open Access Journals (Sweden)

    Skene JH Pate

    2007-09-01

    Widget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple html description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website 1, as well as several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from http://smd.stanford.edu/ontologyWidget/.

  3. Drug target ontology to classify and integrate drug discovery data.

    Science.gov (United States)

    Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande; Turner, John Paul; Vidovic, Dusica; Forlin, Michele; Koleti, Amar; Nguyen, Dac-Trung; Jensen, Lars Juhl; Guha, Rajarshi; Mathias, Stephen L; Ursu, Oleg; Stathias, Vasileios; Duan, Jianbin; Nabizadeh, Nooshin; Chung, Caty; Mader, Christopher; Visser, Ubbo; Yang, Jeremy J; Bologa, Cristian G; Oprea, Tudor I; Schürer, Stephan C

    2017-11-09

    model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO Bioportal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.

  4. Transcriptome Analysis of Porcine PBMCs Reveals the Immune Cascade Response and Gene Ontology Terms Related to Cell Death and Fibrosis in the Progression of Liver Failure

    Directory of Open Access Journals (Sweden)

    YiMin Zhang

    2018-01-01

    Full Text Available Background. The key gene sets involved in the progression of acute liver failure (ALF), which has a high mortality rate, remain unclear. This study aims to gain a deeper understanding of the transcriptional response of peripheral blood mononuclear cells (PBMCs) following ALF. Methods. ALF was induced by D-galactosamine (D-gal) in a porcine model. PBMCs were separated at time zero (baseline group), 36 h (failure group), and 60 h (dying group) after D-gal injection. Transcriptional profiling was performed using RNA sequencing and analysed using DAVID bioinformatics resources. Results. Compared with the baseline group, 816 and 1,845 differentially expressed genes (DEGs) were identified in the failure and dying groups, respectively. A total of five and two gene ontology (GO) term clusters were enriched in 107 GO terms in the failure group and 154 GO terms in the dying group. These GO clusters were primarily immune-related, including genes regulating the inflammasome complex and toll-like receptor signalling pathways. Specifically, GO terms related to cell death, including apoptosis, pyroptosis, and autophagy, and those related to fibrosis, coagulation dysfunction, and hepatic encephalopathy were enriched. Seven Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, cytokine-cytokine receptor interaction, hematopoietic cell lineage, lysosome, rheumatoid arthritis, malaria, and phagosome and pertussis pathways were mapped for DEGs in the failure group. All of these seven KEGG pathways were involved in the 19 KEGG pathways mapped in the dying group. Conclusion. We found that the dramatic PBMC transcriptome changes triggered by ALF progression were predominantly related to immune responses. The enriched GO terms related to cell death, fibrosis, and so on, as indicated by PBMC transcriptome analysis, seem to be useful in elucidating potential key gene sets in the progression of ALF. A better understanding of these gene sets might be of preventive or

  5. Gene2Function: An Integrated Online Resource for Gene Function Discovery

    Directory of Open Access Journals (Sweden)

    Yanhui Hu

    2017-08-01

    Full Text Available One of the most powerful ways to develop hypotheses regarding the biological functions of conserved genes in a given species, such as humans, is to first look at what is known about their function in another species. Model organism databases and other resources are rich with functional information but difficult to mine. Gene2Function addresses a broad need by integrating information about conserved genes in a single online resource.

  6. An ontology approach to comparative phenomics in plants

    KAUST Repository

    Oellrich, Anika

    2015-02-25

    Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics

  7. An ontology approach to comparative phenomics in plants

    KAUST Repository

    Oellrich, Anika; Walls, Ramona L; Cannon, Ethalinda KS; Cannon, Steven B; Cooper, Laurel; Gardiner, Jack; Gkoutos, Georgios V; Harper, Lisa; He, Mingze; Hoehndorf, Robert; Jaiswal, Pankaj; Kalberer, Scott R; Lloyd, John P; Meinke, David; Menda, Naama; Moore, Laura; Nelson, Rex T; Pujar, Anuradha; Lawrence, Carolyn J; Huala, Eva

    2015-01-01

    Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics

  8. Ontology-aided Data Fusion (Invited)

    Science.gov (United States)

    Raskin, R.

    2009-12-01

    An ontology provides semantic descriptions that are analogous to those in a dictionary, but are readable by both computers and humans. A dataset or service is semantically annotated when it is formally associated with elements of an ontology. The ESIP Federation Semantic Web Cluster has developed a set of ontologies to describe datatypes and data services that can be used to support automated data fusion. The service ontology includes descriptors of the service function, its inputs/outputs, and its invocation method. The datatype descriptors resemble typical metadata fields (data format, data model, data structure, originator, etc.) augmented with descriptions of the meaning of the data. These ontologies, in combination with the SWEET science ontology, enable registered data fusion services to be chained together and implemented in a scientifically meaningful way, based on machine understanding of the associated data and services. This presentation describes initial results and experiences in automated data fusion.

  9. Functional Module Analysis for Gene Coexpression Networks with Network Integration.

    Science.gov (United States)

    Zhang, Shuqin; Zhao, Hongyu; Ng, Michael K

    2015-01-01

    Networks are a general tool for studying the complex interactions between different genes, proteins, and other small molecules. Modules, a fundamental property of many biological networks, have been widely studied, and many computational methods have been proposed to identify the modules in an individual network. However, in many cases, a single network is insufficient for module analysis due to the noise in the data or the tuning of parameters when building the biological network. The availability of a large number of biological networks makes network integration study possible. By integrating such networks, more informative modules for some specific disease can be derived from the networks constructed from different tissues, and consistent factors for different diseases can be inferred. In this paper, we have developed an effective method for module identification from multiple networks under different conditions. The problem is formulated as an optimization model, which combines the module identification in each individual network and alignment of the modules from different networks together. An approximation algorithm based on eigenvector computation is proposed. In simulation studies, our method outperforms the existing methods, especially when the underlying modules differ across the multiple networks. We also applied our method to two groups of gene coexpression networks for humans, which include one for three different cancers, and one for three tissues from the morbidly obese patients. We identified 13 modules with three complete subgraphs, and 11 modules with two complete subgraphs, respectively. The modules were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also showed that the main functions of most modules for the corresponding disease have been addressed by other researchers, which may provide the theoretical basis for further studying the modules experimentally.

  10. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules...... for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological...... analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora....

  11. Functional dissection of drought-responsive gene expression patterns in Cynodon dactylon L.

    Science.gov (United States)

    Kim, Changsoo; Lemke, Cornelia; Paterson, Andrew H

    2009-05-01

    Water deficit is one of the main abiotic factors that affect plant productivity in subtropical regions. To identify genes induced during the water stress response in Bermudagrass (Cynodon dactylon), cDNA macroarrays were used. The macroarray analysis identified 189 drought-responsive candidate genes from C. dactylon, of which 120 were up-regulated and 69 were down-regulated. The candidate genes were classified into seven groups by cluster analysis of expression levels across two intensities and three durations of imposed stress. Annotation using BLASTX suggested that up-regulated genes may be involved in proline biosynthesis, signal transduction pathways, protein repair systems, and removal of toxins, while down-regulated genes were mostly related to basic plant metabolism such as photosynthesis and glycolysis. The functional classification of gene ontology (GO) was consistent with the BLASTX results, also suggesting some crosstalk between abiotic and biotic stress. Comparative analysis of cis-regulatory elements from the candidate genes implicated specific elements in drought response in Bermudagrass. Although only a subset of genes was studied, Bermudagrass shared many drought-responsive genes and cis-regulatory elements with other botanical models, supporting a strategy of cross-taxon application of drought-responsive genes, regulatory cues, and physiological-genetic information.

  12. Comparing Relational and Ontological Triple Stores in Healthcare Domain

    Directory of Open Access Journals (Sweden)

    Ozgu Can

    2017-01-01

    Full Text Available Today’s technological improvements have made ubiquitous healthcare systems that converge into smart healthcare applications in order to solve patients’ problems, to communicate effectively with patients, and to improve healthcare service quality. The first step of building a smart healthcare information system is representing the healthcare data as connected, reachable, and sharable. In order to achieve this representation, ontologies are used to describe the healthcare data. Combining ontological healthcare data with the used and obtained data can be maintained by storing the entire health domain data inside big data stores that support both relational and graph-based ontological data. There are several big data stores and different types of big data sets in the healthcare domain. The goal of this paper is to determine the most applicable ontology data store for storing the big healthcare data. For this purpose, AllegroGraph and Oracle 12c data stores are compared based on their infrastructural capacity, loading time, and query response times. Hence, healthcare ontologies (GENE Ontology, Gene Expression Ontology (GEXO), Regulation of Transcription Ontology (RETO), Regulation of Gene Expression Ontology (REXO)) are used to measure the ontology loading time. Thereafter, various queries are constructed and executed for GENE ontology in order to measure the capacity and query response times for the performance comparison between AllegroGraph and Oracle 12c triple stores.

  13. Knowledge Management Framework for Emerging Infectious Diseases Preparedness and Response: Design and Development of Public Health Document Ontology.

    Science.gov (United States)

    Zhang, Zhizun; Gonzalez, Mila C; Morse, Stephen S; Venkatasubramanian, Venkat

    2017-10-11

    There are increasing concerns about our preparedness and timely coordinated response across the globe to cope with emerging infectious diseases (EIDs). This poses practical challenges that require exploiting novel knowledge management approaches effectively. This work aims to develop an ontology-driven knowledge management framework that addresses the existing challenges in sharing and reusing public health knowledge. We propose a systems engineering-inspired ontology-driven knowledge management approach. It decomposes public health knowledge into concepts and relations and organizes the elements of knowledge based on the teleological functions. Both knowledge and semantic rules are stored in an ontology and retrieved to answer queries regarding EID preparedness and response. A hybrid concept extraction was implemented in this work. The quality of the ontology was evaluated using the formal evaluation method Ontology Quality Evaluation Framework. Our approach is a potentially effective methodology for managing public health knowledge. Accuracy and comprehensiveness of the ontology can be improved as more knowledge is stored. In the future, a survey will be conducted to collect queries from public health practitioners. The reasoning capacity of the ontology will be evaluated using the queries and hypothetical outbreaks. We suggest the importance of developing a knowledge sharing standard like the Gene Ontology for the public health domain. ©Zhizun Zhang, Mila C Gonzalez, Stephen S Morse, Venkat Venkatasubramanian. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 11.10.2017.

  14. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Science.gov (United States)

    Zhang, Ping; Wu, Linwei; Rocha, Andrea M.; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D.; Wu, Liyou; Watson, David B.; Adams, Michael W. W.; Alm, Eric J.; Adams, Paul D.; Arkin, Adam P.

    2018-01-01

    ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. PMID:29463661

  15. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Directory of Open Access Journals (Sweden)

    Zhili He

    2018-02-01

    Full Text Available Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning.

  16. Gene, environment and cognitive function

    DEFF Research Database (Denmark)

    Xu, Chunsheng; Sun, Jianping; Duan, Haiping

    2015-01-01

    BACKGROUND: the genetic and environmental contributions to cognitive function in old people have been well addressed for the Western populations using twin modelling showing moderate to high heritability. No similar study has been conducted in the world's largest and rapidly ageing Chinese...... population living under distinct environmental condition as the Western populations. OBJECTIVE: this study aims to explore the genetic and environmental impact on normal cognitive ageing in the Chinese twins. DESIGN/SETTING: cognitive function was measured on 384 complete twin pairs with median age of 50...... years for seven cognitive measurements including visuospatial, linguistic skills, naming, memory, attention, abstraction and orientation abilities. Data were analysed by fitting univariate and bivariate twin models to estimate the genetic and environmental components in the variance and co...

  17. XML, Ontologies, and Their Clinical Applications.

    Science.gov (United States)

    Yu, Chunjiang; Shen, Bairong

    2016-01-01

    The development of information technology has resulted in its penetration into every area of clinical research. Various clinical systems have been developed, which produce increasing volumes of clinical data. However, saving, exchanging, querying, and exploiting these data are challenging issues. The development of Extensible Markup Language (XML) has allowed the generation of flexible information formats to facilitate the electronic sharing of structured data via networks, and it has been used widely for clinical data processing. In particular, XML is very useful in the fields of data standardization, data exchange, and data integration. Moreover, ontologies have been attracting increased attention in various clinical fields in recent years. An ontology is the basic level of a knowledge representation scheme, and various ontology repositories have been developed, such as Gene Ontology and BioPortal. The creation of these standardized repositories greatly facilitates clinical research in related fields. In this chapter, we discuss the basic concepts of XML and ontologies, as well as their clinical applications.

  18. Array2BIO: from microarray expression data to functional annotation of co-regulated genes

    Directory of Open Access Journals (Sweden)

    Rasley Amy

    2006-06-01

    Full Text Available Abstract Background There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web based utility. Results Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns. Conclusion We have developed Array2BIO, a web based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available at http://array2bio.dcode.org.
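
    The two multiple testing corrections named in this abstract can be illustrated with a minimal Python sketch. It is not Array2BIO's code; the function names and the example p-values are assumptions made only for illustration. Bonferroni multiplies each p-value by the number of tests, while Benjamini-Hochberg scales each p-value by (number of tests / rank) and then enforces monotonicity.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate control)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes (illustrative only).
example_pvalues = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example_pvalues))
print(benjamini_hochberg(example_pvalues))
```

    Bonferroni is the more conservative of the two, which is why tools of this kind often offer both.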

  19. The ontological model and the hybrid expert system for products and processes quality identification involving the approach based on system analysis and quality function deployment

    Directory of Open Access Journals (Sweden)

    Dmitriev Aleksandr

    2016-01-01

    Full Text Available The quality identification model discussed here has improved mathematical tools and allows a variety of additional information to be used. The proposed robust method, the matrix technique MTQFD (Matrix Technique Quality Function Deployment), makes it possible to determine not only the priorities but also the target values of the product characteristics and process parameters, with the possible use of information on negative relationships. The ontological model, the method, and the expert system model are versatile and can also be used to identify the quality of services.

  20. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computationally expensive and time consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents the first effort to examine this idea by studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, among which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as shown by (a) consistently finding that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
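
    The four measures named in this abstract are all built on the information content (IC) of the most informative common ancestor (MICA) of two terms. The Python sketch below uses their standard textbook definitions on a toy ontology. The term names, the parent links, and the annotation probabilities are hypothetical and are not taken from the Gene Ontology slims used in the study.

```python
import math

# Hypothetical toy ontology: each term maps to the set of its direct parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical probability that a gene is annotated to a term or one of its descendants.
PROB = {
    "root": 1.0,
    "metabolism": 0.5,
    "transport": 0.5,
    "lipid_metabolism": 0.25,
    "sugar_metabolism": 0.125,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(PROB[term])


def mica(a, b):
    """Most informative common ancestor of two terms."""
    return max(ancestors(a) & ancestors(b), key=ic)


def resnik(a, b):
    return ic(mica(a, b))


def lin(a, b):
    denominator = ic(a) + ic(b)
    return 2 * resnik(a, b) / denominator if denominator > 0 else 1.0


def jiang_conrath_distance(a, b):
    # A distance, not a similarity: 0 means the terms are identical in IC terms.
    return ic(a) + ic(b) - 2 * resnik(a, b)


def sim_rel(a, b):
    # Lin similarity down-weighted when the common ancestor is very general.
    return lin(a, b) * (1 - PROB[mica(a, b)])


print(lin("lipid_metabolism", "sugar_metabolism"))
print(sim_rel("lipid_metabolism", "sugar_metabolism"))
```

    Removing lower levels of the ontology changes the probabilities and the available ancestors, which is why the measures can react differently to scaling.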

  1. Sugarcane genes related to mitochondrial function

    Directory of Open Access Journals (Sweden)

    Fonseca Ghislaine V.

    2001-01-01

    Full Text Available Mitochondria function as metabolic powerhouses by generating energy through oxidative phosphorylation and have become the focus of renewed interest due to progress in understanding the subtleties of their biogenesis and the discovery of the important roles which these organelles play in senescence, cell death and the assembly of iron-sulfur (Fe/S) centers. Using proteins from the yeast Saccharomyces cerevisiae, Homo sapiens and Arabidopsis thaliana we searched the sugarcane expressed sequence tag (SUCEST) database for the presence of expressed sequence tags (ESTs) with similarity to nuclear genes related to mitochondrial functions. Starting with 869 protein sequences, we searched for sugarcane EST counterparts to these proteins using the basic local alignment search tool TBLASTN similarity searching program run against 260,781 sugarcane ESTs contained in 81,223 clusters. We were able to recover 367 clusters likely to represent sugarcane orthologues of the corresponding genes from S. cerevisiae, H. sapiens and A. thaliana with E-value <= 10^-10. Gene products belonging to all functional categories related to mitochondrial functions were found and this allowed us to produce an overview of the nuclear genes required for sugarcane mitochondrial biogenesis and function as well as providing a starting point for detailed analysis of sugarcane gene structure and physiology.

  2. FunGeneNet: a web tool to estimate enrichment of functional interactions in experimental gene sets.

    Science.gov (United States)

    Tiys, Evgeny S; Ivanisenko, Timofey V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2018-02-09

    Estimation of functional connectivity in gene sets derived from genome-wide or other biological experiments is one of the essential tasks of bioinformatics. A promising approach for solving this problem is to compare gene networks built using experimental gene sets with random networks. One of the resources that make such an analysis possible is CrossTalkZ, which uses the FunCoup database. However, existing methods, including CrossTalkZ, do not take into account individual types of interactions, such as protein/protein interactions, expression regulation, transport regulation, catalytic reactions, etc., but rather work with generalized types characterizing the existence of any connection between network members. We developed the online tool FunGeneNet, which utilizes the ANDSystem and STRING to reconstruct gene networks using experimental gene sets and to estimate their difference from random networks. To compare the reconstructed networks with random ones, the node permutation algorithm implemented in CrossTalkZ was taken as a basis. To study the FunGeneNet applicability, the functional connectivity analysis of networks constructed for gene sets involved in the Gene Ontology biological processes was conducted. We showed that the method sensitivity exceeds 0.8 at a specificity of 0.95. We found that the significance level of the difference between gene networks of biological processes and random networks is determined by the type of connections considered between objects. At the same time, the highest reliability is achieved for the generalized form of connections that takes into account all the individual types of connections. By taking examples of the thyroid cancer networks and the apoptosis network, it is demonstrated that key participants in these processes are involved in the interactions of those types by which these networks differ from random ones. FunGeneNet is a web tool aimed at proving the functionality of networks in a wide range of sizes of

  3. Studying Functions of All Yeast Genes Simultaneously

    Science.gov (United States)

    Stolc, Viktor; Eason, Robert G.; Poumand, Nader; Herman, Zelek S.; Davis, Ronald W.; Anthony, Kevin; Jejelowo, Olufisayo

    2006-01-01

    A method of studying the functions of all the genes of a given species of microorganism simultaneously has been developed in experiments on Saccharomyces cerevisiae (commonly known as baker's or brewer's yeast). It is already known that many yeast genes perform functions similar to those of corresponding human genes; therefore, by facilitating understanding of yeast genes, the method may ultimately also contribute to the knowledge needed to treat some diseases in humans. Because of the complexity of the method and the highly specialized nature of the underlying knowledge, it is possible to give only a brief and sketchy summary here. The method involves the use of unique synthetic deoxyribonucleic acid (DNA) sequences that are denoted as DNA bar codes because of their utility as molecular labels. The method also involves the disruption of gene functions through deletion of genes. Saccharomyces cerevisiae is a particularly powerful experimental system in that multiple deletion strains easily can be pooled for parallel growth assays. Individual deletion strains recently have been created for 5,918 open reading frames, representing nearly all of the estimated 6,000 genetic loci of Saccharomyces cerevisiae. Tagging of each deletion strain with one or two unique 20-nucleotide sequences enables identification of genes affected by specific growth conditions, without prior knowledge of gene functions. Hybridization of bar-code DNA to oligonucleotide arrays can be used to measure the growth rate of each strain over several cell-division generations. The growth rate thus measured serves as an index of the fitness of the strain.
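
    One simple way to turn bar-code hybridization signals into a growth index is sketched below in Python. It assumes, as an illustration only, that the normalized signal of a strain's bar code is proportional to that strain's share of the pool, so the least-squares slope of log2(signal) against generations measures how fast the strain is gaining or losing ground. The function name and the data are hypothetical and do not come from the study.

```python
import math


def log2_slope(generations, signals):
    """Least-squares slope of log2(signal) against the number of generations."""
    ys = [math.log2(s) for s in signals]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, ys))
    return sxy / sxx


# Hypothetical normalized bar-code signals sampled every five generations.
sampled_generations = [0, 5, 10, 15]
neutral_strain = [1000, 1000, 1000, 1000]   # keeps its share of the pool
impaired_strain = [1000, 500, 250, 125]     # share halves every five generations

print(log2_slope(sampled_generations, neutral_strain))
print(log2_slope(sampled_generations, impaired_strain))
```

    A slope near zero marks a strain that grows like the pool average under the tested condition, and a negative slope marks a deletion that impairs growth.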

  4. Assessing gene function in the ruminant placenta.

    Science.gov (United States)

    Anthony, R V; Cantlon, J D; Gates, K C; Purcell, S H; Clay, C M

    2010-01-01

    The placenta provides the means for nutrient transfer from the mother to the fetus, waste transfer from the fetus to the mother, protection of the fetus from the maternal immune system, and is an active endocrine organ. While many placental functions have been defined and investigated, assessing the function of specific genes expressed by the placenta has been problematic, since classical ablation-replacement methods are not feasible with the placenta. The pregnant sheep has been a long-standing animal model for assessing in vivo physiology during pregnancy, since surgical placement of indwelling catheters into both maternal and fetal vasculature has allowed the assessment of placental nutrient transfer and utilization, as well as placental hormone secretion, under unanesthetized-unstressed steady state sampling conditions. However, in ruminants the lack of well-characterized trophoblast cell lines and the inefficiency of creating transgenic pregnancies in ruminants have inhibited our ability to assess specific gene function. Recently, sheep and cattle primary trophoblast cell lines have been reported, and may further our ability to investigate trophoblast function and transcriptional regulation of genes expressed by the placenta. Furthermore, viral infection of the trophoectoderm layer of hatched blastocysts, as a means for placenta-specific transgenesis, holds considerable potential to assess gene function in the ruminant placenta. This approach has been used successfully to "knockdown" gene expression in the developing sheep conceptus, and has the potential for gain-of-function experiments as well. While this technology is still being developed, it may provide an efficient approach to assess specific gene function in the ruminant placenta.

  5. A high-resolution anatomical ontology of the developing murine genitourinary tract

    Science.gov (United States)

    Little, Melissa H.; Brennan, Jane; Georgas, Kylie; Davies, Jamie A.; Davidson, Duncan R.; Baldock, Richard A.; Beverdam, Annemiek; Bertram, John F.; Capel, Blanche; Chiu, Han Sheng; Clements, Dave; Cullen-McEwen, Luise; Fleming, Jean; Gilbert, Thierry; Houghton, Derek; Kaufman, Matt H.; Kleymenova, Elena; Koopman, Peter A.; Lewis, Alfor G.; McMahon, Andrew P.; Mendelsohn, Cathy L.; Mitchell, Eleanor K.; Rumballe, Bree A.; Sweeney, Derina E.; Valerius, M. Todd; Yamada, Gen; Yang, Yiya; Yu, Jing

    2007-01-01

    Cataloguing gene expression during development of the genitourinary tract will increase our understanding not only of this process but also of congenital defects and disease affecting this organ system. We have developed a high-resolution ontology with which to describe the subcompartments of the developing murine genitourinary tract. This ontology incorporates what can be defined histologically and begins to encompass other structures and cell types already identified at the molecular level. The ontology is being used to annotate in situ hybridisation data generated as part of the Genitourinary Development Molecular Anatomy Project (GUDMAP), a publicly available data resource on gene and protein expression during genitourinary development. The GUDMAP ontology encompasses Theiler stage (TS) 17 to 27 of development as well as the sexually mature adult. It has been written as a partonomic, text-based, hierarchical ontology that, for the embryological stages, has been developed as a high-resolution expansion of the existing Edinburgh Mouse Atlas Project (EMAP) ontology. It also includes group terms for well-characterised structural and/or functional units comprising several sub-structures, such as the nephron and juxtaglomerular complex. Each term has been assigned a unique identification number. Synonyms have been used to improve the success of query searching and maintain wherever possible existing EMAP terms relating to this organ system. We describe here the principles and structure of the ontology and provide representative diagrammatic, histological, and whole mount and section RNA in situ hybridisation images to clarify the terms used within the ontology. Visual examples of how terms appear in different specimen types are also provided. PMID:17452023

  6. Interaction between leptin and leptin receptor in gastric carcinoma: Gene ontology analysis

    Directory of Open Access Journals (Sweden)

    V. Wiwanitkit

    2007-04-01

    Full Text Available Gastric carcinoma is a rare but important malignancy. A link between leptin, a cytokine that is elevated in obese individuals, and cancer development has been proposed. It is noted that leptin and its receptor may play a positive role in the progression of gastric cancer. However, the exact mechanism resulting from the interaction between leptin and leptin receptor has never been clarified. Here, the author used a new gene ontology technology to predict the molecular function and biological process resulting from the interaction between leptin and leptin receptor. Compared with leptin and leptin receptor alone, the leptin-leptin receptor complex has the same function and biological process as leptin receptor. This can confirm that leptin receptor has a significant suppressive effect on the expression of leptin. Loss of hormone activity and disturbance of the normal cell signaling pathway of leptin can be seen. Blocking of the receptor might be a rational therapeutic strategy.

  7. Assessment Applications of Ontologies.

    Science.gov (United States)

    Chung, Gregory K. W. K.; Niemi, David; Bewley, William L.

    This paper discusses the use of ontologies and their applications to assessment. An ontology provides a shared and common understanding of a domain that can be communicated among people and computational systems. The ontology captures one or more experts' conceptual representation of a domain expressed in terms of concepts and the relationships…

  8. Identification of functionally related genes using data mining and data integration: a breast cancer case study

    Directory of Open Access Journals (Sweden)

    Zucchi Ileana

    2009-10-01

    Full Text Available Abstract Background The identification of the organisation and dynamics of molecular pathways is crucial for the understanding of cell function. In order to reconstruct the molecular pathways in which a gene of interest is involved in regulating a cell, it is important to identify the set of genes with which it interacts to determine cell function. In this context, the mining and the integration of a large amount of publicly available data, regarding the transcriptome and the proteome states of a cell, are a useful resource to complement biological research. Results We describe an approach for the identification of genes that interact with each other to regulate cell function. The strategy relies on the analysis of gene expression profile similarity, considering large datasets of expression data. During the similarity evaluation, the methodology determines the most significant subset of samples in which the evaluated genes are highly correlated. Hence, the strategy enables the exclusion of samples that are not relevant for each gene pair analysed. This feature is important when considering a large set of samples characterised by heterogeneous experimental conditions where different pools of biological processes can be active across the samples. The putative partners of the studied gene are then further characterised, analysing the distribution of the Gene Ontology terms and integrating the protein-protein interaction (PPI) data. The strategy was applied for the analysis of the functional relationships of a gene of known function, Pyruvate Kinase, and for the prediction of functional partners of the human transcription factor TBX3. In both cases the analysis was done on a dataset composed of breast primary tumour expression data derived from the literature. Integration and analysis of PPI data confirmed the prediction of the methodology, since the genes identified to be functionally related were associated with proteins close in the PPI network

  9. The ontology-based answers (OBA) service: a connector for embedded usage of ontologies in applications.

    Science.gov (United States)

    Dönitz, Jürgen; Wingender, Edgar

    2012-01-01

    The semantic web depends on the use of ontologies to let electronic systems interpret contextual information. Optimally, the handling and access of ontologies should be completely transparent to the user. As a means to this end, we have developed a service that attempts to bridge the gap between experts in a certain knowledge domain, ontologists, and application developers. The ontology-based answers (OBA) service introduced here can be embedded into custom applications to grant access to the classes of ontologies and their relations, as the most important structural features, as well as to information encoded in the relations between ontology classes. Thus computational biologists can benefit from ontologies without detailed knowledge about the respective ontology. The content of ontologies is mapped to a graph of connected objects which is compatible with the object-oriented programming style in Java. Semantic functions implement knowledge about the complex semantics of an ontology beyond the class hierarchy and "partOf" relations. By using these OBA functions an application can, for example, provide a semantic search function, or (in the examples outlined) map an anatomical structure to the organs it belongs to. The semantic functions relieve the application developer from the necessity of acquiring in-depth knowledge about the semantics and curation guidelines of the used ontologies by implementing the required knowledge. The architecture of the OBA service encapsulates the logic to process ontologies in order to achieve a separation from the application logic. A public server with the current plugins is available and can be used with the provided connector in a custom application in scenarios analogous to the presented use cases. The server and the client are freely available if a project requires the use of custom plugins or non-public ontologies. The OBA service and further documentation are available at http://www.bioinf.med.uni-goettingen.de/projects/oba.
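
    The idea of a semantic function over a graph of connected objects can be sketched as follows. This is not the OBA API, which is a Java service. It is a small Python illustration with hypothetical class and function names, showing how an anatomical structure might be mapped to the organs it belongs to by following "partOf" links transitively.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical stand-in for an ontology class held as a connected object."""
    name: str
    is_organ: bool = False
    part_of: list = field(default_factory=list)


def organs_containing(structure):
    """Follow partOf links upward and collect every organ that is reached."""
    found, seen, stack = [], set(), list(structure.part_of)
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue  # a class can be reachable along several paths
        seen.add(node.name)
        if node.is_organ:
            found.append(node.name)
        stack.extend(node.part_of)
    return sorted(found)


# Hypothetical anatomy fragment (illustrative only).
kidney = OntologyClass("kidney", is_organ=True)
nephron = OntologyClass("nephron", part_of=[kidney])
glomerulus = OntologyClass("glomerulus", part_of=[nephron])

print(organs_containing(glomerulus))
```

    Keeping the traversal inside one function means the calling application never needs to know how many intermediate levels separate a structure from its organ.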

  10. Mapping between the OBO and OWL ontology languages.

    Science.gov (United States)

    Tirmizi, Syed Hamid; Aitken, Stuart; Moreira, Dilvan A; Mungall, Chris; Sequeda, Juan; Shah, Nigam H; Miranker, Daniel P

    2011-03-07

    Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiments, taxonomies, etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is by transforming OBO ontologies to OWL. We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have an easy and obvious equivalent construct in OWL. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source. Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language independent manner.
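
    As an illustration of the constructs with an easy and obvious OWL equivalent, the Python sketch below converts a single OBO [Term] stanza into OWL functional-syntax axioms: the term becomes a class, "name" becomes an rdfs:label annotation, and "is_a" becomes SubClassOf. It is not the authors' Java implementation and covers only these three tags. The term identifiers are hypothetical, and the identifier-to-IRI pattern follows the OBO Foundry PURL convention, which is an assumption here rather than a detail from the abstract.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def parse_term_stanza(text):
    """Parse the id, name and is_a tags of one OBO [Term] stanza."""
    term = {"is_a": []}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop the trailing "! comment"
        if tag == "is_a":
            term["is_a"].append(value)
        else:
            term[tag] = value
    return term


def iri(obo_id):
    """Turn an identifier such as EX:0000001 into a bracketed IRI."""
    return "<" + OBO_BASE + obo_id.replace(":", "_") + ">"


def to_owl_functional(term):
    """Emit OWL functional-syntax axioms for the parsed term."""
    cls = iri(term["id"])
    axioms = ["Declaration(Class(%s))" % cls]
    if "name" in term:
        axioms.append('AnnotationAssertion(rdfs:label %s "%s")' % (cls, term["name"]))
    for parent in term["is_a"]:
        axioms.append("SubClassOf(%s %s)" % (cls, iri(parent)))
    return axioms


# Hypothetical stanza (the EX identifiers are not real ontology terms).
example_stanza = """
[Term]
id: EX:0000002
name: example child process
is_a: EX:0000001 ! example parent process
"""

for axiom in to_owl_functional(parse_term_stanza(example_stanza)):
    print(axiom)
```

    Tags such as relationship, intersection_of, or synonym scopes belong to the subset that needs deeper consideration and are deliberately left out of this sketch.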

  11. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing meta...... data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product...... classification systems and meta data taxonomies, should be based on ontologies....

  12. Toxicology ontology perspectives.

    Science.gov (United States)

    Hardy, Barry; Apic, Gordana; Carthew, Philip; Clark, Dominic; Cook, David; Dix, Ian; Escher, Sylvia; Hastings, Janna; Heard, David J; Jeliazkova, Nina; Judson, Philip; Matis-Mitchell, Sherri; Mitic, Dragana; Myatt, Glenn; Shah, Imran; Spjuth, Ola; Tcheremenskaia, Olga; Toldo, Luca; Watson, David; White, Andrew; Yang, Chihae

    2012-01-01

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  13. The holistic claims of the biopsychosocial conception of WHO's International Classification of Functioning, Disability, and Health (ICF): a conceptual analysis on the basis of a pluralistic-holistic ontology and multidimensional view of the human being.

    Science.gov (United States)

    Solli, Hans Magnus; da Silva, António Barbosa

    2012-06-01

    The International Classification of Functioning, Disability and Health (ICF), designed by the WHO, attempts to provide a holistic model of functioning and disability by integrating a medical model with a social one. The aim of this article is to analyze the ICF's claim to holism. The following components of the ICF's complexity are analyzed: (1) health condition, (2) body functions and structures, (3) activity, (4) participation, (5) environmental factors, (6) personal factors, and (7) health. Although the ICF claims to be holistic, it presupposes a monistic materialistic ontology. We indicate some limitations of this ontology, proposing instead: (a) a pluralistic-holistic ontology (PHO) and (b) a multidimensional view of the human being, with individual and environmental aspects, in relation to three levels of reality implied by the PHO. For the ICF to attain its holistic claim, the interactions between its components should be based on (a) and (b).

  14. Methodology for Automatic Ontology Generation Using Database Schema Information

    Directory of Open Access Journals (Sweden)

    JungHyen An

    2018-01-01

    Full Text Available An ontology is a modeling language that supports functions for integrating conceptually distributed domain knowledge and inferring relationships among the concepts. Ontologies are developed based on the target domain knowledge. As a result, methodologies to automatically generate an ontology from metadata that characterize the domain knowledge are becoming important. However, existing methodologies for automatically generating an ontology from metadata require the domain metadata to be produced in a predetermined template, and it is difficult to manage the growing data on the ontology itself when the domain OWL (Ontology Web Language) individuals increase continuously. The database schema captures features of the domain knowledge and provides structural functions to efficiently process the knowledge-based data. In this paper, we propose a methodology to automatically generate ontologies and manage the OWL individuals through an interaction between the database and the ontology. We describe the automatic ontology generation process with an example schema and demonstrate the effectiveness of the automatically generated ontology by comparing it with existing ontologies using the ontology quality score.

  15. Gene expression and functional studies of the optic nerve head astrocyte transcriptome from normal African Americans and Caucasian Americans donors.

    Directory of Open Access Journals (Sweden)

    Haixi Miao

    2008-08-01

    Full Text Available To determine whether optic nerve head (ONH) astrocytes, a key cellular component of glaucomatous neuropathy, exhibit differential gene expression in primary cultures of astrocytes from normal African American (AA) donors compared to astrocytes from normal Caucasian American (CA) donors. We used oligonucleotide Affymetrix microarray (HG U133A & HG U133A 2.0 chips) to compare gene expression levels in cultured ONH astrocytes from twelve CA and twelve AA normal age matched donor eyes. Chips were normalized with Robust Microarray Analysis (RMA) in R using Bioconductor. Significant differential gene expression levels were detected using mixed effects modeling and Statistical Analysis of Microarray (SAM). Functional analysis and Gene Ontology were used to classify differentially expressed genes. Differential gene expression was validated by quantitative real time RT-PCR. Protein levels were detected by Western blots and ELISA. Cell adhesion and migration assays tested physiological responses. Glutathione (GSH) assay detected levels of intracellular GSH. Multiple analyses selected 87 genes differentially expressed between normal AA and CA (P<0.01). The most relevant genes expressed in AA were categorized by function, including: signal transduction, response to stress, ECM genes, migration and cell adhesion. These data show that normal astrocytes from AA and CA normal donors display distinct expression profiles that impact astrocyte functions in the ONH. Our data suggest that differences in gene expression in ONH astrocytes may be specific to the development and/or progression of glaucoma in AA.

  16. Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development

    Directory of Open Access Journals (Sweden)

    Mungall Christopher J

    2010-10-01

    Full Text Available Abstract Background The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation. Results We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology. Conclusions Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.
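    The kind of check described in this abstract can be illustrated with a minimal sketch. It is not the authors' inference system: the toy taxonomy, the toy term hierarchy, the constraint table, and the relation labels `only_in_taxon` and `never_in_taxon` as used here are all illustrative assumptions, and the gene names are made up.

```python
# Minimal sketch of taxon-constraint checking (illustrative data only).

# Toy taxonomy: taxon -> parent taxon (hypothetical, not an NCBI dump).
TAXON_PARENT = {
    "Homo sapiens": "Mammalia",
    "Mammalia": "Eukaryota",
    "Drosophila melanogaster": "Eukaryota",
    "Escherichia coli": "Bacteria",
}

# Toy ontology: term -> parent term, so constraints propagate to descendants.
TERM_PARENT = {
    "mitochondrial matrix": "mitochondrion",
}

# Hypothetical constraint table: term -> (relation, taxon).
CONSTRAINTS = {
    "lactation": ("only_in_taxon", "Mammalia"),
    "mitochondrion": ("only_in_taxon", "Eukaryota"),
}


def ancestors(node, parent_map):
    """Return the node followed by all of its ancestors in parent_map."""
    chain = [node]
    while chain[-1] in parent_map:
        chain.append(parent_map[chain[-1]])
    return chain


def find_violations(annotations):
    """Return (gene, term, taxon) triples that break an inherited constraint."""
    bad = []
    for gene, term, taxon in annotations:
        taxa = set(ancestors(taxon, TAXON_PARENT))
        for t in ancestors(term, TERM_PARENT):
            rule = CONSTRAINTS.get(t)
            if rule is None:
                continue
            relation, constrained_taxon = rule
            only_broken = relation == "only_in_taxon" and constrained_taxon not in taxa
            never_broken = relation == "never_in_taxon" and constrained_taxon in taxa
            if only_broken or never_broken:
                bad.append((gene, term, taxon))
                break
    return bad


annotations = [
    ("geneA", "lactation", "Homo sapiens"),
    ("geneB", "lactation", "Drosophila melanogaster"),
    ("geneC", "mitochondrial matrix", "Escherichia coli"),
]
print(find_violations(annotations))
```

    In this sketch a constraint placed on a term is inherited by its descendants, which loosely mirrors the abstract's point that constraints are specified at various levels of the ontology.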

  17. Integrating phenotype ontologies with PhenomeNET

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel

    2017-12-19

    Background Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details being captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. Results Here, we apply the PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. Conclusions PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.

  18. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Directory of Open Access Journals (Sweden)

    José Cuenca

    Full Text Available Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  19. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  20. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression

    Directory of Open Access Journals (Sweden)

    Vandepoele Klaas

    2009-06-01

    Full Text Available Abstract Background Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
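    The predict-then-filter idea in this abstract can be sketched roughly as follows. This is an assumption-laden illustration, not the published method: the function names, the identifiers, the 0.6 co-expression cutoff, the exclusion of self-interactions, and the rule that either a shared annotation or sufficient co-expression counts as support are all hypothetical choices made for the example.

```python
from itertools import product


def predict_interologs(known_ppis, orthologs):
    """Transfer reference-species interactions to the target species via orthologs."""
    predicted = set()
    for a, b in known_ppis:
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:  # assumption: self-interactions are ignored
                predicted.add(frozenset((ta, tb)))
    return predicted


def filter_by_support(pairs, annotations, coexpression, min_corr=0.6):
    """Keep pairs sharing an annotation or co-expressed at or above min_corr.

    coexpression is keyed by alphabetically sorted (gene, gene) tuples.
    """
    kept = set()
    for pair in pairs:
        x, y = sorted(pair)
        shared = annotations.get(x, set()) & annotations.get(y, set())
        corr = coexpression.get((x, y), 0.0)
        if shared or corr >= min_corr:
            kept.add(pair)
    return kept


# Hypothetical identifiers: y* are reference-species proteins, a* are target genes.
known = [("y1", "y2"), ("y3", "y4")]
orth = {"y1": ["a1"], "y2": ["a2"], "y3": ["a3"], "y4": ["a4"]}
go = {
    "a1": {"actin polymerization"},
    "a2": {"actin polymerization"},
    "a3": {"process X"},
    "a4": {"process Y"},
}
coexp = {("a3", "a4"): 0.2}

candidates = predict_interologs(known, orth)
reliable = filter_by_support(candidates, go, coexp)
print(sorted(sorted(p) for p in reliable))
```

    Only the a1/a2 pair survives here, because a3 and a4 neither share an annotation nor reach the assumed co-expression cutoff.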

  1. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    Science.gov (United States)

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    The PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse functions of PHB genes in plants for development, senescence, defence, and others. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. In order to better guide the gene function analysis and understand the evolution of the PHB gene family, we therefore carried out a comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that PHB genes form a relatively conserved gene family. The PHB genes can be classified into 5 classes and each class has a very deep evolutionary origin. The PHB genes within each class maintained the same motif patterns during evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion. However, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions so that the duplicated genes are under evolutionary pressure to derive new functions. The PHB gene family is a conserved gene family and accounts for diverse but important biological functions based on similar molecular mechanisms. The highly diverse biological function indicated that more research needs to be carried out

  2. Constructive Ontology Engineering

    Science.gov (United States)

    Sousan, William L.

    2010-01-01

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…

  3. A UML profile for the OBO relation ontology

    Science.gov (United States)

    2012-01-01

    Background Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). Conclusions The use of an established and well-known graphical language in the development of biomedical ontologies provides a more

  4. [Fanconi anemia: genes and function(s) revisited].

    Science.gov (United States)

    Papadopoulo, Dora; Moustacchi, Ethel

    2005-01-01

    Fanconi anemia (FA), a rare inherited disorder, exhibits a complex phenotype including progressive bone marrow failure, congenital malformations and increased risk of cancers, mainly acute myeloid leukaemia. At the cellular level, FA is characterized by hypersensitivity to DNA cross-linking agents and by high frequencies of induced chromosomal aberrations, a property used for diagnosis. FA results from mutations in one of the eleven FANC (FANCA to FANCJ) genes. Nine of them have been identified. In addition, FANCD1 gene has been shown to be identical to BRCA2, one of the two breast cancer susceptibility genes. Seven of the FANC proteins form a complex, which exists in four different forms depending on its subcellular localisation. Four FANC proteins (D1(BRCA2), D2, I and J) are not associated with the complex. The presence of the nuclear form of the FA core complex is necessary for the mono-ubiquitinylation of FANCD2 protein, a modification required for its re-localization to nuclear foci, likely to be sites of DNA repair. A clue towards understanding the molecular function of the FANC genes comes from the recently identified connection of FANC to the BRCA1, ATM, NBS1 and ATR genes. Two of the FANC proteins (A and D2) directly interact with BRCA1, which in turn interacts with the MRE11/RAD50/NBS1 complex, which is one of the key components in the mechanisms involved in the cellular response to DNA double strand breaks (DSB). Moreover, ATM, a protein kinase that plays a central role in the network of DSB signalling, phosphorylates FANCD2 in vitro and in vivo in response to ionising radiation. In addition, the NBS1 protein and the monoubiquitinated form of FANCD2 seem to act together in response to DNA crosslinking agents. Taken together with the previously reported impaired DSB and DNA interstrand crosslinks repair in FA cells, the connection of FANC genes to the ATM, ATR, NBS1 and BRCA1 links the FANC genes function to the finely orchestrated network involved in the

  5. Towards Agile Ontology Maintenance

    Science.gov (United States)

    Luczak-Rösch, Markus

    Ontologies are an appropriate means to represent knowledge on the Web. Research on ontology engineering has established practices for integrative lifecycle support. However, broader success of ontologies in Web-based information systems has not yet been achieved, while more lightweight semantic approaches are rather successful. We assume that, paired with the emerging trend of services and microservices on the Web, new dynamic scenarios are gaining momentum in which a shared knowledge base is made available to several dynamically changing services with disparate requirements. Our work envisions a step towards such a dynamic scenario, in which an ontology adapts in an agile way to the requirements of the accessing services and applications as well as to the users' needs, and reduces the experts' involvement in ontology maintenance processes.

  6. Conceptual querying through ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2009-01-01

    We present here an approach to conceptual querying where the aim is, given a collection of textual database objects or documents, to target an abstraction of the entire database content in terms of the concepts appearing in documents, rather than the documents in the collection. The approach is motivated by an obvious need for users to survey huge volumes of objects in query answers. An ontology formalism and a special notion of "instantiated ontology" are introduced. The latter is a structure reflecting the content in the document collection in that it is a restriction of a general world knowledge ontology to the concepts instantiated in the collection. The notion of ontology-based similarity is briefly described, language constructs for direct navigation and retrieval of concepts in the ontology are discussed and approaches to conceptual summarization are presented.

  7. Survey on Ontology Mapping

    Science.gov (United States)

    Zhu, Junwu

    To create a sharable semantic space for terms from different domain ontologies or knowledge systems, ontology mapping has become a hot research topic in the Semantic Web community. In this paper, the motivating factors of ontology mapping research are given first, and then five dominant theories and methods (information access technology, machine learning, linguistics, structure graphs and similarity) are illustrated according to their technology class. Before we analyse the new requirements and take a long view, the contributions of these theories and methods are summarized in detail. Finally, this paper suggests designing a group of semantic connectors with the ability of migration learning for OWL-2 extended with constraints and the ontology mapping theory of axioms, so as to provide a new methodology for ontology mapping.

  8. Annotating breast cancer microarray samples using ontologies

    Science.gov (United States)

    Liu, Hongfang; Li, Xin; Yoon, Victoria; Clarke, Robert

    2008-01-01

    As the most common cancer among women, breast cancer results from the accumulation of mutations in essential genes. Recent advances in high-throughput gene expression microarray technology have inspired researchers to use the technology to assist breast cancer diagnosis, prognosis, and treatment prediction. However, the high dimensionality of microarray experiments and public access to data from many experiments have caused inconsistencies, which initiated the development of controlled terminologies and ontologies for annotating microarray experiments, such as the standard Microarray Gene Expression Data (MGED) ontology (MO). In this paper, we developed BCM-CO, an ontology tailored specifically for indexing clinical annotations of breast cancer microarray samples from the NCI Thesaurus. Our research showed that the coverage of the NCI Thesaurus is very limited with respect to i) terms used by researchers to describe breast cancer histology (covering 22 out of 48 histology terms); ii) breast cancer cell lines (covering one out of 12 cell lines); and iii) classes corresponding to breast cancer grading and staging. By incorporating a wider range of those terms into BCM-CO, we were able to index breast cancer microarray samples from GEO using BCM-CO and the MGED ontology, and developed a prototype system with a web interface that allows the retrieval of microarray data based on the ontology annotations. PMID:18999108

  9. Nuclear Nonproliferation Ontology Assessment Team Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Strasburg, Jana D.; Hohimer, Ryan E.

    2012-01-01

    Final Report for the NA22 Simulations, Algorithm and Modeling (SAM) Ontology Assessment Team's efforts from FY09-FY11. The Ontology Assessment Team began in May 2009 and concluded in September 2011. During this two-year time frame, the Ontology Assessment Team had two objectives: (1) Assessing the utility of knowledge representation and semantic technologies for addressing nuclear nonproliferation challenges; and (2) Developing ontological support tools that would provide a framework for integrating across the Simulation, Algorithm and Modeling (SAM) program. The SAM Program was going through a large assessment and strategic planning effort during this time and as a result, the relative importance of these two objectives changed, altering the focus of the Ontology Assessment Team. In the end, the team conducted an assessment of the state of the art, created an annotated bibliography, and developed a series of ontological support tools, demonstrations and presentations. A total of more than 35 individuals from 12 different research institutions participated in the Ontology Assessment Team. These included subject matter experts in several nuclear nonproliferation-related domains as well as experts in semantic technologies. Despite the diverse backgrounds and perspectives, the Ontology Assessment Team functioned very well together, and aspects of its work could serve as a model for future inter-laboratory collaborations and working groups. While the team encountered several challenges and learned many lessons along the way, the Ontology Assessment effort was ultimately a success that led to several multi-lab research projects and opened up a new area of scientific exploration within the Office of Nuclear Nonproliferation and Verification.

  10. Quality control for terms and definitions in ontologies and taxonomies

    Directory of Open Access Journals (Sweden)

    Rüegg Alexander

    2006-04-01

    Full Text Available Abstract Background Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. Results We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. Conclusion Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators in drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.

  11. GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    Directory of Open Access Journals (Sweden)

    Weiller Georg

    2007-03-01

    Full Text Available Abstract Background To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. Results We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. Conclusion GeneBins provides resources to interpret gene expression results from microarray experiments. It is available at http://bioinfoserver.rsbs.anu.edu.au/utils/GeneBins/

  12. Practical ontologies for information professionals

    CERN Document Server

    AUTHOR|(CDS)2071712

    2016-01-01

    Practical Ontologies for Information Professionals provides an introduction to ontologies and their development, an essential tool for fighting back against information overload. The development of robust and widely used ontologies is an increasingly important tool in the fight against information overload. The publishing and sharing of explicit explanations for a wide variety of conceptualizations, in a machine readable format, has the power to both improve information retrieval and identify new knowledge. This new book provides an accessible introduction to the following:
    * What is an ontology? Defining the concept and why it is increasingly important to the information professional
    * Ontologies and the semantic web
    * Existing ontologies, such as SKOS, OWL, FOAF, schema.org, and the DBpedia Ontology
    * Adopting and building ontologies, showing how to avoid repetition of work and how to build a simple ontology with Protege
    * Interrogating semantic web ontologies
    * The future of ontologies and the role of the ...

  13. Diverse gene functions in a soil mobilome

    DEFF Research Database (Denmark)

    Luo, Wenting; Xu, Zhuofei; Riber, Leise

    2016-01-01

    Accessing bacterial mobilomes of any given environment enables the investigation of genetic traits encoded by circular genetic elements, and how their transfer drives the adaptation of microbial communities. Here we take advantage of Illumina HiSeq sequencing and report, for the first time, the soil mobilome sampled from a well-characterized field in Hygum, Denmark. Soil bacterial cells were obtained by Nycodenz extraction, total DNA was purified by removing sheared chromosomal DNA using exonuclease digestion, and the remaining circular DNA was amplified with the phi29 polymerase and finally sequenced. The soil mobilome represented a wide range of known bacterial gene functions and highlighted the enrichment of plasmids, transposable elements and phages when compared to a well-characterized soil metagenome that, on the other hand, was dominated by basic biosynthesis and metabolism functions...

  14. Ontological foundations for evolutionary economics: A Darwinian social ontology

    NARCIS (Netherlands)

    Stoelhorst, J.W.

    2008-01-01

    The purpose of this paper is to further the project of generalized Darwinism by developing a social ontology on the basis of a combined commitment to ontological continuity and ontological commonality. Three issues that are central to the development of a social ontology are addressed: (1) the

  15. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes.

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-03-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.
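    To make the linear-referencing idea concrete, here is a minimal sketch of genes stored as linear events in an event table and queried by position along a genome "route". It does not use ArcGIS, Protégé, or MySQL; the `GeneEvent` fields, the function name, and the sample records are hypothetical stand-ins for whatever schema the application actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneEvent:
    """One row of a hypothetical event table: a gene placed along a genome route."""
    route_id: str      # genome or contig acting as the linear route
    gene: str          # gene identifier (made up for this example)
    from_measure: int  # start position along the route, in base pairs
    to_measure: int    # end position along the route, in base pairs
    product: str = ""  # editable attribute, e.g. an annotated product


def events_in_window(events, route_id, start, end):
    """Return gene names on route_id whose extent overlaps [start, end], ordered by start."""
    hits = [
        e for e in events
        if e.route_id == route_id and e.from_measure <= end and e.to_measure >= start
    ]
    return [e.gene for e in sorted(hits, key=lambda e: e.from_measure)]


event_table = [
    GeneEvent("genome_1", "geneB", 1500, 2400, "hypothetical cellulase"),
    GeneEvent("genome_1", "geneA", 100, 900, "hypothetical transporter"),
    GeneEvent("genome_1", "geneC", 5000, 6200),
    GeneEvent("genome_2", "geneD", 200, 800),
]

print(events_in_window(event_table, "genome_1", 800, 2000))
```

    Keeping positions as from/to measures on a named route, rather than as fixed geometry, is what lets attributes in the event table be edited without redrawing the genome layer.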

  16. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Directory of Open Access Journals (Sweden)

    Saber Jelokhani-Niaraki

    2015-03-01

    Full Text Available During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.

  17. An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

    Science.gov (United States)

    Jelokhani-Niaraki, Saber; Minuchehr, Zarrin; Nassiri, Mohammad Reza

    2015-01-01

    During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data. PMID:25873847

  18. Biochemical mechanisms determine the functional compatibility of heterologous genes

    DEFF Research Database (Denmark)

    Porse, Andreas; Schou, Thea S.; Munck, Christian

    2018-01-01

    -gene libraries have suggested that sequence composition is a strong barrier for the successful integration of heterologous genes. Here we sample 200 diverse genes, representing >80% of sequenced antibiotic resistance genes, to interrogate the factors governing genetic compatibility in new hosts. In contrast...... factors governing the functionality and fitness of antibiotic resistance genes. These findings emphasize the importance of biochemical mechanism for heterologous gene compatibility, and suggest physiological constraints as a pivotal feature orienting the evolution of antibiotic resistance....

  19. MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions.

    Science.gov (United States)

    Blank, Carrine E; Cui, Hong; Moore, Lisa R; Walls, Ramona L

    2016-01-01

    MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature. MicrO currently has ~14550 classes (~2550 of which are new, the remainder being microbiologically-relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at (http://purl.obolibrary.org/obo/MicrO.owl) and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO's Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology. By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we

  20. Perspectives on ontology learning

    CERN Document Server

    Lehmann, J

    2014-01-01

    Perspectives on Ontology Learning brings together researchers and practitioners from different communities (natural language processing, machine learning, and the semantic web) in order to give an interdisciplinary overview of recent advances in ontology learning. Starting with a comprehensive introduction to the theoretical foundations of ontology learning methods, the edited volume presents the state of the art in automated knowledge acquisition and maintenance. It outlines future challenges in this area with a special focus on technologies suitable for pushing the boundaries beyond the c

  1. A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

    Science.gov (United States)

    Feng, Shou; Fu, Ping; Zheng, Wenbin

    2018-03-01

    Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When using local approach methods to solve this problem, a preliminary results processing method is usually needed. This paper proposed a novel preliminary results processing method called the nodes interaction method. The nodes interaction method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. This method exploits the label dependency and considers the hierarchical interaction between nodes when making decisions based on the Bayesian network in its first phase. In the second phase, this method further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also enhances the HMC performance for solving the gene function prediction problem based on the Gene Ontology (GO), the hierarchy of which is a directed acyclic graph that is more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.
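The hierarchy constraint mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' nodes interaction method (which, per the abstract, also relies on a Bayesian network); it only shows the simpler, generic idea of capping each term's preliminary score by the scores of its parents in a directed acyclic graph. The function name `enforce_hierarchy`, the term labels, and the scores are hypothetical, and every parent is assumed to have a preliminary score.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def enforce_hierarchy(scores, parents):
    """Cap each term's score by the lowest corrected score among its parents.

    scores:  dict term -> preliminary score in [0, 1] (hypothetical values)
    parents: dict term -> list of parent terms; a DAG, so a term may have
             several parents. Every parent is assumed to appear in `scores`.
    """
    # TopologicalSorter expects node -> predecessors, so parents are
    # visited before their children in static_order().
    order = TopologicalSorter({t: parents.get(t, ()) for t in scores}).static_order()
    fixed = {}
    for term in order:
        cap = min((fixed[p] for p in parents.get(term, ())), default=1.0)
        fixed[term] = min(scores[term], cap)
    return fixed


# Hypothetical example: term "C" has two parents, "A" and "B".
raw = {"root": 0.9, "A": 0.6, "B": 0.8, "C": 0.7}
dag = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
fixed = enforce_hierarchy(raw, dag)
print(fixed["C"])  # 0.6, because parent "A" scored only 0.6
```

After this correction, applying any single threshold to the scores can never predict a term without also predicting all of its ancestors, which is what consistency with the hierarchy constraint requires.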

  2. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team reports on their approaches, the tools they used, and results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  3. Ontology of fractures

    Science.gov (United States)

    Zhong, Jian; Aydina, Atilla; McGuinness, Deborah L.

    2009-03-01

    Fractures are fundamental structures in the Earth's crust and they can impact many societal and industrial activities including oil and gas exploration and production, aquifer management, CO2 sequestration, waste isolation, the stabilization of engineering structures, and assessing natural hazards (earthquakes, volcanoes, and landslides). Therefore, an ontology which organizes the concepts of fractures could help facilitate a sound education within, and communication among, the highly diverse professional and academic community interested in the problems cited above. We developed a process-based ontology that makes explicit specifications about fractures, their properties, and the deformation mechanisms which lead to their formation and evolution. Our ontology emphasizes the relationships among concepts such as the factors that influence the mechanism(s) responsible for the formation and evolution of specific fracture types. Our ontology is a valuable resource with potential for applications in a number of fields utilizing recent advances in Information Technology, specifically for digital data and information in computers, grids, and Web services.

  4. Age-Related Gene Expression in the Frontal Cortex Suggests Synaptic Function Changes in Specific Inhibitory Neuron Subtypes

    Directory of Open Access Journals (Sweden)

    Leon French

    2017-05-01

    Genome-wide expression profiling of the human brain has revealed genes that are differentially expressed across the lifespan. Characterizing these genes adds to our understanding of both normal functions and pathological conditions. Additionally, the specific cell-types that contribute to the motor, sensory and cognitive declines during aging are unclear. Here we test if age-related genes show higher expression in specific neural cell types. Our study leverages data from two sources of murine single-cell expression data and two sources of age-associations from large gene expression studies of postmortem human brain. We used nonparametric gene set analysis to test for age-related enrichment of genes associated with specific cell-types; we also restricted our analyses to specific gene ontology groups. Our analyses focused on a primary pair of single-cell expression data from the mouse visual cortex and age-related human post-mortem gene expression information from the orbitofrontal cortex. Additional pairings that used data from the hippocampus, prefrontal cortex, somatosensory cortex and blood were used to validate and test specificity of our findings. We found robust age-related up-regulation of genes that are highly expressed in oligodendrocytes and astrocytes, while genes highly expressed in layer 2/3 glutamatergic neurons were down-regulated across age. Genes not specific to any neural cell type were also down-regulated, possibly due to the bulk tissue source of the age-related genes. A gene ontology-driven dissection of the cell-type enriched genes highlighted the strong down-regulation of genes involved in synaptic transmission and cell-cell signaling in the Somatostatin (Sst) neuron subtype that expresses the cyclin dependent kinase 6 (Cdk6) and in the vasoactive intestinal peptide (Vip) neuron subtype expressing myosin binding protein C, slow type (Mybpc1). These findings provide new insights into cell specific susceptibility to normal aging

  5. A Method for Evaluating and Standardizing Ontologies

    Science.gov (United States)

    Seyed, Ali Patrice

    2012-01-01

    The Open Biomedical Ontology (OBO) Foundry initiative is a collaborative effort for developing interoperable, science-based ontologies. The Basic Formal Ontology (BFO) serves as the upper ontology for the domain-level ontologies of OBO. BFO is an upper ontology of types as conceived by defenders of realism. Among the ontologies developed for OBO…

  6. Manufacturing ontology through templates

    Directory of Open Access Journals (Sweden)

    Diciuc Vlad

    2017-01-01

    The manufacturing industry holds a large volume of high-value know-how, much of it held by key persons in the company. Passing on this know-how is the basis of manufacturing ontology. Among other methods, such as advanced filtering and algorithm-based decision making, one way of handling the manufacturing ontology is via templates. The current paper tackles this approach and highlights its advantages, concluding with some recommendations.

  7. The Electronic Notebook Ontology

    OpenAIRE

    Chalk, Stuart

    2016-01-01

    Science is rapidly being brought into the electronic realm and electronic laboratory notebooks (ELN) are a big part of this activity. The representation of the scientific process in the context of an ELN is an important component to making the data recorded in ELNs semantically integrated. This presentation will outline initial developments of an Electronic Notebook Ontology (ENO) that will help tie together the ExptML ontology, HCLS Community Profile data descriptions, and the VIVO-ISF ontol...

  8. Ontology Update in the Cognitive Model of Ontology Learning

    Directory of Open Access Journals (Sweden)

    Zhang De-Hai

    2016-01-01

    Ontology has been used in many hot-spot fields, but most ontology construction methods are semiautomatic, and constructing an ontology is still a tedious and painstaking task. In this paper, a cognitive model for ontology learning is presented that can simulate how human beings learn from the world. In this model, cognitive strategies are applied together with constrained axioms. Ontology update is a key step when new knowledge is added to the existing ontology and conflicts with old knowledge during ontology learning. This proposal designs and validates a method of ontology update based on the axiomatic cognitive model, which includes the ontology update postulates, axioms, and operations of the learning model. It is proved that these operators are subject to the established axiom system.

  9. Predicting Hydrologic Function With Aquatic Gene Fragments

    Science.gov (United States)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
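For readers unfamiliar with the Nash-Sutcliffe efficiency (NSE) values quoted in this abstract, the following is a minimal sketch of the standard definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The function name and the sample numbers are illustrative assumptions and are not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Return the Nash-Sutcliffe efficiency of simulated vs. observed values.

    1.0 is a perfect match, 0.0 is no better than always predicting the
    observed mean, and negative values are worse than the observed mean.
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("need two non-empty series of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when the observations are constant")
    return 1.0 - residual / variance


# Made-up discharge values, for illustration only.
obs = [1.0, 2.0, 3.0]
print(nash_sutcliffe(obs, [1.0, 2.0, 3.0]))  # 1.0  (perfect prediction)
print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))  # 0.0  (same skill as the mean)
print(nash_sutcliffe(obs, [3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```

Read this way, the reported baseline NSE of -0.32 for recurrence-interval discharge means the area-scaled baseline did worse than simply using the mean of the observations.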

  10. Ontology-based validation and identification of regulatory phenotypes

    KAUST Repository

    Kulmanov, Maxat

    2018-01-31

    Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support to validate the mutual consistency of functional and phenotype annotations as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations. Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. Our method can also be applied to the rule-based prediction of phenotypes from functions. We show that the predicted phenotypes can be utilized for identification of protein-protein interactions and gene-disease associations. Based on experimental functional annotations, we predict phenotypes for 1,986 genes in mouse and 7,301 genes in human for which no experimental phenotypes have yet been determined.

  11. Ontology-based validation and identification of regulatory phenotypes

    KAUST Repository

    Kulmanov, Maxat; Schofield, Paul N; Gkoutos, Georgios V; Hoehndorf, Robert

    2018-01-01

    Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support to validate the mutual consistency of functional and phenotype annotations as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations. Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. Our method can also be applied to the rule-based prediction of phenotypes from functions. We show that the predicted phenotypes can be utilized for identification of protein-protein interactions and gene-disease associations. Based on experimental functional annotations, we predict phenotypes for 1,986 genes in mouse and 7,301 genes in human for which no experimental phenotypes have yet been determined.

  12. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.

  13. Non-functional genes repaired at the RNA level.

    Science.gov (United States)

    Burger, Gertraud

    2016-01-01

    Genomes and genes continuously evolve. Gene sequences undergo substitutions, deletions or nucleotide insertions; mobile genetic elements invade genomes and interleave in genes; chromosomes break, even within genes, and pieces reseal in reshuffled order. To maintain functional gene products and assure an organism's survival, two principal strategies are used - either repair of the gene itself or of its product. I will introduce common types of gene aberrations and how gene function is restored secondarily, and then focus on systematically fragmented genes found in a poorly studied protist group, the diplonemids. Expression of their broken genes involves restitching of pieces at the RNA-level, and substantial RNA editing, to compensate for point mutations. I will conclude with thoughts on how such a grotesquely unorthodox system may have evolved, and why this group of organisms has persisted and thrived for tens of millions of years. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  14. The Cognitive Paradigm Ontology: Design and Application

    Science.gov (United States)

    Laird, Angela R.

    2013-01-01

    We present the basic structure of the Cognitive Paradigm Ontology (CogPO) for human behavioral experiments. While the experimental psychology and cognitive neuroscience literature may refer to certain behavioral tasks by name (e.g., the Stroop paradigm or the Sternberg paradigm) or by function (a working memory task, a visual attention task), these paradigms can vary tremendously in the stimuli that are presented to the subject, the response expected from the subject, and the instructions given to the subject. Drawing from the taxonomy developed and used by the BrainMap project (www.brainmap.org) for almost two decades to describe key components of published functional imaging results, we have developed an ontology capable of representing certain characteristics of the cognitive paradigms used in the fMRI and PET literature. The Cognitive Paradigm Ontology is being developed to be compliant with the Basic Formal Ontology (BFO), and to harmonize where possible with larger ontologies such as RadLex, NeuroLex, or the Ontology of Biomedical Investigations (OBI). The key components of CogPO include the representation of experimental conditions focused on the stimuli presented, the instructions given, and the responses requested. The use of alternate and even competitive terminologies can often impede scientific discoveries. Categorization of paradigms according to stimulus, response, and instruction has been shown to allow advanced data retrieval techniques by searching for similarities and contrasts across multiple paradigm levels. The goal of CogPO is to develop, evaluate, and distribute a domain ontology of cognitive paradigms for application and use in the functional neuroimaging community. PMID:21643732

  15. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
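As background to the Weighted Gene Coexpression Network Analysis method named in this abstract, the sketch below shows the first step of an unsigned weighted network: the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power beta. This is only an illustration in plain Python, not the study's pipeline; the function names, the toy expression values, and the choice beta=6 are assumptions, and module detection (the later clustering step) is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expression, beta=6):
    """Return {(gene_i, gene_j): |cor|**beta} for every unordered gene pair.

    expression: dict gene -> list of expression values across samples.
    beta:       soft-thresholding power (assumed value; it is normally
                tuned per data set).
    """
    genes = sorted(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Toy profiles: g2 tracks g1 exactly, g3 is only weakly (anti)correlated.
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
for pair, weight in soft_threshold_adjacency(toy).items():
    print(pair, round(weight, 4))
```

Raising the correlation to a power keeps strong coexpression links close to 1 while pushing weak ones toward 0, so no hard cutoff on the correlation is needed.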

  16. CRAVE: a database, middleware and visualization system for phenotype ontologies.

    Science.gov (United States)

    Gkoutos, Georgios V; Green, Eain C J; Greenaway, Simon; Blake, Andrew; Mallon, Ann-Marie; Hancock, John M

    2005-04-01

    A major challenge in modern biology is to link genome sequence information to organismal function. In many organisms this is being done by characterizing phenotypes resulting from mutations. Efficiently expressing phenotypic information requires combinatorial use of ontologies. However, tools are not currently available to visualize combinations of ontologies. Here we describe CRAVE (Concept Relation Assay Value Explorer), a package allowing storage, active updating and visualization of multiple ontologies. CRAVE is a web-accessible JAVA application that accesses an underlying MySQL database of ontologies via a JAVA persistent middleware layer (Chameleon). This maps the database tables into discrete JAVA classes and creates memory resident, interlinked objects corresponding to the ontology data. These JAVA objects are accessed via calls through the middleware's application programming interface. CRAVE allows simultaneous display and linking of multiple ontologies and searching using Boolean and advanced searches.

  17. Analysis of the robustness of network-based disease-gene prioritization methods reveals redundancy in the human interactome and functional diversity of disease-genes.

    Directory of Open Access Journals (Sweden)

    Emre Guney

    Complex biological systems usually pose a trade-off between robustness and fragility where a small number of perturbations can substantially disrupt the system. Although biological systems are robust against changes in many external and internal conditions, even a single mutation can perturb the system substantially, giving rise to a pathophenotype. Recent advances in identifying and analyzing the sequential variations beneath human disorders help to comprehend a systemic view of the mechanisms underlying various disease phenotypes. Network-based disease-gene prioritization methods rank the relevance of genes in a disease under the hypothesis that genes whose proteins interact with each other tend to exhibit similar phenotypes. In this study, we have tested the robustness of several network-based disease-gene prioritization methods with respect to the perturbations of the system using various disease phenotypes from the Online Mendelian Inheritance in Man database. These perturbations have been introduced either in the protein-protein interaction network or in the set of known disease-gene associations. As the network-based disease-gene prioritization methods are based on the connectivity between known disease-gene associations, we have further used these methods to categorize the pathophenotypes with respect to the recoverability of hidden disease-genes. Our results have suggested that, in general, disease-genes are connected through multiple paths in the human interactome. Moreover, even when these paths are disturbed, network-based prioritization can reveal hidden disease-gene associations in some pathophenotypes such as breast cancer, cardiomyopathy, diabetes, leukemia, Parkinson disease and obesity to a greater extent compared to the rest of the pathophenotypes tested in this study. Gene Ontology (GO) analysis highlighted the role of functional diversity for such diseases.

  18. “Force,” ontology, and language

    Science.gov (United States)

    Brookes, David T.; Etkina, Eugenia

    2009-06-01

    We introduce a linguistic framework through which one can interpret systematically students’ understanding of and reasoning about force and motion. Some researchers have suggested that students have robust misconceptions or alternative frameworks grounded in everyday experience. Others have pointed out the inconsistency of students’ responses and presented a phenomenological explanation for what is observed, namely, knowledge in pieces. We wish to present a view that builds on and unifies aspects of this prior research. Our argument is that many students’ difficulties with force and motion are primarily due to a combination of linguistic and ontological difficulties. It is possible that students are primarily engaged in trying to define and categorize the meaning of the term “force” as spoken about by physicists. We found that this process of negotiation of meaning is remarkably similar to that engaged in by physicists in history. In this paper we will describe a study of the historical record that reveals an analogous process of meaning negotiation, spanning multiple centuries. Using methods from cognitive linguistics and systemic functional grammar, we will present an analysis of the force and motion literature, focusing on prior studies with interview data. We will then discuss the implications of our findings for physics instruction.

  19. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert

    2017-01-01

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often

  20. Mining rare associations between biological ontologies.

    Science.gov (United States)

    Benites, Fernando; Simon, Svenja; Sapozhnikova, Elena

    2014-01-01

    The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations.
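The idea of favoring rules whose actual interestingness exceeds the value expected from the parent rule can be sketched as follows. This is a simplified stand-in, not the authors' class of measures: it uses plain rule confidence as the interestingness value and takes the parent rule's confidence as the expectation. The function names and the term labels ("GO:parent", "GO:child", "GPCR:family") are hypothetical, and each annotation set is assumed to be already expanded with ancestor terms.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    covered = [t for t in transactions if antecedent in t]
    if not covered:
        return 0.0
    return sum(consequent in t for t in covered) / len(covered)


def gain_over_parent(transactions, child, parent, consequent):
    """Ratio of the child rule's confidence to the parent rule's confidence.

    Values above 1 mean the specific rule (child -> consequent) is stronger
    than expected from the more general rule (parent -> consequent).
    """
    expected = confidence(transactions, parent, consequent)
    if expected == 0:
        return 0.0  # the parent rule never holds, so the child rule cannot either
    return confidence(transactions, child, consequent) / expected


# Hypothetical annotation sets; the child term is rare (2 of 6 records).
records = [
    {"GO:parent", "GO:child", "GPCR:family"},
    {"GO:parent", "GO:child", "GPCR:family"},
    {"GO:parent", "GPCR:family"},
    {"GO:parent"},
    {"GO:parent"},
    {"GO:parent"},
]
print(gain_over_parent(records, "GO:child", "GO:parent", "GPCR:family"))  # 2.0
```

In this toy data the child rule covers only a third of the records, yet it holds twice as often as its parent rule would suggest, which is the kind of rare but informative association the abstract describes.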

  1. Mining rare associations between biological ontologies.

    Directory of Open Access Journals (Sweden)

    Fernando Benites

    The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, for these are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that produced rules represent meaningful and quite reliable associations.

  2. DeMO: An Ontology for Discrete-event Modeling and Simulation

    Science.gov (United States)

    Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

    2011-01-01

    Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

  3. Prioritising lexical patterns to increase axiomatisation in biomedical ontologies. The role of localisation and modularity.

    Science.gov (United States)

    Quesada-Martínez, M; Fernández-Breis, J T; Stevens, R; Mikroyannidi, E

    2015-01-01

    This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". In previous work, we have defined methods for the extraction of lexical patterns from labels as an initial step towards semi-automatic ontology enrichment methods. Our previous findings revealed that many biomedical ontologies could benefit from enrichment methods using lexical patterns as a starting point. Here, we aim to identify which lexical patterns are appropriate for ontology enrichment, driving the analysis by metrics to prioritise the patterns. We propose metrics for suggesting which lexical regularities should be the starting point to enrich complex ontologies. Our method determines the relevance of a lexical pattern by measuring its locality in the ontology, that is, the distance between the classes associated with the pattern, and the distribution of the pattern in a certain module of the ontology. The methods have been applied to four significant biomedical ontologies including the Gene Ontology and SNOMED CT. The metrics provide information about the engineering of the ontologies and the relevance of the patterns. Our method enables the suggestion of links between classes that are not made explicit in the ontology. We propose a prioritisation of the lexical patterns found in the analysed ontologies. The locality and distribution of lexical patterns offer insights into the further engineering of the ontology. Developers can use this information to improve the axiomatisation of their ontologies.

  4. OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

    Directory of Open Access Journals (Sweden)

    Johnson Helen L

    2008-01-01

    Full Text Available Abstract Background Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for

  5. Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python)

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Full Text Available Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism’s genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism’s genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value.

  6. Computing an Ontological Semantics for a Natural Language Fragment

    DEFF Research Database (Denmark)

    Szymczak, Bartlomiej Antoni

    tried to establish a domain independent “ontological semantics” for relevant fragments of natural language. The purpose of this research is to develop methods and systems for taking advantage of formal ontologies for the purpose of extracting the meaning contents of texts. This functionality...

  7. Knowledge Representation in Patient Safety Reporting: An Ontological Approach

    Directory of Open Access Journals (Sweden)

    Liang Chen

    2016-10-01

    Full Text Available Purpose: The current development of patient safety reporting systems is criticized for loss of information and low data quality due to the lack of a uniformed domain knowledge base and text processing functionality. To improve patient safety reporting, the present paper suggests an ontological representation of patient safety knowledge. Design/methodology/approach: We propose a framework for constructing an ontological knowledge base of patient safety. The present paper describes our design, implementation, and evaluation of the ontology at its initial stage. Findings: We describe the design and initial outcomes of the ontology implementation. The evaluation results demonstrate the clinical validity of the ontology by a self-developed survey measurement. Research limitations: The proposed ontology was developed and evaluated using a small number of information sources. Presently, US data are used, but they are not essential for the ultimate structure of the ontology. Practical implications: The goal of improving patient safety can be aided through investigating patient safety reports and providing actionable knowledge to clinical practitioners. As such, constructing a domain specific ontology for patient safety reports serves as a cornerstone in information collection and text mining methods. Originality/value: The use of ontologies provides abstracted representation of semantic information and enables a wealth of applications in a reporting system. Therefore, constructing such a knowledge base is recognized as a high priority in health care.

  8. Ontology: ambiguity and accuracy

    Directory of Open Access Journals (Sweden)

    Marcelo Schiessl

    2012-08-01

    Full Text Available Ambiguity is a major obstacle to information retrieval and is the subject of much research in Information Science. Ontologies have been studied in order to solve problems related to ambiguity. Paradoxically, the term “ontology” is itself ambiguous and is understood according to its use by each community. Philosophy and Computer Science seem to differ most sharply in the sense they give the term. The former holds undisputed tradition and authority. The latter, despite being quite recent, holds an informal but pragmatic sense. Information Science ranges from philosophical to computational approaches in order to obtain organized collections that balance users’ needs against available information. The semantic web requires automation of the informational cycle and demands studies related to ontologies. Consequently, revisiting relevant approaches to the study of ontologies plays an important role as a way to provide useful ideas to researchers while maintaining philosophical rigor and the convenience provided by computers.

  9. Ontological engineering versus metaphysics

    Science.gov (United States)

    Tataj, Emanuel; Tomanek, Roman; Mulawka, Jan

    2011-10-01

    It has been recognized that ontologies are a semantic version of the World Wide Web and can be found in knowledge-based systems. A recent survey of this field also suggests that practical artificial intelligence systems may be motivated by this research. Strong artificial intelligence in particular, as well as the concept of homo computer, can also benefit from their use. The main objective of this contribution is to present and review already created ontologies and to identify the main advantages that such an approach offers for knowledge management systems. We present what ontological engineering borrows from metaphysics and what feedback it can provide to natural language processing, simulation and modelling. Potential topics for further development from a philosophical point of view are also underlined.

  10. Process attributes in bio-ontologies

    Directory of Open Access Journals (Sweden)

    Andrade André Q

    2012-08-01

    Full Text Available Abstract Background Biomedical processes can provide essential information about the (mal-)functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has been a contentious issue in bio-ontologies recently; and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency. Results We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity. Conclusions We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.

  11. A Method for Building Personalized Ontology Summaries

    OpenAIRE

    Queiroz-Sousa, Paulo Orlando; Salgado, Ana Carolina; Pires, Carlos Eduardo

    2013-01-01

    In the context of ontology engineering, the ontology understanding is the basis for its further development and reuse. One intuitive effective approach to support ontology understanding is the process of ontology summarization which highlights the most important concepts of an ontology. Ontology summarization identifies an excerpt from an ontology that contains the most relevant concepts and produces an abridged ontology. In this article, we present a method for summarizing ontologies that represent ...

  12. Ontology and medical diagnosis.

    Science.gov (United States)

    Bertaud-Gounot, Valérie; Duvauferrier, Régis; Burgun, Anita

    2012-03-01

    Ontology and associated generic tools are appropriate for knowledge modeling and reasoning, but most of the time, disease definitions in existing description logic (DL) ontology are not sufficient to classify patient's characteristics under a particular disease because they do not formalize operational definitions of diseases (association of signs and symptoms=diagnostic criteria). The main objective of this study is to propose an ontological representation which takes into account the diagnostic criteria on which specific patient conditions may be classified under a specific disease. This method needs as a prerequisite a clear list of necessary and sufficient diagnostic criteria as defined for lots of diseases by learned societies. It does not include probability/uncertainty which Web Ontology Language (OWL 2.0) cannot handle. We illustrate it with spondyloarthritis (SpA). Ontology has been designed in Protégé 4.1 OWL-DL2.0. Several kinds of criteria were formalized: (1) mandatory criteria, (2) picking two criteria among several diagnostic criteria, (3) numeric criteria. Thirty real patient cases were successfully classified with the reasoner. This study shows that it is possible to represent operational definitions of diseases with OWL and successfully classify real patient cases. Representing diagnostic criteria as descriptive knowledge (instead of rules in Semantic Web Rule Language or Prolog) allows us to take advantage of tools already available for OWL. While we focused on Assessment of SpondyloArthritis international Society SpA criteria, we believe that many of the representation issues addressed here are relevant to using OWL-DL for operational definition of other diseases in ontology.
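    The three kinds of criteria named above (mandatory, at least two among several, numeric) can be illustrated outside OWL with ordinary set logic. The sketch below is a plain Python analogue under stated assumptions, not the OWL-DL model of the study: the class `DiseaseDefinition`, the criterion names, and the threshold are all hypothetical placeholders and do not reproduce any published diagnostic criteria.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class DiseaseDefinition:
    """Operational definition: every field name here is an illustrative assumption."""
    mandatory: FrozenSet[str]            # findings that must all be present
    optional: FrozenSet[str]             # pool of findings to pick from
    min_optional: int                    # how many of the pool are required
    numeric_minimums: Dict[str, float]   # measurement name -> required minimum


def satisfies(defn: DiseaseDefinition, findings: Set[str],
              measurements: Dict[str, float]) -> bool:
    """Return True when the patient data meets all three kinds of criteria."""
    if not defn.mandatory <= findings:
        return False
    if len(defn.optional & findings) < defn.min_optional:
        return False
    # A missing measurement counts as not meeting its threshold.
    return all(measurements.get(name, float("-inf")) >= minimum
               for name, minimum in defn.numeric_minimums.items())


# Hypothetical definition: one mandatory sign, two of three features, one threshold.
example = DiseaseDefinition(
    mandatory=frozenset({"sign_m"}),
    optional=frozenset({"feature_a", "feature_b", "feature_c"}),
    min_optional=2,
    numeric_minimums={"duration_months": 3.0},
)

print(satisfies(example, {"sign_m", "feature_a", "feature_c"}, {"duration_months": 4.0}))
print(satisfies(example, {"sign_m", "feature_a"}, {"duration_months": 4.0}))
```

    In OWL the same conditions would be expressed as class restrictions and checked by a reasoner; the procedural version only shows what the criteria require of a patient record.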

  13. Analysis of multiplex gene expression maps obtained by voxelation.

    Science.gov (United States)

    An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios

    2009-04-29

    Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. The experimental
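    The similarity step described above (wavelet features compared by Euclidean distance) can be sketched in a few lines. This is a simplified illustration under assumptions: expression maps are taken to be flattened into even-length 1-D lists, a single level of Haar-style averaging and differencing stands in for whatever wavelet features the authors extracted, and the gene names and values are invented toy data.

```python
import math
from typing import Dict, List, Sequence


def haar_features(signal: Sequence[float]) -> List[float]:
    """One level of Haar-style averaging/differencing on an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def most_similar(query: str, maps: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Rank the other genes by Euclidean distance between feature vectors."""
    feats = {gene: haar_features(values) for gene, values in maps.items()}
    others = [gene for gene in feats if gene != query]
    others.sort(key=lambda gene: math.dist(feats[query], feats[gene]))
    return others[:k]


# Toy flattened expression maps (hypothetical values).
maps = {
    "geneA": [1.0, 1.0, 4.0, 4.0],
    "geneB": [1.0, 1.0, 4.0, 5.0],
    "geneC": [5.0, 5.0, 0.0, 0.0],
}

print(most_similar("geneA", maps, k=2))
```

    Genes returned near the top of the ranking would then be candidates for the functional comparison in the gene ontology structure that the abstract describes.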

  14. Analysis of multiplex gene expression maps obtained by voxelation

    Directory of Open Access Journals (Sweden)

    Smith Desmond J

    2009-04-01

    Full Text Available Abstract Background Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in

  15. Comparative mapping reveals similar linkage of functional genes to ...

    Indian Academy of Sciences (India)

    genes between O. sativa and B. napus may have consistent function and control similar traits, which may be ..... acea chromosomes reveals islands of conserved organization. ... 1998 Conserved structure and function of the Arabidopsis flow-.

  16. The First Organ-Based Ontology for Arthropods (Ontology of Arthropod Circulatory Systems - OArCS) and its Integration into a Novel Formalization Scheme for Morphological Descriptions.

    Science.gov (United States)

    Wirkner, Christian S; Göpel, Torben; Runge, Jens; Keiler, Jonas; Klussmann-Fricke, Bastian-Jesper; Huckstorf, Katarina; Scholz, Stephan; Mikó, István; J Yoder, Matthew; Richter, Stefan

    2017-09-01

    ontology. That is, descriptions in ontologies are only descriptions of individuals if they are necessary and/or sufficient representations of attributes (independently) observed and recorded for an individual. In addition, we here present for the first time an entirely new approach to formalizing phenotypic research, a semantic model for the description of a complex organ system in a highly disparate taxon, the arthropods. We demonstrate this with a formalized morphological description of the hemolymph vascular system in one specimen of the European garden spider Araneus diadematus. Our description targets five categories of descriptive statement: "position", "spatial relationships", "shape", "constituents", and "connections", as the corresponding formalizations constitute exemplary patterns useful not only when talking about the circulatory system, but also in descriptions in general. The downstream applications of computer-parsable morphological descriptions are widespread, with their core utility being the fact that they make it possible to compare collective description sets in computational time, that is, very quickly. Among other things, this facilitates the identification of phenotypic plasticity and variation when single individuals are compared, the identification of those traits which correlate between and within taxa, and the identification of links between morphological traits and genetic (using GO, Gene Ontology) or environmental (using ENVO, Environmental Ontology) factors. [Arthropoda; concept; function; hemolymph vascular system; homology; terminology.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  17. Core Semantics for Public Ontologies

    National Research Council Canada - National Science Library

    Suni, Niranjan

    2005-01-01

    ... (schemas or ontologies) with respect to objects. The DARPA Agent Markup Language (DAML) through the use of ontologies provides a very powerful way to describe objects and their relationships to other objects...

  18. Learning expressive ontologies

    CERN Document Server

    Völker, J

    2009-01-01

    This publication advances the state-of-the-art in ontology learning by presenting a set of novel approaches to the semi-automatic acquisition, refinement and evaluation of logically complex axiomatizations. It has been motivated by the fact that the realization of the semantic web envisioned by Tim Berners-Lee is still hampered by the lack of ontological resources, while at the same time more and more applications of semantic technologies emerge from fast-growing areas such as e-business or life sciences. Such knowledge-intensive applications, requiring large scale reasoning over complex domai

  19. ONTOLOGY IN PHARMACY

    Directory of Open Access Journals (Sweden)

    L. Yu. Babintseva

    2015-05-01

    Full Text Available Ontological models for the formalization of knowledge in pharmacy are considered. The view is emphasized that rapid exchange of information in the pharmaceutical industry requires the creation of a single information space. This means not only establishing uniform standards for presenting information on pharmaceutical groups and pharmacotherapeutic classifications, but also creating a unified and standardized system for the transfer and renewal of knowledge. Organizing information in an ontology helps to build expert systems and applications for working with data quickly in the future.

  20. Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach.

    Science.gov (United States)

    Panahiazar, Maryam; Sheth, Amit P; Ranabahu, Ajith; Vos, Rutger A; Leebens-Mack, Jim

    2013-01-01

    Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.

  1. Summarization by domain ontology navigation

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2013-01-01

    of the subject. In between these two extremes, conceptual summaries encompass selected concepts derived using background knowledge. We address in this paper an approach where conceptual summaries are provided through a conceptualization as given by an ontology. The ontology guiding the summarization can...... be a simple taxonomy or a generative domain ontology. A domain ontology can be provided by a preanalysis of a domain corpus and can be used to condense improved summaries that better reflect the conceptualization of a given domain....

  2. An ontology-based exploration of the concepts and relationships in the activities and participation component of the international classification of functioning, disability and health.

    Science.gov (United States)

    Della Mea, Vincenzo; Simoncello, Andrea

    2012-02-28

    The International Classification of Functioning, Disability and Health (ICF) is a classification of health and health-related issues, aimed at describing and measuring health and disability at both individual and population levels. Here we discuss a preliminary qualitative and quantitative analysis of the relationships used in the Activities and Participation component of ICF, and a preliminary mapping to SUMO (Suggested Upper Merged Ontology) concepts. The aim of the analysis is to identify potential logical problems within this component of ICF, and to understand whether activities and participation might be defined more formally than in the current version of ICF. In the relationship analysis, we used four predicates among those available in SUMO for processes (Patient, Instrument, Agent, and subProcess). While at the top level subsumption was used in most cases (90%), at the lower levels the percentage of other relationships rose to 41%. Chapters were heterogeneous in the relationships used and some of the leaves of the tree seemed to represent properties or parts of the parent concept rather than subclasses. Mapping of ICF to SUMO proved partially feasible, with the activity concepts being mapped mostly (but not totally) under the IntentionalProcess concept in SUMO. On the other hand, the participation concept has not been mapped to any upper level concept. Our analysis of the relationships within ICF revealed issues related to confusion between classes and their properties, incorrect classifications, and overemphasis on subsumption, confirming what has already been observed by other researchers. However, it also suggested some properties for Activities that could be included in a more formal model: number of agents involved, the instrument used to carry out the activity, the object of the activity, complexity of the task, and an enumeration of relevant subtasks.

  3. An ontology-based exploration of the concepts and relationships in the activities and participation component of the international classification of functioning, disability and health

    Directory of Open Access Journals (Sweden)

    Della Mea Vincenzo

    2012-02-01

    Full Text Available Abstract Background The International Classification of Functioning, Disability and Health (ICF) is a classification of health and health-related issues, aimed at describing and measuring health and disability at both individual and population levels. Here we discuss a preliminary qualitative and quantitative analysis of the relationships used in the Activities and Participation component of ICF, and a preliminary mapping to SUMO (Suggested Upper Merged Ontology) concepts. The aim of the analysis is to identify potential logical problems within this component of ICF, and to understand whether activities and participation might be defined more formally than in the current version of ICF. Results In the relationship analysis, we used four predicates among those available in SUMO for processes (Patient, Instrument, Agent, and subProcess). While at the top level subsumption was used in most cases (90%), at the lower levels the percentage of other relationships rose to 41%. Chapters were heterogeneous in the relationships used and some of the leaves of the tree seemed to represent properties or parts of the parent concept rather than subclasses. Mapping of ICF to SUMO proved partially feasible, with the activity concepts being mapped mostly (but not totally) under the IntentionalProcess concept in SUMO. On the other hand, the participation concept has not been mapped to any upper level concept. Conclusions Our analysis of the relationships within ICF revealed issues related to confusion between classes and their properties, incorrect classifications, and overemphasis on subsumption, confirming what has already been observed by other researchers. However, it also suggested some properties for Activities that could be included in a more formal model: number of agents involved, the instrument used to carry out the activity, the object of the activity, complexity of the task, and an enumeration of relevant subtasks.

  4. Inferring gene expression dynamics via functional regression analysis

    Directory of Open Access Journals (Sweden)

    Leng Xiaoyan

    2008-01-01

    Full Text Available Abstract Background Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene expression associated with different developmental stages to each other to study patterns of long-term developmental gene regulation. We use tools from functional data analysis to study dynamic changes by relating temporal gene expression profiles of different developmental stages to each other. Results We demonstrate that functional regression methodology can pinpoint relationships that exist between temporary gene expression profiles for different life cycle phases and incorporates dimension reduction as needed for these high-dimensional data. By applying these tools, gene expression profiles for pupa and adult phases are found to be strongly related to the profiles of the same genes obtained during the embryo phase. Moreover, one can distinguish between gene groups that exhibit relationships with positive and others with negative associations between later life and embryonal expression profiles. Specifically, we find a positive relationship in expression for muscle development related genes, and a negative relationship for strictly maternal genes for Drosophila, using temporal gene expression profiles. Conclusion Our findings point to specific reactivation patterns of gene expression during the Drosophila life cycle which differ in characteristic ways between various gene groups. Functional regression emerges as a useful tool for relating gene expression patterns from different developmental stages, and avoids the problems with large numbers of parameters and multiple testing that affect alternative approaches.

  5. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. 
Other features include dynamic visualization of

  6. Biomedical ontologies: toward scientific debate.

    Science.gov (United States)

    Maojo, V; Crespo, J; García-Remesal, M; de la Iglesia, D; Perez-Rey, D; Kulikowski, C

    2011-01-01

    Biomedical ontologies have been very successful in structuring knowledge for many different applications, receiving widespread praise for their utility and potential. Yet, the role of computational ontologies in scientific research, as opposed to knowledge management applications, has not been extensively discussed. We aim to stimulate further discussion on the advantages and challenges presented by biomedical ontologies from a scientific perspective. We review various aspects of biomedical ontologies going beyond their practical successes, and focus on some key scientific questions in two ways. First, we analyze and discuss current approaches to improve biomedical ontologies that are based largely on classical, Aristotelian ontological models of reality. Second, we raise various open questions about biomedical ontologies that require further research, analyzing in more detail those related to visual reasoning and spatial ontologies. We outline significant scientific issues that biomedical ontologies should consider, beyond current efforts of building practical consensus between them. For spatial ontologies, we suggest an approach for building "morphospatial" taxonomies, as an example that could stimulate research on fundamental open issues for biomedical ontologies. Analysis of a large number of problems with biomedical ontologies suggests that the field is very much open to alternative interpretations of current work, and in need of scientific debate and discussion that can lead to new ideas and research directions.

  7. Using a Foundational Ontology for Reengineering a Software Enterprise Ontology

    Science.gov (United States)

    Perini Barcellos, Monalessa; de Almeida Falbo, Ricardo

    Knowledge about software organizations is highly relevant to software engineers. A common vocabulary for representing useful knowledge about the software organizations involved in software projects is important for several reasons, such as supporting knowledge reuse and allowing communication and interoperability between tools. Domain ontologies can be used to define a common vocabulary for sharing and reusing knowledge about a domain. Foundational ontologies can be used for evaluating and re-designing domain ontologies, giving them real-world semantics. This paper presents an evaluation of a Software Enterprise Ontology that was reengineered using the Unified Foundation Ontology (UFO) as its basis.

  8. The design ontology

    DEFF Research Database (Denmark)

    Storga, Mario; Andreasen, Mogens Myrup; Marjanovic, Dorian

    2010-01-01

    The article presents the research of the nature, building and practical role of a Design Ontology as a potential framework for the more efficient product development (PD) data-, information- and knowledge- description, -explanation, -understanding and -reusing. In the methodology for development ...

  9. Dahlbeck and Pure Ontology

    Science.gov (United States)

    Mackenzie, Jim

    2016-01-01

    This article responds to Johan Dahlbeck's "Towards a pure ontology: Children's bodies and morality" ["Educational Philosophy and Theory," vol. 46 (1), 2014, pp. 8-23 (EJ1026561)]. His arguments from Nietzsche and Spinoza do not carry the weight he supposes, and the conclusions he draws from them about pedagogy would be…

  10. Audit Validation Using Ontologies

    Directory of Open Access Journals (Sweden)

    Ion IVAN

    2015-01-01

    Full Text Available Requirements for increasing the quality of audit processes in enterprises are defined. The paper substantiates the need to assess and manage audit processes using ontologies. Sets of rules, and ways to assess the consistency of rules and behavior within the organization, are defined. Using ontologies, qualifications that assess the organization's audit are obtained. Elaboration of audit reports is a perfect algorithm-based activity characterized by generality, determinism, reproducibility and accuracy. The auditors obtain effective levels; through ontologies, the calculated audit level is obtained. Because the audit report is a qualitative structure of information and knowledge, it is very hard for different groups of users (shareholders, managers or stakeholders) to analyze and interpret. Developing an ontology for audit report validation will be a useful instrument for both auditors and report users. In this paper we propose an instrument for the validation of audit reports that comprises a set of keywords, a set of indicators (one indicator for each keyword), qualitative levels, and an interpreter that builds a table of indicators, actual levels and calculated levels.

  11. Biomedicine: an ontological dissection.

    Science.gov (United States)

    Baronov, David

    2008-01-01

    Though ubiquitous across the medical social sciences literature, the term "biomedicine" as an analytical concept remains remarkably slippery. It is argued here that this imprecision is due in part to the fact that biomedicine is comprised of three interrelated ontological spheres, each of which frames biomedicine as a distinct subject of investigation. This suggests that, depending upon one's ontological commitment, the meaning of biomedicine will shift. From an empirical perspective, biomedicine takes on the appearance of a scientific enterprise and is defined as a derivative category of Western science more generally. From an interpretive perspective, biomedicine represents a symbolic-cultural expression whose adherence to the principles of scientific objectivity conceals an ideological agenda. From a conceptual perspective, biomedicine represents an expression of social power that reflects structures of power and privilege within capitalist society. No one perspective exists in isolation and so the image of biomedicine from any one presents an incomplete understanding. It is the mutually-conditioning interrelations between these ontological spheres that account for biomedicine's ongoing development. Thus, the ontological dissection of biomedicine that follows, with particular emphasis on the period of its formal crystallization in the latter nineteenth and early twentieth century, is intended to deepen our understanding of biomedicine as an analytical concept across the medical social sciences literature.

  12. Methods for transient assay of gene function in floral tissues

    Directory of Open Access Journals (Sweden)

    Pathirana Nilangani N

    2007-01-01

    Full Text Available Abstract Background There is considerable interest in rapid assays or screening systems for assigning gene function. However, analysis of gene function in the flowers of some species is restricted due to the difficulty of producing stably transformed transgenic plants. As a result, experimental approaches based on transient gene expression assays are frequently used. Biolistics has long been used for transient over-expression of genes of interest, but has not been exploited for gene silencing studies. Agrobacterium-infiltration has also been used, but the focus primarily has been on the transient transformation of leaf tissue. Results Two constructs, one expressing an inverted repeat of the Antirrhinum majus (Antirrhinum) chalcone synthase gene (CHS) and the other an inverted repeat of the Antirrhinum transcription factor gene Rosea1, were shown to effectively induce CHS and Rosea1 gene silencing, respectively, when introduced biolistically into petal tissue of Antirrhinum flowers developing in vitro. A high-throughput vector expressing the Antirrhinum CHS gene attached to an inverted repeat of the nos terminator was also shown to be effective. Silencing spread systemically to create large zones of petal tissue lacking pigmentation, with transmission of the silenced state spreading both laterally within the affected epidermal cell layer and into lower cell layers, including the epidermis of the other petal surface. Transient Agrobacterium-mediated transformation of petal tissue of tobacco and petunia flowers in situ or detached was also achieved, using expression of the reporter genes GUS and GFP to visualise transgene expression. Conclusion We demonstrate the feasibility of using biolistics-based transient RNAi, and transient transformation of petal tissue via Agrobacterium infiltration to study gene function in petals. We have also produced a vector for high throughput gene silencing studies, incorporating the option of using T-A cloning to

  13. Epistemology and ontology in core ontologies: FOLaw and LRI-Core, two core ontologies for law

    NARCIS (Netherlands)

    Breukers, J.A.P.J.; Hoekstra, R.J.

    2004-01-01

    After more than a decade of constructing ontologies for legal domains, we at the Leibniz Center for Law felt a real need to develop a core ontology for law that would enable us to re-use the common denominator of the various legal domains. In this paper we present two core ontologies for law. The

  14. [Towards a structuring fibrillar ontology].

    Science.gov (United States)

    Guimberteau, J-C

    2012-10-01

    Over previous decades and centuries, the difficulty encountered in the manner in which the tissue of our bodies is organised, and structured, is clearly explained by the impossibility of exploring it in detail. Since the creation of the microscope, the perception of the basic unity, which is the cell, has been essential in understanding the functioning of reproduction and of transmission, but has not been able to explain the notion of form; since the cells are not everywhere and are not distributed in an apparently balanced manner. The problems that remain are those of form and volume and also of connection. The concept of multifibrillar architecture, shaping the interfibrillar microvolumes in space, represents a solution to all these questions. The architectural structures revealed, made up of fibres, fibrils and microfibrils, from the mesoscopic to the microscopic level, provide the concept of a living form with structural rationalism that permits the association of psychochemical molecular biodynamics and quantum physics: the form can thus be described and interpreted, and a true structural ontology is elaborated from a basic functional unity, which is the microvacuole, the intra and interfibrillar volume of the fractal organisation, and the chaotic distribution. Naturally, new, less linear, less conclusive, and less specific concepts will be implied by this ontology, leading one to believe that the emergence of life takes place under submission to forces that the original form will have imposed and oriented the adaptive finality. Copyright © 2012. Published by Elsevier SAS.

  15. When natural selection gives gene function the cold shoulder.

    Science.gov (United States)

    Cutter, Asher D; Jovelin, Richard

    2015-11-01

    It is tempting to invoke organismal selection as perpetually optimizing the function of any given gene. However, natural selection can drive genic functional change without improvement of biochemical activity, even to the extinction of gene activity. Detrimental mutations can creep in owing to linkage with other selectively favored loci. Selection can promote functional degradation, irrespective of genetic drift, when adaptation occurs by loss of gene function. Even stabilizing selection on a trait can lead to divergence of the underlying molecular constituents. Selfish genetic elements can also proliferate independent of any functional benefits to the host genome. Here we review the logic and evidence for these diverse processes acting in genome evolution. This collection of distinct evolutionary phenomena - while operating through easily understandable mechanisms - all contribute to the seemingly counterintuitive notion that maintenance or improvement of a gene's biochemical function sometimes does not determine its evolutionary fate. © 2015 WILEY Periodicals, Inc.

  16. Human Intellectual Disability Genes Form Conserved Functional Modules in Drosophila

    Science.gov (United States)

    Oortveld, Merel A. W.; Keerthikumar, Shivakumar; Oti, Martin; Nijhof, Bonnie; Fernandes, Ana Clara; Kochinke, Korinna; Castells-Nobau, Anna; van Engelen, Eva; Ellenkamp, Thijs; Eshuis, Lilian; Galy, Anne; van Bokhoven, Hans; Habermann, Bianca; Brunner, Han G.; Zweier, Christiane; Verstreken, Patrik; Huynen, Martijn A.; Schenck, Annette

    2013-01-01

    Intellectual Disability (ID) disorders, defined by an IQ below 70, are genetically and phenotypically highly heterogeneous. Identification of common molecular pathways underlying these disorders is crucial for understanding the molecular basis of cognition and for the development of therapeutic intervention strategies. To systematically establish their functional connectivity, we used transgenic RNAi to target 270 ID gene orthologs in the Drosophila eye. Assessment of neuronal function in behavioral and electrophysiological assays and multiparametric morphological analysis identified phenotypes associated with knockdown of 180 ID gene orthologs. Most of these genotype-phenotype associations were novel. For example, we uncovered 16 genes that are required for basal neurotransmission and have not previously been implicated in this process in any system or organism. ID gene orthologs with morphological eye phenotypes, in contrast to genes without phenotypes, are relatively highly expressed in the human nervous system and are enriched for neuronal functions, suggesting that eye phenotyping can distinguish different classes of ID genes. Indeed, grouping genes by Drosophila phenotype uncovered 26 connected functional modules. Novel links between ID genes successfully predicted that MYCN, PIGV and UPF3B regulate synapse development. Drosophila phenotype groups show, in addition to ID, significant phenotypic similarity also in humans, indicating that functional modules are conserved. The combined data indicate that ID disorders, despite their extreme genetic diversity, are caused by disruption of a limited number of highly connected functional modules. PMID:24204314

  17. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  18. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
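
    The "gene-recommender" idea described in this record can be illustrated with a minimal guilt-by-association sketch. This is not GeneMANIA's algorithm: the fixed network weights, the gene names (G1 to G4), and the helper functions `combine_networks` and `recommend` are all hypothetical and chosen only for illustration. The sketch sums several weighted interaction networks and ranks non-query genes by their total connection strength to a query list.

```python
from collections import defaultdict


def combine_networks(networks, weights):
    """Weighted sum of edge strengths across several undirected networks."""
    combined = defaultdict(float)
    for net, w in zip(networks, weights):
        for (a, b), strength in net.items():
            combined[frozenset((a, b))] += w * strength
    return combined


def recommend(combined, query, top_n=3):
    """Rank genes outside the query by total edge weight to query genes."""
    query = set(query)
    scores = defaultdict(float)
    for pair, strength in combined.items():
        if len(pair) != 2:
            continue  # ignore self-loops
        a, b = tuple(pair)
        if a in query and b not in query:
            scores[b] += strength
        elif b in query and a not in query:
            scores[a] += strength
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy networks (edge -> interaction strength).
coexpression = {("G1", "G2"): 0.9, ("G2", "G3"): 0.4, ("G3", "G4"): 0.8}
physical = {("G1", "G3"): 1.0, ("G2", "G4"): 0.5}

combined = combine_networks([coexpression, physical], [0.5, 0.5])
ranking = recommend(combined, ["G1", "G2"])
print(ranking)
```

    In this toy setup G3 ranks above G4 because it is linked to both query genes. Systems of the kind the abstract reviews typically choose the network weights from the data rather than fixing them by hand as is done here.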

  19. Development of an Ontology for Periodontitis.

    Science.gov (United States)

    Suzuki, Asami; Takai-Igarashi, Takako; Nakaya, Jun; Tanaka, Hiroshi

    2015-01-01

    In the clinical dentists and periodontal researchers' community, there is an obvious demand for a systems model capable of linking the clinical presentation of periodontitis to underlying molecular knowledge. A computer-readable representation of processes on disease development will give periodontal researchers opportunities to elucidate pathways and mechanisms of periodontitis. An ontology for periodontitis can be a model for integration of large variety of factors relating to a complex disease such as chronic inflammation in different organs accompanied by bone remodeling and immune system disorders, which has recently been referred to as osteoimmunology. Terms characteristic of descriptions related to the onset and progression of periodontitis were manually extracted from 194 review articles and PubMed abstracts by experts in periodontology. We specified all the relations between the extracted terms and constructed them into an ontology for periodontitis. We also investigated matching between classes of our ontology and that of Gene Ontology Biological Process. We developed an ontology for periodontitis called Periodontitis-Ontology (PeriO). The pathological progression of periodontitis is caused by complex, multi-factor interrelationships. PeriO consists of all the required concepts to represent the pathological progression and clinical treatment of periodontitis. The pathological processes were formalized with reference to Basic Formal Ontology and Relation Ontology, which accounts for participants in the processes realized by biological objects such as molecules and cells. We investigated the peculiarity of biological processes observed in pathological progression and medical treatments for the disease in comparison with Gene Ontology Biological Process (GO-BP) annotations. 
The results indicated that peculiarities of PeriO existed in 1) granularity and context dependency of both the conceptualizations, and 2) causality intrinsic to the pathological processes

  20. Benchmarking ontologies: bigger or better?

    Directory of Open Access Journals (Sweden)

    Lixia Yao

    2011-01-01

    Full Text Available A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
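
    As a rough illustration of what a corpus-based "breadth" measure and an ontology "intersection" could look like, the sketch below computes the share of corpus tokens that match an ontology term and the Jaccard overlap between two term sets. These are simplified stand-ins, not the metrics defined in the paper; the functions `breadth` and `overlap`, the toy corpus, and both term sets are assumptions made for this example.

```python
from collections import Counter


def breadth(ontology_terms, corpus_tokens):
    """Fraction of corpus token occurrences that match an ontology term."""
    counts = Counter(tok.lower() for tok in corpus_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    terms = {t.lower() for t in ontology_terms}
    covered = sum(n for tok, n in counts.items() if tok in terms)
    return covered / total


def overlap(terms_a, terms_b):
    """Jaccard overlap between two ontologies' (case-folded) term sets."""
    a = {t.lower() for t in terms_a}
    b = {t.lower() for t in terms_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical single-word terms and a toy corpus.
corpus = "fever cough fever rash nausea".split()
ontology_a = {"fever", "rash"}
ontology_b = {"Fever", "cough", "headache"}

breadth_a = breadth(ontology_a, corpus)
breadth_b = breadth(ontology_b, corpus)
shared = overlap(ontology_a, ontology_b)
print(breadth_a, breadth_b, shared)
```

    Here both toy ontologies cover the same share of the corpus while sharing only one term, which mirrors in miniature the abstract's observation that ontologies for the same domain can have a small intersection. Real terms are often multi-word phrases, so a realistic version would need phrase matching rather than single-token lookup.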

  1. Ontology-based Information Retrieval

    DEFF Research Database (Denmark)

    Styltsvig, Henrik Bulskov

    In this thesis, we will present methods for introducing ontologies in information retrieval. The main hypothesis is that the inclusion of conceptual knowledge such as ontologies in the information retrieval process can contribute to the solution of major problems currently found in information...... retrieval. This utilization of ontologies has a number of challenges. Our focus is on the use of similarity measures derived from the knowledge about relations between concepts in ontologies, the recognition of semantic information in texts and the mapping of this knowledge into the ontologies in use......, as well as how to fuse together the ideas of ontological similarity and ontological indexing into a realistic information retrieval scenario. To achieve the recognition of semantic knowledge in a text, shallow natural language processing is used during indexing that reveals knowledge to the level of noun...

  2. Investigating Gene Function in Cereal Rust Fungi by Plant-Mediated Virus-Induced Gene Silencing.

    Science.gov (United States)

    Panwar, Vinay; Bakkeren, Guus

    2017-01-01

    Cereal rust fungi are destructive pathogens, threatening grain production worldwide. Targeted breeding for resistance utilizing host resistance genes has been effective. However, breakdown of resistance occurs frequently and continued efforts are needed to understand how these fungi overcome resistance and to expand the range of available resistance genes. Whole genome sequencing, transcriptomic and proteomic studies followed by genome-wide computational and comparative analyses have identified a large repertoire of genes in rust fungi, among which are candidates predicted to code for pathogenicity and virulence factors. Some of these genes represent defence-triggering avirulence effectors. However, the functions of most genes still need to be assessed to understand the biology of these obligate biotrophic pathogens. Since genetic manipulations such as gene deletion and genetic transformation are not yet feasible in rust fungi, performing functional gene studies is challenging. Recently, host-induced gene silencing (HIGS) has emerged as a useful tool to characterize gene function in rust fungi while they infect and grow in host plants. We utilized Barley stripe mosaic virus-mediated virus-induced gene silencing (BSMV-VIGS) to induce HIGS of candidate rust fungal genes in the wheat host to determine their role in plant-fungal interactions. Here, we describe the methods for using BSMV-VIGS in wheat for functional genomics study in cereal rust fungi.

  3. Ontology Design Patterns for Combining Pathology and Anatomy: Application to Study Aging and Longevity in Inbred Mouse Strains

    KAUST Repository

    Alghamdi, Sarah M.

    2018-05-13

    In biomedical research, ontologies are widely used to represent knowledge as well as to annotate datasets. Many of the existing ontologies cover a single type of phenomena, such as a process, cell type, gene, pathological entity or anatomical structure. Consequently, there is a requirement to use multiple ontologies to fully characterize the observations in the datasets. Although this allows precise annotation of different aspects of a given dataset, it limits our ability to use the ontologies in data analysis, as the ontologies are usually disconnected and their combinations cannot be exploited. Motivated by this, here we present novel ontology design methods for combining pathology and anatomy concepts. To this end, we use a dataset of mouse models which has been characterized through two ontologies: one of them is the mouse pathology ontology (MPATH) covering pathological lesions while the other is the mouse anatomy ontology (MA) covering the anatomical site of the lesions. We propose four novel ontology design patterns for combining these ontologies, and use these patterns to generate four ontologies in a data-driven way. To evaluate the generated ontologies, we utilize these in ontology-based data analysis, including ontology enrichment analysis and computation of semantic similarity. We demonstrate that there are significant differences between the four ontologies in different analysis approaches. In addition, when using semantic similarity to confirm the hypothesis that genetically identical mice should develop more similar diseases, the generated combined ontologies lead to significantly better analysis results compared to using each ontology individually. Our results reveal that using ontology design patterns to combine different facets characterizing a dataset can improve established analysis methods.
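
    The semantic similarity computation mentioned in this record can be sketched in a few lines. The example below uses one simple measure, the Jaccard index over ancestor sets in an is-a hierarchy; the abstract does not state which measure the author used, and the term names and the `parents` mapping are hypothetical rather than taken from MPATH or MA. A combined term such as "liver_tumour" is given both a pathology parent and an anatomy-specific parent, loosely in the spirit of combining the two facets.

```python
def ancestors(term, parents):
    """Return the term together with everything reachable via is-a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard_similarity(a, b, parents):
    """Shared ancestors divided by all ancestors of the two terms."""
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical toy hierarchy: child -> list of is-a parents.
parents = {
    "liver_tumour": ["liver_lesion", "tumour"],
    "liver_cyst": ["liver_lesion"],
    "lung_tumour": ["tumour"],
    "liver_lesion": ["lesion"],
    "tumour": ["lesion"],
}

same_site = jaccard_similarity("liver_tumour", "liver_cyst", parents)
different_site = jaccard_similarity("lung_tumour", "liver_cyst", parents)
print(same_site, different_site)
```

    In this toy hierarchy, two lesions at the same anatomical site score higher than lesions at different sites, which is the kind of signal a combined pathology-anatomy ontology is meant to expose.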

  4. Gene-environment interactions involving functional variants

    DEFF Research Database (Denmark)

    Barrdahl, Myrto; Rudolph, Anja; Hopper, John L

    2017-01-01

    .36, 95% CI: 1.16-1.59, pint  = 1.9 × 10(-5) ) in relation to ER- disease risk. The remaining two gene-environment interactions were also identified in relation to ER- breast cancer risk and were found between 3p21-rs6796502 and age at menarche (ORint  = 1.26, 95% CI: 1.12-1.43, pint =1.8 × 10...... epidemiological breast cancer risk factors in relation to breast cancer. Analyses were conducted on up to 58,573 subjects (26,968 cases and 31,605 controls) from the Breast Cancer Association Consortium, in one of the largest studies of its kind. Analyses were carried out separately for estrogen receptor (ER......) positive (ER+) and ER negative (ER-) disease. The Bayesian False Discovery Probability (BFDP) was computed to assess the noteworthiness of the results. Four potential gene-environment interactions were identified as noteworthy (BFDP 

  5. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies.

    Science.gov (United States)

    Walls, Ramona L; Deck, John; Guralnick, Robert; Baskauf, Steve; Beaman, Reed; Blum, Stanley; Bowers, Shawn; Buttigieg, Pier Luigi; Davies, Neil; Endresen, Dag; Gandolfo, Maria Alejandra; Hanner, Robert; Janning, Alyssa; Krishtalka, Leonard; Matsunaga, Andréa; Midford, Peter; Morrison, Norman; Ó Tuama, Éamonn; Schildhauer, Mark; Smith, Barry; Stucky, Brian J; Thomer, Andrea; Wieczorek, John; Whitacre, Jamie; Wooley, John

    2014-01-01

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.

  6. Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

    Science.gov (United States)

    Baskauf, Steve; Blum, Stanley; Bowers, Shawn; Davies, Neil; Endresen, Dag; Gandolfo, Maria Alejandra; Hanner, Robert; Janning, Alyssa; Krishtalka, Leonard; Matsunaga, Andréa; Midford, Peter; Tuama, Éamonn Ó.; Schildhauer, Mark; Smith, Barry; Stucky, Brian J.; Thomer, Andrea; Wieczorek, John; Whitacre, Jamie; Wooley, John

    2014-01-01

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers

  7. Owlready: Ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies.

    Science.gov (United States)

    Lamy, Jean-Baptiste

    2017-07-01

    Ontologies are widely used in the biomedical domain. While many tools exist for the edition, alignment or evaluation of ontologies, few solutions have been proposed for ontology programming interface, i.e. for accessing and modifying an ontology within a programming language. Existing query languages (such as SPARQL) and APIs (such as OWLAPI) are not as easy-to-use as object programming languages are. Moreover, they provide few solutions to difficulties encountered with biomedical ontologies. Our objective was to design a tool for accessing easily the entities of an OWL ontology, with high-level constructs helping with biomedical ontologies. From our experience on medical ontologies, we identified two difficulties: (1) many entities are represented by classes (rather than individuals), but the existing tools do not permit manipulating classes as easily as individuals, (2) ontologies rely on the open-world assumption, whereas the medical reasoning must consider only evidence-based medical knowledge as true. We designed a Python module for ontology-oriented programming. It allows access to the entities of an OWL ontology as if they were objects in the programming language. We propose a simple high-level syntax for managing classes and the associated "role-filler" constraints. We also propose an algorithm for performing local closed world reasoning in simple situations. We developed Owlready, a Python module for a high-level access to OWL ontologies. The paper describes the architecture and the syntax of the module version 2. It details how we integrated the OWL ontology model with the Python object model. The paper provides examples based on Gene Ontology (GO). We also demonstrate the interest of Owlready in a use case focused on the automatic comparison of the contraindications of several drugs. This use case illustrates the use of the specific syntax proposed for manipulating classes and for performing local closed world reasoning. Owlready has been successfully
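    The contrast the abstract draws between the open-world assumption and local closed-world reasoning can be illustrated with a small sketch. This is plain Python, not Owlready's actual syntax; the drug names, conditions, and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy knowledge base (not from the paper):
# drug -> conditions explicitly asserted as contraindications.
ASSERTED = {
    "drug_a": {"asthma", "pregnancy"},
    "drug_b": {"asthma"},
}


def is_contraindicated(drug, condition, closed_world=False):
    """Return True, False, or None (unknown) for a drug/condition pair."""
    if condition in ASSERTED.get(drug, set()):
        return True
    # Open world: a missing assertion means "unknown".
    # Local closure: only asserted knowledge counts as true, so answer False.
    return False if closed_world else None


def extra_contraindications(drug_x, drug_y):
    """Conditions asserted for drug_x but not for drug_y (local closure assumed)."""
    return sorted(ASSERTED.get(drug_x, set()) - ASSERTED.get(drug_y, set()))
```

    Comparing two drugs only becomes meaningful once the knowledge about each drug is treated as complete, which is the role local closure plays in this sketch.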

  8. Functionally enigmatic genes: a case study of the brain ignorome.

    Directory of Open Access Journals (Sweden)

    Ashutosh K Pandey

    What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed--the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum--a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases--ELMOD1, TMEM88B, and DZANK1--we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes.

  9. Functionally enigmatic genes: a case study of the brain ignorome.

    Science.gov (United States)

    Pandey, Ashutosh K; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W

    2014-01-01

    What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed--the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum--a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases--ELMOD1, TMEM88B, and DZANK1--we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes.

  10. Comprehensive analysis of coding-lncRNA gene co-expression network uncovers conserved functional lncRNAs in zebrafish.

    Science.gov (United States)

    Chen, Wen; Zhang, Xuan; Li, Jing; Huang, Shulan; Xiang, Shuanglin; Hu, Xiang; Liu, Changning

    2018-05-09

    Zebrafish is a fully developed model system for studying developmental processes and human disease. Recent deep-sequencing studies have discovered a large number of long non-coding RNAs (lncRNAs) in zebrafish. However, only a few of them have been functionally characterized. Therefore, how to take advantage of the mature zebrafish system to investigate lncRNA function and conservation in depth is an intriguing question. We systematically collected and analyzed a series of zebrafish RNA-seq data, then combined them with resources from known databases and the literature. As a result, we obtained by far the most complete dataset of zebrafish lncRNAs, containing 13,604 lncRNA genes (21,128 transcripts) in total. Based on that, a co-expression network of zebrafish coding and lncRNA genes was constructed and analyzed, and used to predict the Gene Ontology (GO) and KEGG annotations of lncRNAs. Meanwhile, we performed a conservation analysis of zebrafish lncRNAs, identifying 1828 conserved zebrafish lncRNA genes (1890 transcripts) that have putative mammalian orthologs. We also found that zebrafish lncRNAs play important roles in regulating the development and function of the nervous system; these conserved lncRNAs show significant sequence and functional conservation with their mammalian counterparts. By integrative data analysis and construction of a coding-lncRNA gene co-expression network, we obtained the most comprehensive dataset of zebrafish lncRNAs to date, as well as their systematic annotations and comprehensive analyses of function and conservation. Our study provides a reliable zebrafish-based platform for exploring lncRNA function and mechanism in depth, as well as the lncRNA commonality between zebrafish and human.
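    Predicting lncRNA annotations from a co-expression network is often done by "guilt by association": an unannotated gene inherits the annotations of the coding genes it is strongly co-expressed with. The sketch below assumes Pearson correlation and a fixed threshold; the abstract does not state which measure or cutoff the authors used, and the function names, profiles, and term labels here are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def predict_terms(lnc_profile, coding_profiles, annotations, threshold=0.9):
    """Count annotation votes from coding genes co-expressed with the lncRNA.

    threshold is an assumed cutoff, not a value taken from the paper.
    """
    votes = Counter()
    for gene, profile in coding_profiles.items():
        if pearson(lnc_profile, profile) >= threshold:
            votes.update(annotations.get(gene, ()))
    return votes.most_common()


# Hypothetical toy data: four samples per gene.
coding = {"g1": [2, 4, 6, 8], "g2": [4, 3, 2, 1], "g3": [1, 2, 3, 5]}
terms = {
    "g1": {"nervous system development"},
    "g2": {"DNA repair"},
    "g3": {"nervous system development", "neuron differentiation"},
}
ranked = predict_terms([1, 2, 3, 4], coding, terms)
```

    In practice a significance test over the vote counts (for example an enrichment test) would replace the raw ranking used here.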

  11. Regulatory network analysis of Epstein-Barr virus identifies functional modules and hub genes involved in infectious mononucleosis.

    Science.gov (United States)

    Poorebrahim, Mansour; Salarian, Ali; Najafi, Saeideh; Abazari, Mohammad Foad; Aleagha, Maryam Nouri; Dadras, Mohammad Nasr; Jazayeri, Seyed Mohammad; Ataei, Atousa; Poortahmasebi, Vahdat

    2017-05-01

    Epstein-Barr virus (EBV) is the most common cause of infectious mononucleosis (IM) and establishes lifetime infection associated with a variety of cancers and autoimmune diseases. The aim of this study was to develop an integrative gene regulatory network (GRN) approach and overlying gene expression data to identify the representative subnetworks for IM and EBV latent infection (LI). After identifying differentially expressed genes (DEGs) in both IM and LI gene expression profiles, functional annotations were applied using gene ontology (GO) and BiNGO tools, and construction of GRNs, topological analysis and identification of modules were carried out using several plugins of Cytoscape. In parallel, a human-EBV GRN was generated using the Hu-Vir database for further analyses. Our analysis revealed that the majority of DEGs in both IM and LI were involved in cell-cycle and DNA repair processes. However, these genes showed a significant negative correlation in the IM and LI states. Furthermore, cyclin-dependent kinase 2 (CDK2) - a hub gene with the highest centrality score - appeared to be the key player in cell cycle regulation in IM disease. The most significant functional modules in the IM and LI states were involved in the regulation of the cell cycle and apoptosis, respectively. Human-EBV network analysis revealed several direct targets of EBV proteins during IM disease. Our study provides an important first report on the response to IM/LI EBV infection in humans. An important aspect of our data was the upregulation of genes associated with cell cycle progression and proliferation.
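    The idea of a hub gene "with the highest centrality score" can be sketched with degree centrality, the simplest such measure. The abstract does not say which centrality the authors computed in Cytoscape, so degree is an assumption here, and the edge list and gene labels are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {gene: len(nb) / (n - 1) for gene, nb in neighbours.items()}


def hub(edges):
    """Return the node with the highest degree centrality, or None."""
    scores = degree_centrality(edges)
    return max(scores, key=scores.get) if scores else None


# Hypothetical undirected regulatory edges, for illustration only.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
scores = degree_centrality(toy_edges)
```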

  12. Gene Overexpression Resources in Cereals for Functional Genomics and Discovery of Useful Genes

    Directory of Open Access Journals (Sweden)

    Kiyomi Abe

    2016-09-01

    Identification and elucidation of functions of plant genes is valuable for both basic and applied research. In addition to natural variation in model plants, numerous loss-of-function resources have been produced by mutagenesis with chemicals, irradiation, or insertions of transposable elements or T-DNA. However, we may be unable to observe loss-of-function phenotypes for genes with functionally redundant homologs, and for those essential for growth and development. To offset such disadvantages, gain-of-function transgenic resources have been exploited. Activation-tagged lines have been generated using obligatory overexpression of endogenous genes by random insertion of an enhancer. Recent progress in DNA sequencing technology and bioinformatics has enabled the preparation of genomewide collections of full-length cDNAs (fl-cDNAs) in some model species. Using the fl-cDNA clones, a novel gain-of-function strategy, the Fl-cDNA OvereXpressor gene (FOX)-hunting system, has been developed. A mutant phenotype in a FOX line can be directly attributed to the overexpressed fl-cDNA. Investigating a large population of FOX lines could reveal important genes conferring favorable phenotypes for crop breeding. Alternatively, a unique loss-of-function approach, Chimeric REpressor gene Silencing Technology (CRES-T), has been developed. In CRES-T, overexpression of a chimeric repressor, composed of the coding sequence of a transcription factor (TF) and a short peptide designated as the repression domain, could interfere with the action of endogenous TF in plants. Although plant TFs usually consist of gene families, CRES-T is effective, in principle, even for the TFs with functional redundancy. In this review, we focus on the current status of the gene-overexpression strategies and resources for identifying and elucidating novel functions of cereal genes. We discuss the potential of these research tools for identifying useful genes and phenotypes for application in crop

  13. GoGene: gene annotation in the fast lane.

    Science.gov (United States)

    Plake, Conrad; Royer, Loic; Winnenburg, Rainer; Hakenberg, Jörg; Schroeder, Michael

    2009-07-01

    High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched GeneOntology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the GeneOntology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining extracting co-occurrences of genes and ontology terms from literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms extracted from more than 18,000,000 PubMed entries. It does not cover only process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.
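    The co-occurrence extraction the abstract describes can be sketched as counting, per abstract, every gene and term that appear together. This is a naive illustration using case-insensitive substring matching; GoGene's actual text-mining pipeline is not described here, and the function name, gene names, and sample sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrences(abstracts, genes, terms):
    """Count (gene, term) pairs mentioned together in the same abstract."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_genes = {g for g in genes if g.lower() in lowered}
        found_terms = {t for t in terms if t.lower() in lowered}
        # Each abstract contributes at most one count per pair.
        counts.update(product(found_genes, found_terms))
    return counts


# Hypothetical abstracts, for illustration only.
docs = [
    "geneA is involved in DNA repair.",
    "geneA and geneB regulate apoptosis.",
    "DNA repair requires geneA.",
]
pairs = cooccurrences(docs, ["geneA", "geneB"], ["DNA repair", "apoptosis"])
```

    Keeping the source abstract alongside each counted pair would allow every association to be traced back to the literature, which is the property the abstract emphasizes.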

  14. Completeness, supervenience and ontology

    International Nuclear Information System (INIS)

    Maudlin, Tim W E

    2007-01-01

    In 1935, Einstein, Podolsky and Rosen raised the issue of the completeness of the quantum description of a physical system. What they had in mind is whether or not the quantum description is informationally complete, in that all physical features of a system can be recovered from it. In a collapse theory such as the theory of Ghirardi, Rimini and Weber, the quantum wavefunction is informationally complete, and this has often been taken to suggest that according to that theory the wavefunction is all there is. If we distinguish the ontological completeness of a description from its informational completeness, we can see that the best interpretations of the GRW theory must postulate more physical ontology than just the wavefunction.

  15. Completeness, supervenience and ontology

    Energy Technology Data Exchange (ETDEWEB)

    Maudlin, Tim W E [Department of Philosophy, Rutgers University, 26 Nichol Avenue, New Brunswick, NJ 08901-1411 (United States)

    2007-03-23

    In 1935, Einstein, Podolsky and Rosen raised the issue of the completeness of the quantum description of a physical system. What they had in mind is whether or not the quantum description is informationally complete, in that all physical features of a system can be recovered from it. In a collapse theory such as the theory of Ghirardi, Rimini and Weber, the quantum wavefunction is informationally complete, and this has often been taken to suggest that according to that theory the wavefunction is all there is. If we distinguish the ontological completeness of a description from its informational completeness, we can see that the best interpretations of the GRW theory must postulate more physical ontology than just the wavefunction.

  16. Functional requirements driving the gene duplication in 12 Drosophila species.

    Science.gov (United States)

    Zhong, Yan; Jia, Yanxiao; Gao, Yang; Tian, Dacheng; Yang, Sihai; Zhang, Xiaohui

    2013-08-15

    Gene duplication supplies the raw materials for novel gene functions, and many gene families that arose from duplication experience adaptive evolution. Most studies of young duplicates have focused on mammals, especially humans, whereas reports describing their genome-wide evolutionary patterns across the closely related Drosophila species are rare. The 12 sequenced Drosophila genomes provide the opportunity to address this issue. In our study, 3,647 young duplicate gene families were identified across the 12 Drosophila species, and three types of expansions (species-specific, lineage-specific and complex) were detected in these gene families. Our data showed that the species-specific young duplicate genes predominated (86.6%) over the other two types. Interestingly, independent species-specific expansions in the same gene family have been observed in many species, in some cases in 11 or 12 Drosophila species. Our data also showed that the functional bias observed in these young duplicate genes was mainly related to responses to environmental stimuli and biotic stresses. This study reveals the evolutionary patterns of young duplicates across 12 Drosophila species on a genomic scale. Our results suggest that convergent evolution acts on young duplicate genes after species differentiation and that adaptive evolution may play an important role in duplicate genes for adaptation to ecological factors and environmental changes in Drosophila.

  17. An intronic microRNA silences genes that are functionally antagonistic to its host gene.

    Science.gov (United States)

    Barik, Sailen

    2008-09-01

    MicroRNAs (miRNAs) are short noncoding RNAs that down-regulate gene expression by silencing specific target mRNAs. While many miRNAs are transcribed from their own genes, nearly half map within introns of 'host' genes, the significance of which remains unclear. We report that transcriptional activation of apoptosis-associated tyrosine kinase (AATK), essential for neuronal differentiation, also generates miR-338 from an AATK gene intron that silences a family of mRNAs whose protein products are negative regulators of neuronal differentiation. We conclude that an intronic miRNA, transcribed together with the host gene mRNA, may serve the interest of its host gene by silencing a cohort of genes that are functionally antagonistic to the host gene itself.

  18. LOGISTICS OPTIMIZATION USING ONTOLOGIES

    OpenAIRE

    Hendi , Hayder; Ahmad , Adeel; Bouneffa , Mourad; Fonlupt , Cyril

    2014-01-01

    Logistics processes involve complex physical flows and integration of different elements. It is widely observed that uncontrolled processes can degrade the state of logistics. The optimization of logistic processes can support the desired growth and consistent continuity of logistics. In this paper, we present a software framework for logistic processes optimization. It primarily defines logistic ontologies and then optimizes them. It intends to assist the design of...

  19. Construction of functional linkage gene networks by data integration.

    Science.gov (United States)

    Linghu, Bolan; Franzosa, Eric A; Xia, Yu

    2013-01-01

    Networks of functional associations between genes have recently been successfully used for gene function and disease-related research. A typical approach for constructing such functional linkage gene networks (FLNs) is based on the integration of diverse high-throughput functional genomics datasets. Data integration is a nontrivial task due to the heterogeneous nature of the different data sources and their variable accuracy and completeness. The presence of correlations between data sources also adds another layer of complexity to the integration process. In this chapter we discuss an approach for constructing a human FLN from data integration and a subsequent application of the FLN to novel disease gene discovery. Similar approaches can be applied to nonhuman species and other discovery tasks.
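    One common way to integrate heterogeneous evidence into a single linkage score is naive Bayes combination of per-dataset likelihood ratios. The sketch below illustrates that idea only; the abstract does not specify the authors' integration method, and naive Bayes assumes independent data sources, which is exactly the correlation problem the abstract warns about. All names and numbers are hypothetical.

```python
from math import exp, log


def combined_log_odds(prior_odds, likelihood_ratios):
    """Log posterior odds of a functional link, assuming independent sources."""
    return log(prior_odds) + sum(log(lr) for lr in likelihood_ratios)


def posterior_probability(prior_odds, likelihood_ratios):
    """Convert the combined odds into a probability in [0, 1]."""
    odds = exp(combined_log_odds(prior_odds, likelihood_ratios))
    return odds / (1 + odds)


# Hypothetical example: prior odds of 1:100 and two datasets, each of which
# makes a functional link ten times more likely.
p_link = posterior_probability(0.01, [10.0, 10.0])
```

    When sources are correlated, multiplying their likelihood ratios double-counts evidence, so a real integration scheme would need to down-weight or jointly model such sources.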

  20. The ALMT Gene Family Performs Multiple Functions in Plants

    Directory of Open Access Journals (Sweden)

    Jie Liu

    2018-02-01

    The aluminium activated malate transporter (ALMT) gene family is named after the first member of the family identified in wheat (Triticum aestivum L.). The product of this gene controls resistance to aluminium (Al) toxicity. ALMT genes encode transmembrane proteins that function as anion channels and perform multiple functions involving the transport of organic anions (e.g., carboxylates) and inorganic anions in cells. They share a PF11744 domain and are classified in the Fusaric acid resistance protein-like superfamily, CL0307. The proteins typically have five to seven transmembrane regions in the N-terminal half and a long hydrophilic C-terminal tail but predictions of secondary structure vary. Although widely spread in plants, relatively little information is available on the roles performed by other members of this family. In this review, we summarized functions of ALMT gene families, including Al resistance, stomatal function, mineral nutrition, microbe interactions, fruit acidity, light response and seed development.

  1. A unified anatomy ontology of the vertebrate skeletal system.

    Directory of Open Access Journals (Sweden)

    Wasila M Dahdul

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  2. A unified anatomy ontology of the vertebrate skeletal system.

    Science.gov (United States)

    Dahdul, Wasila M; Balhoff, James P; Blackburn, David C; Diehl, Alexander D; Haendel, Melissa A; Hall, Brian K; Lapp, Hilmar; Lundberg, John G; Mungall, Christopher J; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E; Vickaryous, Matthew K; Westerfield, Monte; Mabee, Paula M

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  3. A Unified Anatomy Ontology of the Vertebrate Skeletal System

    Science.gov (United States)

    Dahdul, Wasila M.; Balhoff, James P.; Blackburn, David C.; Diehl, Alexander D.; Haendel, Melissa A.; Hall, Brian K.; Lapp, Hilmar; Lundberg, John G.; Mungall, Christopher J.; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E.; Vickaryous, Matthew K.; Westerfield, Monte; Mabee, Paula M.

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity. PMID:23251424

  4. Feasibility of automated foundational ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2014-11-01

    the Source Domain Ontology (sOd), with the domain knowledge component of the source ontology, the Source Foundational Ontology (sOf ) that is the foundational ontology component of the source ontology that is to be interchanged, and any equivalence... or subsumption mappings between entities in sOd and sOf . – The Target Ontology (tO) which has been interchanged, which comprises the Target Domain Ontology (tOd), with the domain knowledge component of the target ontology, and the Target Foundational Ontology...

  5. NegGOA: negative GO annotations selection using ontology structure.

    Science.gov (United States)

    Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

    2016-10-01

    Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples: proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguish true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy over the compared algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete protein annotations than the compared methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa. Contact: gxyu@swu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
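    How ontology structure can constrain the choice of negative examples is easiest to see in a small sketch. The code below is not the NegGOA algorithm; it only applies one structural filter under stated assumptions: ancestors of an annotated term are implied positives, and descendants may be true but unrecorded, so neither is a safe negative. The toy DAG and all names are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links upward in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidate_negatives(annotated, parents):
    """Terms that are neither annotated, nor ancestors, nor descendants."""
    annotated = set(annotated)
    all_terms = set(parents) | {p for ps in parents.values() for p in ps}
    excluded = set(annotated)
    for term in annotated:
        excluded |= ancestors(term, parents)   # implied by the annotation
    for term in all_terms:
        if ancestors(term, parents) & annotated:
            excluded.add(term)                 # descendant: possibly unrecorded
    return sorted(all_terms - excluded)


# Hypothetical toy ontology: child -> list of parents.
toy_parents = {"b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"]}
negatives = candidate_negatives({"b"}, toy_parents)
```

    A fuller method would then rank the surviving candidates, for instance by how unlikely each term is to be added as a future annotation.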

  6. An Ontology for Software Engineering Education

    Science.gov (United States)

    Ling, Thong Chee; Jusoh, Yusmadi Yah; Abdullah, Rusli; Alwi, Nor Hayati

    2013-01-01

    Software agents communicate using ontologies. It is important to build an ontology for a specific domain such as Software Engineering Education. Building an ontology from scratch is not only hard, but also incurs much time and cost. This study aims to propose an ontology through adaptation of an existing ontology which was originally built based on a…

  7. Assaying gene function by growth competition experiment.

    Science.gov (United States)

    Merritt, Joshua; Edwards, Jeremy S

    2004-07-01

    High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.

  8. FUNCTIONAL SPECIALIZATION OF DUPLICATED FLAVONOID BIOSYNTHESIS GENES IN WHEAT

    Directory of Open Access Journals (Sweden)

    Khlestkina E.

    2012-08-01

    Gene duplication followed by subfunctionalization and neofunctionalization is of great evolutionary importance. In plant genomes, duplicated genes may result from either polyploidization (homoeologous genes) or segmental chromosome duplications (paralogous genes). In allohexaploid wheat Triticum aestivum L. (2n=6x=42, genome BBAADD), both homoeologous and paralogous copies were found for the regulatory gene Myc, encoding a MYC-like transcriptional factor in the biosynthesis of flavonoid pigments, anthocyanins, and for the structural gene F3h, encoding one of the key enzymes of flavonoid biosynthesis, flavanone 3-hydroxylase. Of the 5 copies (3 homoeologous and 2 paralogous) of the Myc gene found in T. aestivum, only one plays a regulatory role in anthocyanin biosynthesis, interacting in a complementary manner with another transcriptional factor (MYB-like) to confer purple pigmentation of grain pericarp in wheat. The role and functionality of the other 4 copies of the Myc gene remain unknown. Of the 4 functional copies of the F3h gene in T. aestivum, three homoeologues have similar function. They are expressed in wheat organs colored with anthocyanins or in the endosperm, participating there in biosynthesis of uncolored flavonoid substances. The fourth copy (the B-genomic paralogue) is transcribed neither in wheat organs colored with anthocyanins nor in seeds; however, its expression has been noticed in roots of aluminium-stressed plants, where the three homoeologous copies are not active. Functional diversification of the duplicated flavonoid biosynthesis genes in wheat may be a reason for maintenance of the duplicated copies and preventing them from pseudogenization. The study was supported by RFBR (11-04-92707). We also thank Ms. Galina Generalova for technical assistance.

  9. Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function.

    Science.gov (United States)

    Chasman, Daniel I; Fuchsberger, Christian; Pattaro, Cristian; Teumer, Alexander; Böger, Carsten A; Endlich, Karlhans; Olden, Matthias; Chen, Ming-Huei; Tin, Adrienne; Taliun, Daniel; Li, Man; Gao, Xiaoyi; Gorski, Mathias; Yang, Qiong; Hundertmark, Claudia; Foster, Meredith C; O'Seaghdha, Conall M; Glazer, Nicole; Isaacs, Aaron; Liu, Ching-Ti; Smith, Albert V; O'Connell, Jeffrey R; Struchalin, Maksim; Tanaka, Toshiko; Li, Guo; Johnson, Andrew D; Gierman, Hinco J; Feitosa, Mary F; Hwang, Shih-Jen; Atkinson, Elizabeth J; Lohman, Kurt; Cornelis, Marilyn C; Johansson, Asa; Tönjes, Anke; Dehghan, Abbas; Lambert, Jean-Charles; Holliday, Elizabeth G; Sorice, Rossella; Kutalik, Zoltan; Lehtimäki, Terho; Esko, Tõnu; Deshmukh, Harshal; Ulivi, Sheila; Chu, Audrey Y; Murgia, Federico; Trompet, Stella; Imboden, Medea; Coassin, Stefan; Pistis, Giorgio; Harris, Tamara B; Launer, Lenore J; Aspelund, Thor; Eiriksdottir, Gudny; Mitchell, Braxton D; Boerwinkle, Eric; Schmidt, Helena; Cavalieri, Margherita; Rao, Madhumathi; Hu, Frank; Demirkan, Ayse; Oostra, Ben A; de Andrade, Mariza; Turner, Stephen T; Ding, Jingzhong; Andrews, Jeanette S; Freedman, Barry I; Giulianini, Franco; Koenig, Wolfgang; Illig, Thomas; Meisinger, Christa; Gieger, Christian; Zgaga, Lina; Zemunik, Tatijana; Boban, Mladen; Minelli, Cosetta; Wheeler, Heather E; Igl, Wilmar; Zaboli, Ghazal; Wild, Sarah H; Wright, Alan F; Campbell, Harry; Ellinghaus, David; Nöthlings, Ute; Jacobs, Gunnar; Biffar, Reiner; Ernst, Florian; Homuth, Georg; Kroemer, Heyo K; Nauck, Matthias; Stracke, Sylvia; Völker, Uwe; Völzke, Henry; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Hofman, Albert; Uitterlinden, Andre G; Rivadeneira, Fernando; Aulchenko, Yurii S; Polasek, Ozren; Hastie, Nick; Vitart, Veronique; Helmer, Catherine; Wang, Jie Jin; Stengel, Bénédicte; Ruggiero, Daniela; Bergmann, Sven; Kähönen, Mika; Viikari, Jorma; Nikopensius, Tiit; Province, Michael; Ketkar, Shamika; Colhoun, Helen; Doney, Alex; Robino, Antonietta; 
Krämer, Bernhard K; Portas, Laura; Ford, Ian; Buckley, Brendan M; Adam, Martin; Thun, Gian-Andri; Paulweber, Bernhard; Haun, Margot; Sala, Cinzia; Mitchell, Paul; Ciullo, Marina; Kim, Stuart K; Vollenweider, Peter; Raitakari, Olli; Metspalu, Andres; Palmer, Colin; Gasparini, Paolo; Pirastu, Mario; Jukema, J Wouter; Probst-Hensch, Nicole M; Kronenberg, Florian; Toniolo, Daniela; Gudnason, Vilmundur; Shuldiner, Alan R; Coresh, Josef; Schmidt, Reinhold; Ferrucci, Luigi; Siscovick, David S; van Duijn, Cornelia M; Borecki, Ingrid B; Kardia, Sharon L R; Liu, Yongmei; Curhan, Gary C; Rudan, Igor; Gyllensten, Ulf; Wilson, James F; Franke, Andre; Pramstaller, Peter P; Rettig, Rainer; Prokopenko, Inga; Witteman, Jacqueline; Hayward, Caroline; Ridker, Paul M; Parsa, Afshin; Bochud, Murielle; Heid, Iris M; Kao, W H Linda; Fox, Caroline S; Köttgen, Anna

    2012-12-15

    In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
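The step that "requires association thresholds consistent with multiple testing" can be illustrated with a minimal sketch. The abstract does not say which correction was applied, so a plain Bonferroni correction is assumed here; the function name, SNP identifiers, and p-values are hypothetical and are not data from the study.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return the ids whose p-value passes alpha / number_of_tests.

    Assumption: Bonferroni correction; the study may have used another rule.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return sorted(snp for snp, p in p_values.items() if p <= threshold)


# Hypothetical candidate SNPs with made-up p-values, for illustration only.
candidates = {"snp_a": 1e-6, "snp_b": 0.02, "snp_c": 4e-4, "snp_d": 0.3}
hits = bonferroni_hits(candidates)
```

Restricting the tested set to SNPs in functionally connected genes keeps the number of tests small, which is why a candidate can pass a corrected threshold without reaching genome-wide significance.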

  10. Functional Analysis of an ATP-Binding Cassette Transporter Gene in Botrytis cinerea by Gene Disruption

    OpenAIRE

    Masami, NAKAJIMA; Junko, SUZUKI; Takehiko, HOSAKA; Tadaaki, HIBI; Katsumi, AKUTSU; School of Agriculture, Ibaraki University; School of Agriculture, Ibaraki University; School of Agriculture, Ibaraki University; Department of Agriculture and Environmental Biology, The University of Tokyo; School of Agriculture, Ibaraki University

    2001-01-01

    The BMR1 gene encoding an ABC transporter was cloned from Botrytis cinerea. To examine the function of BMR1 in B. cinerea, we isolated BMR1-deficient mutants after gene disruption. Disruption vector pBcDF4 was constructed by replacing the BMR1-coding region with a hygromycin B phosphotransferase gene (hph) cassette. The BMR1 disruptants had an increased sensitivity to polyoxin and iprobenfos. Polyoxin and iprobenfos, structurally unrelated compounds, may therefore be substrates of BMR1.

  11. Drug target ontology to classify and integrate drug discovery data

    DEFF Research Database (Denmark)

    Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande

    2017-01-01

    using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem...... of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target...... characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships. CONCLUSIONS: DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein...

  12. Gene function in early mouse embryonic stem cell differentiation

    Directory of Open Access Journals (Sweden)

    Campbell Pearl A

    2007-03-01

    Full Text Available Abstract Background Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. Results We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. Conclusion Our analysis profiles for the first time gene expression at a very early stage of m

  13. Automated discovery of functional generality of human gene expression programs.

    Directory of Open Access Journals (Sweden)

    Georg K Gerber

    2007-08-01

    Full Text Available An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-kappaB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal

  14. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that the human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA) gene, but the function of the human KIAA0100 gene has remained uncharacterized to date. Here, firstly, bioinformatic prediction of the human KIAA0100 gene was carried out using online software; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that the human KIAA0100 gene was located on 17q11.2, and the human KIAA0100 protein was located in the secretory pathway. Besides, the human KIAA0100 protein contained a signal peptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). The observation on functional characterization of the human KIAA0100 gene revealed that its downregulation inhibited cell proliferation and promoted cell apoptosis in U937 cells. To summarize, these results suggest the human KIAA0100 gene possibly comes within the mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  15. Molecular and Functional Characterization of Broccoli EMBRYONIC FLOWER 2 Genes

    Science.gov (United States)

    Chen, Long-Fang O.; Lin, Chun-Hung; Lai, Ying-Mi; Huang, Jia-Yuan; Sung, Zinmay Renee

    2012-01-01

    Polycomb group (PcG) proteins regulate major developmental processes in Arabidopsis. EMBRYONIC FLOWER 2 (EMF2), the VEFS domain-containing PcG gene, regulates diverse genetic pathways and is required for vegetative development and plant survival. Despite widespread EMF2-like sequences in plants, little is known about their function other than in Arabidopsis and rice. To study the role of EMF2 in broccoli (Brassica oleracea var. italica cv. Elegance) development, we identified two broccoli EMF2 (BoEMF2) genes with sequence homology to and a similar gene expression pattern to that in Arabidopsis (AtEMF2). Reducing their expression in broccoli resulted in aberrant phenotypes and gene expression patterns. BoEMF2 regulates genes involved in diverse developmental and stress programs similar to AtEMF2 in Arabidopsis. However, BoEMF2 differs from AtEMF2 in the regulation of flower organ identity, cell proliferation and elongation, and death-related genes, which may explain the distinct phenotypes. The expression of BoEMF2.1 in the Arabidopsis emf2 mutant (Rescued emf2) partially rescued the mutant phenotype and restored the gene expression pattern to that of the wild type. Many EMF2-mediated molecular and developmental functions are conserved in broccoli and Arabidopsis. Furthermore, the restored gene expression pattern in Rescued emf2 provides insights into the molecular basis of PcG-mediated growth and development. PMID:22537758

  16. ONSET: Automated foundational ontology selection and explanation

    CSIR Research Space (South Africa)

    Khan, Z

    2012-10-01

    Full Text Available It has been shown that using a foundational ontology for domain ontology development is beneficial in theory and practice. However, developers have difficulty with choosing the appropriate foundational ontology, and why. In order to solve...

  17. Application of neuroanatomical ontologies for neuroimaging data annotation

    Directory of Open Access Journals (Sweden)

    Jessica A Turner

    2010-06-01

    Full Text Available The annotation of functional neuroimaging results for data sharing and reuse is particularly challenging, due to the diversity of terminologies of neuroanatomical structures and cortical parcellation schemes. To address this challenge, we extended the Foundational Model of Anatomy Ontology (FMA) to include cytoarchitectural, Brodmann area labels, and a morphological cortical labeling scheme (e.g., the part of Brodmann area 6 in the left precentral gyrus). This representation was also used to augment the neuroanatomical axis of RadLex, the ontology for clinical imaging. The resulting neuroanatomical ontology contains explicit relationships indicating which brain regions are “part of” which other regions, across cytoarchitectural and morphological labeling schemas. We annotated a large functional neuroimaging dataset with terms from the ontology and applied a reasoning engine to analyze this dataset in conjunction with the ontology, and achieved successful inferences from the most specific level (e.g., how many subjects showed activation in a sub-part of the middle frontal gyrus) to more general (how many activations were found in areas connected via a known white matter tract?). In summary, we have produced a neuroanatomical ontology that harmonizes several different terminologies of neuroanatomical structures and cortical parcellation schemes. This neuroanatomical ontology is publicly available as a view of FMA at the Bioportal website at http://rest.bioontology.org/bioportal/ontologies/download/10005. The ontological encoding of anatomic knowledge can be exploited by computer reasoning engines to make inferences about neuroanatomical relationships described in imaging datasets using different terminologies. This approach could ultimately enable knowledge discovery from large, distributed fMRI studies or medical record mining.
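The kind of inference described above, counting activations under a region by following "part of" relationships, can be sketched as a transitive walk over a small graph. This is only an illustration under assumed data: the region labels and the `part_of` mapping below are hypothetical and are not taken from FMA or RadLex, and a real reasoning engine over an OWL ontology would do considerably more.

```python
def ancestors(region, part_of):
    """Return every region that `region` is transitively a part of."""
    seen = set()
    stack = [region]
    while stack:
        for parent in part_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def count_activations(target, activations, part_of):
    """Count activation labels that are the target region or lie inside it."""
    return sum(
        1 for r in activations
        if r == target or target in ancestors(r, part_of)
    )


# Hypothetical labels; one region may be part of several parents
# (a cytoarchitectural one and a morphological one).
part_of = {
    "BA6_in_left_precentral_gyrus": ["Brodmann_area_6", "left_precentral_gyrus"],
    "left_precentral_gyrus": ["frontal_lobe"],
    "middle_frontal_gyrus": ["frontal_lobe"],
}
activations = ["BA6_in_left_precentral_gyrus", "middle_frontal_gyrus", "occipital_pole"]
n_frontal = count_activations("frontal_lobe", activations, part_of)
```

Because a label can have both a cytoarchitectural and a morphological parent, the same activation is counted under either scheme without re-annotating the data.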

  18. Polyploidization altered gene functions in cotton (Gossypium spp.).

    Science.gov (United States)

    Xu, Zhanyou; Yu, John Z; Cho, Jaemin; Yu, Jing; Kohel, Russell J; Percy, Richard G

    2010-12-16

    Cotton (Gossypium spp.) is an important crop plant that is widely grown to produce both natural textile fibers and cottonseed oil. Cotton fibers, the economically more important product of the cotton plant, are seed trichomes derived from individual cells of the epidermal layer of the seed coat. It has been known for a long time that large numbers of genes determine the development of cotton fiber, and more recently it has been determined that these genes are distributed across At and Dt subgenomes of tetraploid AD cottons. In the present study, the organization and evolution of the fiber development genes were investigated through the construction of an integrated genetic and physical map of fiber development genes whose functions have been verified and confirmed. A total of 535 cotton fiber development genes, including 103 fiber transcription factors, 259 fiber development genes, and 173 SSR-contained fiber ESTs, were analyzed at the subgenome level. A total of 499 fiber related contigs were selected and assembled. Together these contigs covered about 151 Mb in physical length, or about 6.7% of the tetraploid cotton genome. Among the 499 contigs, 397 were anchored onto individual chromosomes. Results from our studies on the distribution patterns of the fiber development genes and transcription factors between the At and Dt subgenomes showed that more transcription factors were from Dt subgenome than At, whereas more fiber development genes were from At subgenome than Dt. Combining our mapping results with previous reports that more fiber QTLs were mapped in Dt subgenome than At subgenome, the results suggested a new functional hypothesis for tetraploid cotton. After the merging of the two diploid Gossypium genomes, the At subgenome has provided most of the genes for fiber development, because it continues to function similar to its fiber producing diploid A genome ancestor. On the other hand, the Dt subgenome, with its non-fiber producing D genome ancestor

  19. Drosha regulates gene expression independently of RNA cleavage function

    DEFF Research Database (Denmark)

    Gromak, Natalia; Dienstbier, Martin; Macias, Sara

    2013-01-01

    Drosha is the main RNase III-like enzyme involved in the process of microRNA (miRNA) biogenesis in the nucleus. Using whole-genome ChIP-on-chip analysis, we demonstrate that, in addition to miRNA sequences, Drosha specifically binds promoter-proximal regions of many human genes in a transcription......-dependent manner. This binding is not associated with miRNA production or RNA cleavage. Drosha knockdown in HeLa cells downregulated nascent gene transcription, resulting in a reduction of polyadenylated mRNA produced from these gene regions. Furthermore, we show that this function of Drosha is dependent on its N......-terminal protein-interaction domain, which associates with the RNA-binding protein CBP80 and RNA Polymerase II. Consequently, we uncover a previously unsuspected RNA cleavage-independent function of Drosha in the regulation of human gene expression....

  20. Functional Potential of Bacterial Communities using Gene Context Information

    Directory of Open Access Journals (Sweden)

    Anwesha Mohapatra

    2017-12-01

    Full Text Available The functional potential of a bacterial genome can be estimated by accurate annotation of its metabolic pathways. Existing homology-based methods for pathway annotation fail to account for homologous genes that participate in multiple pathways, causing overestimation of gene copy number. Mere presence of constituent genes of a candidate pathway which are dispersed on a genome often results in incorrect annotation, thereby leading to erroneous gene abundance and pathway estimation. Clusters of evolutionarily conserved coregulated genes are characteristic features in bacterial genomes and their spatial arrangement in the genome is constrained by the pathway encoded by them. Thus, in order to improve the accuracy of pathway prediction, it is important to augment homology-based annotation with gene organization information. In this communication, we present a methodology considering prioritization of gene context for improved pathway annotation. Extensive literature mining was performed to confirm conserved juxtaposed arrangement of gene components of various pathways. Our method was utilized to identify and analyse the functional potential of all available completely sequenced bacterial genomes. The accuracy of the predicted gene clusters and their importance in metabolic pathways will be demonstrated using a few case studies. One such case study concerns butyrate production pathways in gut bacteria, where it was observed that gut pathogens and commensals possess a distinct set of pathway components. In another example, we will demonstrate how our methodology improves the prediction accuracy of carbohydrate metabolic potential in human microbial communities. Applicability of our method for estimation of functional potential in bacterial communities present in diverse environments will also be illustrated.
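The idea of augmenting homology hits with gene organization can be sketched as a simple adjacency check: a pathway is only called when its genes are all present and sit close together on the genome. The abstract does not describe the authors' algorithm, so the function, the `max_gap` parameter, and the gene names below are assumptions made purely for illustration.

```python
def is_clustered(positions, pathway_genes, max_gap=2):
    """True if every pathway gene is present and neighbouring hits are
    at most `max_gap` gene slots apart.

    `positions` maps gene name -> ordinal index of the gene on the genome.
    """
    try:
        idx = sorted(positions[g] for g in pathway_genes)
    except KeyError:
        return False  # a constituent gene is missing entirely
    return all(b - a <= max_gap for a, b in zip(idx, idx[1:]))


# Hypothetical three-gene pathway in two made-up genomes.
pathway = ["geneX", "geneY", "geneZ"]
juxtaposed = {"geneX": 10, "geneY": 11, "geneZ": 13}
dispersed = {"geneX": 10, "geneY": 250, "geneZ": 900}
```

Under this check the dispersed genome is not annotated with the pathway even though all three homologs are present, which is the overestimation the abstract attributes to purely homology-based methods.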

  1. Stably Expressed Genes Involved in Basic Cellular Functions.

    Directory of Open Access Journals (Sweden)

    Kejian Wang

    Full Text Available Stably Expressed Genes (SEGs), whose expression varies within a narrow range, may be involved in core cellular processes necessary for basic functions. To identify such genes, we re-analyzed existing RNA-Seq gene expression profiles across 11 organs at 4 developmental stages (from immature to old age) in both sexes of F344 rats (n = 4/group; 320 samples). Expression changes (calculated as the maximum expression / minimum expression for each gene) of >19000 genes across organs, ages, and sexes ranged from 2.35 to >10^9-fold, with a median of 165-fold. The expression of 278 SEGs was found to vary ≤4-fold and these genes were significantly involved in protein catabolism (proteasome and ubiquitination), RNA transport, protein processing, and the spliceosome. Such stability of expression was further validated in human samples where the expression variability of the homologous human SEGs was significantly lower than that of other genes in the human genome. It was also found that the homologous human SEGs were generally less subject to non-synonymous mutation than other genes, as would be expected of stably expressed genes. We also found that knockout of SEG homologs in mouse models was more likely to cause complete preweaning lethality than non-SEG homologs, corroborating the fundamental roles played by SEGs in biological development. Such stably expressed genes and pathways across life-stages suggest that tight control of these processes is important in basic cellular functions and that perturbation by endogenous (e.g., genetics) or exogenous agents (e.g., drugs, environmental factors) may cause serious adverse effects.
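The selection rule stated in the abstract (fold change = maximum expression / minimum expression per gene, with SEGs varying ≤4-fold) can be written down directly. The sketch below uses made-up expression values and hypothetical gene names; how the original analysis handled zero or missing expression is not stated, so excluding genes with a non-positive minimum is an assumption.

```python
def fold_change(values):
    """Maximum expression divided by minimum expression across samples."""
    return max(values) / min(values)


def stably_expressed(expr, max_fold=4.0):
    """Return genes whose max/min fold change is at most `max_fold`.

    Assumption: genes with a non-positive minimum are excluded, since the
    ratio is undefined for them.
    """
    return sorted(
        gene for gene, values in expr.items()
        if min(values) > 0 and fold_change(values) <= max_fold
    )


# Made-up expression values across four samples, for illustration only.
expr = {
    "geneA": [10, 12, 20, 35],   # 3.5-fold
    "geneB": [1, 50, 400, 3],    # 400-fold
    "geneC": [5, 5, 6, 7],       # 1.4-fold
    "geneD": [0, 2, 2, 2],       # undefined ratio, excluded
}
segs = stably_expressed(expr)
```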

  2. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Full Text Available Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides the quick accessibility to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
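Single-linkage agglomeration of identifiers, as mentioned for the DAVID Gene Concept, means that two identifiers end up in the same cluster whenever a chain of pairwise cross-references connects them. A minimal sketch with a union-find structure follows; it illustrates the general technique only, not DAVID's actual implementation, and the identifiers are hypothetical.

```python
def single_linkage_clusters(links):
    """Group identifiers connected by any chain of pairwise links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical cross-references between identifier systems.
links = [
    ("NCBI:1", "UniProt:P1"),
    ("UniProt:P1", "Ensembl:E1"),
    ("NCBI:2", "UniProt:P2"),
]
gene_clusters = single_linkage_clusters(links)
```

Here `NCBI:1` and `Ensembl:E1` share a cluster although no direct link exists between them, which is the cross-referencing benefit the abstract describes. The usual caveat of single linkage applies: one erroneous link is enough to merge two otherwise unrelated clusters.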

  3. Ontology Based Model Transformation Infrastructure

    NARCIS (Netherlands)

    Göknil, Arda; Topaloglu, N.Y.

    2005-01-01

    Using MDA in ontology development has been investigated in several works recently. The mappings and transformations between the UML constructs and the OWL elements to develop ontologies are the main concern of these research projects. We propose another approach in order to achieve the collaboration

  4. Ontology through a Mindfulness Process

    Science.gov (United States)

    Bearance, Deborah; Holmes, Kimberley

    2015-01-01

    Traditionally, when ontology is taught in a graduate studies course on social research, there is a tendency for this concept to be examined through the process of lectures and readings. Such an approach often leaves graduate students to grapple with a personal embodiment of this concept and to comprehend how ontology can ground their research.…

  5. The foundational ontology library ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    Full Text Available . We present here a basic step in that direction with the Repository of Ontologies for MULtiple USes, ROMULUS, which is the first online library of machine-processable, modularised, aligned, and logic-based merged foundational ontologies. In addition...

  6. Tracking Changes during Ontology Evolution

    NARCIS (Netherlands)

    Noy, Natalya F.; Kunnatur, Sandhya; Klein, Michel; Musen, Mark A.

    2004-01-01

    As ontology development becomes a collaborative process, developers face the problem of maintaining versions of ontologies akin to maintaining versions of software code or versions of documents in large projects. Traditional versioning systems enable users to compare versions, examine changes, and

  7. The Drosophila melanogaster methuselah gene: a novel gene with ancient functions.

    Directory of Open Access Journals (Sweden)

    Ana Rita Araújo

    Full Text Available The Drosophila melanogaster G protein-coupled receptor gene, methuselah (mth), has been described as a novel gene that is less than 10 million years old. Nevertheless, it shows a highly specific expression pattern in embryos, larvae, and adults, and has been implicated in larval development, stress resistance, and in the setting of adult lifespan, among others. Although mth belongs to a gene subfamily with 16 members in D. melanogaster, there is no evidence for functional redundancy in this subfamily. Therefore, it is surprising that a novel gene influences so many traits. Here, we explore the alternative hypothesis that mth is an old gene. Under this hypothesis, in species distantly related to D. melanogaster, there should be a gene with features similar to those of mth. By performing detailed phylogenetic, synteny, protein structure, and gene expression analyses we show that the D. virilis GJ12490 gene is the orthologue of mth in species distantly related to D. melanogaster. We also show that, in D. americana (a species of the virilis group of Drosophila), a common amino acid polymorphism at the GJ12490 orthologous gene is significantly associated with developmental time, size, and lifespan differences. Our results imply that GJ12490 orthologous genes are candidates for developmental time and lifespan differences in Drosophila in general.

  8. ANLN functions as a key candidate gene in cervical cancer as determined by integrated bioinformatic analysis

    Directory of Open Access Journals (Sweden)

    Xia L

    2018-04-01

    Full Text Available Leilei Xia,1,* Xiaoling Su,1,2,* Jizi Shen,1,* Qi Meng,1 Jiuqiong Yan,1 Caihong Zhang,1 Yu Chen,1 Han Wang,3 Mingjuan Xu,1 1Department of Obstetrics and Gynecology, Changhai Hospital, Second Military Medical University, Shanghai, People’s Republic of China; 2Department of Obstetrics and Gynecology, No. 455 Hospital, Shanghai, People’s Republic of China; 3Department of Pathology, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, People’s Republic of China *These authors contributed equally to this work Background: Cervical cancer, one of the leading causes of female deaths, remains a top cause of mortality in gynecologic oncology and tends to affect younger individuals. However, the pathogenesis of cervical cancer is still far from clear. Given the high incidence and mortality of cervical cancer, uncovering the causes and pathogenesis as well as identifying novel biomarkers are of great significance and are desperately needed. Materials and methods: First, raw data were downloaded from the Gene Expression Omnibus database. The Robust Multi-Array Average algorithm and the ComBat function of the sva package were subsequently applied to preprocess the data and remove batch effects. Differentially expressed genes (DEGs) were identified with the limma package, followed by gene ontology and pathway analysis, and a protein–protein interaction (PPI) network based on the STRING website and the Cytoscape software was constructed. Weighted Correlation Network Analysis (WGCNA) was utilized to build the coexpression network. Subsequently, the UALCAN website was employed to conduct survival analysis. Finally, the Oncomine database was used to validate the expression of ANLN in other datasets. Results: GSE29570 and GSE89657, including 49 cervical cancer tissues and 20 normal cervical tissues, were screened as the datasets. Three-hundred-twenty-four DEGs were identified and, among them, 123 were upregulated, while 201 were downregulated. The

  9. Knowledge Representation in Patient Safety Reporting: An Ontological Approach

    OpenAIRE

    Liang Chen; Yang Gong

    2016-01-01

    Purpose: The current development of patient safety reporting systems is criticized for loss of information and low data quality due to the lack of a uniformed domain knowledge base and text processing functionality. To improve patient safety reporting, the present paper suggests an ontological representation of patient safety knowledge. Design/methodology/approach: We propose a framework for constructing an ontological knowledge base of patient safety. The present paper describes our desig...

  10. An ontology for human-like interaction systems

    OpenAIRE

    Albacete García, Esperanza

    2016-01-01

    This report proposes and describes the development of a Ph.D. Thesis aimed at building an ontological knowledge model supporting Human-Like Interaction systems. The main function of such knowledge model in a human-like interaction system is to unify the representation of each concept, relating it to the appropriate terms, as well as to other concepts with which it shares semantic relations. When developing human-like interactive systems, the inclusion of an ontological module can be valuab...

  11. Differential Retention of Gene Functions in a Secondary Metabolite Cluster.

    Science.gov (United States)

    Reynolds, Hannah T; Slot, Jason C; Divon, Hege H; Lysøe, Erik; Proctor, Robert H; Brown, Daren W

    2017-08-01

    In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacetylase inhibitor that contributes weakly to virulence. The DEP cluster includes genes encoding enzymes, a transporter, and a transcription regulator. We investigated the distribution and evolution of the DEP cluster in 585 fungal genomes and found a wide but sporadic distribution among Dothideomycetes, Sordariomycetes, and Eurotiomycetes. We confirmed DEP gene expression and depudecin production in one fungus, Fusarium langsethiae. Phylogenetic analyses suggested 6-10 horizontal gene transfers (HGTs) of the cluster, including a transfer that led to the presence of closely related cluster homologs in Alternaria and Fusarium. The analyses also indicated that HGTs were frequently followed by loss/pseudogenization of one or more DEP genes. Independent cluster inactivation was inferred in at least four fungal classes. Analyses of transitions among functional, pseudogenized, and absent states of DEP genes among Fusarium species suggest enzyme-encoding genes are lost at higher rates than the transporter (DEP3) and regulatory (DEP6) genes. The phenotype of an experimentally-induced DEP3 mutant of Fusarium did not support the hypothesis that selective retention of DEP3 and DEP6 protects fungi from exogenous depudecin. Together, the results suggest that HGT and gene loss have contributed significantly to DEP cluster distribution, and that some DEP genes provide a greater fitness benefit possibly due to a differential tendency to form network connections. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.

  12. The function and evolution of Wnt genes in arthropods.

    Science.gov (United States)

    Murat, Sophie; Hopfen, Corinna; McGregor, Alistair P

    2010-11-01

    Wnt signalling is required for a wide range of developmental processes, from cleavage to patterning and cell migration. There are 13 subfamilies of Wnt ligand genes and this diverse repertoire appeared very early in metazoan evolution. In this review, we first summarise the known Wnt gene repertoire in various arthropods. Insects appear to have lost several Wnt subfamilies, either generally, such as Wnt3, or in lineage specific patterns, for example, the loss of Wnt7 in Anopheles. In Drosophila and Acyrthosiphon, only seven and six Wnt subfamilies are represented, respectively; however, the finding of nine Wnt genes in Tribolium suggests that arthropods had a larger repertoire ancestrally. We then discuss what is currently known about the expression and developmental function of Wnt ligands in Drosophila and other insects in comparison to other arthropods, such as the spiders Achaearanea and Cupiennius. We conclude that studies of Wnt genes have given us much insight into the developmental roles of some of these ligands. However, given the frequent loss of Wnt genes in insects and the derived development of Drosophila, further studies of these important genes are required in a broader range of arthropods to fully understand their developmental function and evolution. Copyright © 2010 Elsevier Ltd. All rights reserved.

  13. Transient transformation meets gene function discovery: the strawberry fruit case

    Directory of Open Access Journals (Sweden)

    Michela Guidarelli

    2015-06-01

    Full Text Available Besides its well-known nutritional and health benefits, the strawberry (Fragaria X ananassa) crop draws increasing attention as a plant model system for the Rosaceae family, owing to its short generation time, its rapid in vitro regeneration, and the availability of the genome sequence of F. X ananassa and of the closely related F. vesca species. In recent years, the use of high-throughput sequencing technologies has provided large amounts of molecular information on the genes possibly related to several biological processes of this crop. Nevertheless, the function of most genes or gene products is still poorly understood and needs investigation. Transient transformation technology provides a powerful tool to study gene function in vivo, avoiding the drawbacks that typically affect stable transformation protocols, such as low transformation efficiency and difficult transformant selection and regeneration. In this review we provide an overview of the use of transient expression in the investigation of the function of genes important for strawberry fruit development, defence and nutritional properties. The technical aspects related to an efficient use of this technique are described, and the possible impact and application in strawberry crop improvement are discussed.

  14. Sponge microbiota are a reservoir of functional antibiotic resistance genes

    Directory of Open Access Journals (Sweden)

    Dennis Versluis

    2016-11-01

    Full Text Available Wide application of antibiotics has contributed to the evolution of multi-drug resistant human pathogens, resulting in poorer treatment outcomes for infections. In the marine environment, seawater samples have been investigated as a resistance reservoir; however, no studies have methodically examined sponges as a reservoir of antibiotic resistance. Sponges could be important in this respect because they often contain diverse microbial communities that have the capacity to produce bioactive metabolites. Here, we applied functional metagenomics to study the presence and diversity of functional resistance genes in the sponges Aplysina aerophoba, Petrosia ficiformis and Corticium candelabrum. We obtained 37 insert sequences facilitating resistance to D-cycloserine (n=6), gentamicin (n=1), amikacin (n=7), trimethoprim (n=17), chloramphenicol (n=1), rifampicin (n=2) and ampicillin (n=3). Fifteen of 37 inserts harboured resistance genes that shared <90% amino acid identity with known gene products, whereas on 13 inserts no resistance gene could be identified with high confidence, in which case we predicted resistance to be mainly mediated by antibiotic efflux. One marine-specific ampicillin-resistance-conferring β-lactamase was identified in the genus Pseudovibrio with 41% global amino acid identity to the closest β-lactamase with demonstrated functionality, and subsequently classified into a new family termed PSV. Taken together, our results show that sponge microbiota host diverse and novel resistance genes that may be harnessed by phylogenetically distinct bacteria.

  15. Using riboswitches to regulate gene expression and define gene function in mycobacteria.

    Science.gov (United States)

    Van Vlack, Erik R; Seeliger, Jessica C

    2015-01-01

    Mycobacteria include both environmental species and many pathogenic species such as Mycobacterium tuberculosis, an intracellular pathogen that is the causative agent of tuberculosis in humans. Inducible gene expression is a powerful tool for examining gene function and essentiality, both in in vitro culture and in host cell infections. The theophylline-inducible artificial riboswitch has recently emerged as an alternative to protein repressor-based systems. The riboswitch is translationally regulated and is combined with a mycobacterial promoter that provides transcriptional control. We here provide methods used by our laboratory to characterize the riboswitch response to theophylline in reporter strains, recombinant organisms containing riboswitch-regulated endogenous genes, and in host cell infections. These protocols should facilitate the application of both existing and novel artificial riboswitches to the exploration of gene function in mycobacteria. © 2015 Elsevier Inc. All rights reserved.

  16. The function and evolution of Msx genes: pointers and paradoxes.

    Science.gov (United States)

    Davidson, D

    1995-10-01

    The Msx genes of vertebrates comprise a small family of chromosomally unlinked homeobox-containing genes related to the Drosophila gene muscle-segment homeobox (msh). Despite their ancient pedigree, the Msx genes are expressed in a range of vertebrate-specific tissues, including neural crest, cranial sensory placodes, bone and teeth. They are active in numerous systems, which have been used as models to study pattern formation and tissue interaction, and are, therefore, attracting a growing interest among developmental biologists. But beyond their presumed role as transcription factors, we do not know what their functions are in the cell or the embryo. Here, I review recent evidence that is beginning to address this problem and might eventually increase our understanding of how the vertebrate embryo has evolved.

  17. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association.

    Directory of Open Access Journals (Sweden)

    Liang Cheng

    Full Text Available Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both of them to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional association is proposed to address the issue. SemFunSim is designed as follows. First of all, FunSim (Functional Similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases from Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. The high average AUC (area under the receiver operating characteristic curve) of 96.37% shows that SemFunSim achieves a high true positive rate and a low false positive rate. 79 of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while the other methods we compared could identify 35 or fewer such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from the literature. This indicates that SemFunSim is an effective method for drug repositioning.

  18. NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

    Science.gov (United States)

    Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun

    2017-09-21

    High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified by current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make it difficult for users to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved performance superior to that of the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms that were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
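
    For orientation, individual term-based enrichment of the kind this abstract contrasts with NetGen is commonly scored with a one-sided hypergeometric test. The sketch below illustrates that baseline only; it is not NetGen's generative model, and the function name and toy counts are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           selected: int, overlap: int) -> float:
    """Return P(X >= overlap) under a hypergeometric null.

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    selected:   size of the gene list being tested (e.g. DEGs)
    overlap:    genes in the list that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers (hypothetical): 10 background genes, 4 annotated,
    # 3 selected, 2 of the selected genes carry the annotation.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

    A small p-value indicates that the term is over-represented in the gene list; scoring each term separately in this way is what produces the redundant parent and descendant terms the abstract mentions.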

  19. Ontological Issues and the Possible Development of Cultural Psychology.

    Science.gov (United States)

    Pérez-Campos, Gilberto

    2017-12-01

    Ontological issues have a bad reputation within mainstream psychology. This paper, however, is an attempt to argue that ontological reflection may play an important role in the development of cultural psychology. A cross-reading of two recent papers on the subject (Mammen & Mironenko, Integrative Psychological and Behavioral Science, 49(4), 681-713, 2015; Simão, Integrative Psychological and Behavioral Science, 50, 568-585, 2016), aimed at characterizing their respective approaches to ontological issues, sets the stage for a presentation of Cornelius Castoriadis' ontological reflections. On this basis, a dialogue is initiated with E.E. Boesch's Symbolic Activity Theory that could contribute to a more refined understanding of human psychological functioning in its full complexity.

  20. In search of a primitive ontology for relativistic quantum field theory

    Energy Technology Data Exchange (ETDEWEB)

    Lam, Vincent [University of Lausanne, CH-1015 Lausanne (Switzerland)]

    2014-07-01

    There is a recently much discussed approach to the ontology of quantum mechanics according to which the theory is ultimately about entities in 3-dimensional space and their temporal evolution. Such an ontology postulating from the start matter localized in usual physical space or spacetime, by contrast to an abstract high-dimensional space such as the configuration space of wave function realism, is called primitive ontology in the recent literature on the topic and finds its roots in Bell's notion of local beables. The main motivation for a primitive ontology lies in its explanatory power: the primitive ontology allows for a direct account of the behaviour and properties of familiar macroscopic objects. In this context, it is natural to look for a primitive ontology for relativistic quantum field theory (RQFT). The aim of this talk is to critically discuss this interpretative move within RQFT, in particular with respect to the foundational issue of the existence of unitarily inequivalent representations. Indeed the proposed primitive ontologies for RQFT rely either on a Fock space representation or a wave functional representation, which are strictly speaking only unambiguously available for free systems in flat spacetime. As a consequence, it is argued that these primitive ontologies constitute only effective ontologies and are hardly satisfying as a fundamental ontology for RQFT.

  1. Logic and Ontology

    Directory of Open Access Journals (Sweden)

    Newton C. A. da Costa

    2002-12-01

    Full Text Available In view of the present state of development of non-classical logic, especially of paraconsistent logic, a new stand regarding the relations between logic and ontology is defended. In a parody of a dictum of Quine, my stand may be summarized as follows: to be is to be the value of a variable in a specific language with a given underlying logic. Yet my stand differs from Quine’s because, among other reasons, I accept some first-order heterodox logics as genuine alternatives to classical logic. I also discuss some questions of non-classical logic to substantiate my argument, and suggest that my position complements and extends some ideas advanced by L. Apostel.

  2. Functional validation of GWAS gene candidates for abnormal liver function during zebrafish liver development

    Directory of Open Access Journals (Sweden)

    Leah Y. Liu

    2013-09-01

    Genome-wide association studies (GWAS) have revealed numerous associations between many phenotypes and gene candidates. Frequently, however, further elucidation of gene function has not been achieved. A recent GWAS identified 69 candidate genes associated with elevated liver enzyme concentrations, which are clinical markers of liver disease. To investigate the role of these genes in liver homeostasis, we narrowed down this list to 12 genes based on zebrafish orthology, zebrafish liver expression and disease correlation. To assess the function of gene candidates during liver development, we assayed hepatic progenitors at 48 hours post fertilization (hpf) and hepatocytes at 72 hpf using in situ hybridization following morpholino knockdown in zebrafish embryos. Knockdown of three genes (pnpla3, pklr and mapk10) decreased expression of hepatic progenitor cells, whereas knockdown of eight genes (pnpla3, cpn1, trib1, fads2, slc2a2, pklr, mapk10 and samm50) decreased cell-specific hepatocyte expression. We then induced liver injury in zebrafish embryos using acetaminophen exposure and observed changes in liver toxicity incidence in morphants. Prioritization of GWAS candidates and morpholino knockdown expedites the study of newly identified genes impacting liver development and represents a feasible method for initial assessment of candidate genes to instruct further mechanistic analyses. Our analysis can be extended to GWAS for additional disease-associated phenotypes.

  3. Building a developmental toxicity ontology.

    Science.gov (United States)

    Baker, Nancy; Boobis, Alan; Burgoon, Lyle; Carney, Edward; Currie, Richard; Fritsche, Ellen; Knudsen, Thomas; Laffont, Madeleine; Piersma, Aldert H; Poole, Alan; Schneider, Steffen; Daston, George

    2018-04-03

    As more information is generated about modes of action for developmental toxicity and more data are generated using high-throughput and high-content technologies, it is becoming necessary to organize that information. This report discusses the need for a systematic representation of knowledge about developmental toxicity (i.e., an ontology) and proposes a method to build one based on knowledge of developmental biology and mode of action/adverse outcome pathways in developmental toxicity. This report is the result of a consensus working group developing a plan to create an ontology for developmental toxicity that spans multiple levels of biological organization. This report provides a description of some of the challenges in building a developmental toxicity ontology and outlines a proposed methodology to meet those challenges. As the ontology is built on currently available web-based resources, a review of these resources is provided. Case studies on one of the most well-understood morphogens and developmental toxicants, retinoic acid, are presented as examples of how such an ontology might be developed. This report outlines an approach to construct a developmental toxicity ontology. Such an ontology will facilitate computer-based prediction of substances likely to induce human developmental toxicity. © 2018 Wiley Periodicals, Inc.

  4. An ontological analysis of the electrocardiogram - DOI: 10.3395/reciis.v3i1.242en

    Directory of Open Access Journals (Sweden)

    Bernardo Gonçalves

    2009-04-01

    Full Text Available Bioinformatics has been a fertile field for the application of the discipline of formal ontology. The principled representation of biomedical entities has increasingly supported biological research, with direct benefits ranging from the reformulation of medical terminologies to the introduction of new perspectives for enhanced models of Electronic Health Records (EHR). This paper introduces an application-independent ontological analysis of the electrocardiogram (ECG) grounded in the Unified Foundational Ontology. With the objective of investigating the phenomena underlying this cardiological exam, we deal with the sub-domains of human heart electrophysiology and anatomy. We then outline an ECG Ontology built upon the OBO Relation Ontology. In addition, the domain ontology sketched here draws inspiration from both the Foundational Model of Anatomy and the Ontology of Functions proposed under the auspices of the General Formal Ontology (GFO) research program.

  5. Gene Discovery and Functional Analyses in the Model Plant Arabidopsis

    DEFF Research Database (Denmark)

    Feng, Cai-ping; Mundy, J.

    2006-01-01

    The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions, TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also discus...

  6. Expression and functional analysis of apoptosis-related gene ...

    African Journals Online (AJOL)

    Administrator

    2011-10-19

    Oct 19, 2011 ... conducted a molecular cloning and functional analysis to study a specific silkworm gene BmICAD related to apoptosis. .... blocking with 5% non-fat milk for 1 h at room temperature, the .... requirements for all next experiments.

  7. Bone marrow transplantations to study gene function in hematopoietic cells

    NARCIS (Netherlands)

    de Winther, Menno P. J.; Heeringa, Peter

    2011-01-01

    Immune cells are derived from hematopoietic stem cells in the bone marrow. Experimental replacement of bone marrow offers the unique possibility to replace immune cells, to study gene function in mouse models of disease. Over the past decades, this technique has been used extensively to study, for

  8. Aplicación de visualización de una ontología para el dominio del análisis del semen humano Application to visualize an ontology for the human semen analysis domain

    Directory of Open Access Journals (Sweden)

    Roberto Casañas

    2007-06-01

    Full Text Available This work presents the design and implementation of an ontology for the human semen analysis domain, whose objective is to represent, organize, formalize and standardize the domain knowledge so that it can be shared and reused by different groups of people and software applications. To visualize the ontology, an application based on a client/server architecture for Web environments was developed, consisting of an Administration module and a Public Access module. The first is used to maintain the ontology's Web site, while the second lets users access the stored knowledge and a set of resources such as images, videos, articles related to the domain, manuals and laboratory protocols. The proposed architecture facilitates the observation and retrieval of the complex knowledge structures, as well as the navigation and administration of the information represented in the ontology. The approach used to design the information retrieval mechanisms is aimed both at users unfamiliar with the domain vocabulary and at those who already know it. This functionality is of special interest given how heterogeneous the ontology's intended audience is, including health science professionals and students, among others. The Methontology methodology was selected to develop the ontology, and the Protégé editor was used for its implementation.

  9. The identification of functional motifs in temporal gene expression analysis

    Directory of Open Access Journals (Sweden)

    Michael G. Surette

    2005-01-01

    Full Text Available The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy that adopts the false discovery rate (FDR) and estimates motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron-responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. SAS code for the simulation and analysis of gene expression data is available from the first author upon request.
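
    The abstract does not say which FDR procedure was used. One widely used choice is the Benjamini-Hochberg step-up procedure, sketched below in Python as an illustration only (the authors' own code is in SAS); the function name and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four motif candidates.
    print(benjamini_hochberg([0.03, 0.5, 0.03, 0.03]))
```

    Because the rule is step-up, a candidate whose p-value misses its own rank threshold can still be retained when a higher-ranked candidate passes, which is how such a filter reduces the number of predicted motifs without testing each one in isolation.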

  10. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Full Text Available Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  11. There is no quantum ontology without classical ontology

    Energy Technology Data Exchange (ETDEWEB)

    Fink, Helmut [Institut fuer Theoretische Physik, Univ. Erlangen-Nuernberg (Germany)]

    2011-07-01

    The relation between quantum physics and classical physics is still under debate. In his recent book "Rational Reconstructions of Modern Physics", Peter Mittelstaedt explores a route from classical to quantum mechanics by reduction and elimination of (some of) the ontological hypotheses underlying classical mechanics. While, according to Mittelstaedt, classical mechanics describes a fictitious world that does not exist in reality, he claims to achieve a universal quantum ontology that can be improved by incorporating unsharp properties and equipped with Planck's constant without any need to refer to classical concepts. In this talk, we argue that quantum ontology in Mittelstaedt's sense is not enough. Quantum ontology can never be universal as long as the difference between potential and real properties is not represented adequately. Quantum properties are potential, not (yet) real, be they sharp or unsharp. Hence, preparation and measurement presuppose classical concepts, even in quantum theory. We end up with a classical-quantum sandwich ontology, which is still less extravagant than Bohmian or many-worlds ontologies are.

  12. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    Science.gov (United States)

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
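
    The intersection step described above amounts to a set operation on two gene lists. The sketch below is a minimal illustration; the gene identifiers and the function name are hypothetical and are not taken from the paper.

```python
def candidate_virulence_genes(differentially_expressed, required_for_infection):
    """Return genes found by both approaches, sorted for stable output.

    differentially_expressed: FUN genes flagged by transcriptomics
    required_for_infection:   FUN genes flagged by global mutagenesis
    """
    return sorted(set(differentially_expressed) & set(required_for_infection))


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    transcriptomics_hits = ["funA", "funB", "funC", "funD"]
    mutagenesis_hits = ["funC", "funE", "funA"]
    print(candidate_virulence_genes(transcriptomics_hits, mutagenesis_hits))
```

    Only genes supported by both lines of evidence survive, which is why the resulting candidate set is small.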

  13. Elucidating gene function and function evolution through comparison of co-expression networks in plants

    Directory of Open Access Journals (Sweden)

    Marek eMutwil

    2014-08-01

    Full Text Available The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We show that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that, in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution.
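
    A co-expression network of the kind discussed here is often built by correlating expression profiles and keeping strongly correlated gene pairs as edges. The sketch below assumes Pearson correlation and a fixed threshold; both choices, the function names, and the toy profiles are assumptions for illustration and are not taken from the review.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return gene pairs whose profiles correlate at or above threshold.

    profiles maps a gene name to its expression values across samples.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if pearson(profiles[gene_a], profiles[gene_b]) >= threshold:
            edges.append((gene_a, gene_b))
    return edges


if __name__ == "__main__":
    # Hypothetical expression profiles over four samples.
    toy = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    print(coexpression_edges(toy))
```

    A comparative analysis would build such a network for each species and then look for edges that recur between orthologous genes, on the reasoning that conserved edges are less likely to be noise.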

  14. Plant ion channels: gene families, physiology, and functional genomics analyses.

    Science.gov (United States)

    Ward, John M; Mäser, Pascal; Schroeder, Julian I

    2009-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide-gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport.

  15. Comparative genomics of Geobacter chemotaxis genes reveals diverse signaling function

    Directory of Open Access Journals (Sweden)

    Antommattei Frances M

    2008-10-01

    Full Text Available Abstract Background Geobacter species are δ-Proteobacteria and are often the predominant species in a variety of sedimentary environments where Fe(III) reduction is important. Their ability to remediate contaminated environments and produce electricity makes them attractive for further study. Cell motility, biofilm formation, and type IV pili all appear important for the growth of Geobacter in changing environments and for electricity production. Recent studies in other bacteria have demonstrated that signaling pathways homologous to the paradigm established for Escherichia coli chemotaxis can regulate type IV pili-dependent motility, the synthesis of flagella and type IV pili, the production of extracellular matrix material, and biofilm formation. The classification of these pathways by comparative genomics improves the ability to understand how Geobacter thrives in natural environments and to improve their use in microbial fuel cells. Results The genomes of G. sulfurreducens, G. metallireducens, and G. uraniireducens contain multiple (~70) homologs of chemotaxis genes arranged in several major clusters (six, seven, and seven, respectively). Unlike the single gene cluster of E. coli, the Geobacter clusters are not all located near the flagellar genes. The probable functions of some Geobacter clusters are assignable by homology to known pathways; others appear to be unique to the Geobacter sp. and contain genes of unknown function. We identified large numbers of methyl-accepting chemotaxis protein (MCP) homologs that have diverse sensing domain architectures and generate a potential for sensing a great variety of environmental signals. We discuss mechanisms for class-specific segregation of the MCPs in the cell membrane, which serve to maintain pathway specificity and diminish crosstalk. Finally, the regulation of gene expression in Geobacter differs from E. coli. The sequences of predicted promoter elements suggest that the alternative sigma factors

  16. IGF-I Gene Therapy in Aging Rats Modulates Hippocampal Genes Relevant to Memory Function.

    Science.gov (United States)

    Pardo, Joaquín; Abba, Martin C; Lacunza, Ezequiel; Ogundele, Olalekan M; Paiva, Isabel; Morel, Gustavo R; Outeiro, Tiago F; Goya, Rodolfo G

    2018-03-14

    In rats, learning and memory performance decline during normal aging, which makes this rodent species a suitable model to evaluate therapeutic strategies. In aging rats, insulin-like growth factor-I (IGF-I) is known to significantly improve spatial memory accuracy as compared to control counterparts. A constellation of gene expression changes underlies the hippocampal phenotype of aging, but no studies on the effects of IGF-I on the hippocampal transcriptome of old rodents have been documented. Here, we assessed the effects of IGF-I gene therapy on spatial memory performance in old female rats and compared them with changes in the hippocampal transcriptome. In the Barnes maze test, experimental rats showed a significantly higher exploratory frequency of the goal hole than controls. Hippocampal RNA-sequencing showed that 219 genes are differentially expressed in 28-month-old rats intracerebroventricularly injected with an adenovector expressing rat IGF-I as compared with placebo adenovector-injected counterparts. Of the differentially expressed genes, 81 were downregulated and 138 upregulated. From those genes, a list of functionally relevant genes concerning hippocampal IGF-I expression, synaptic plasticity and neuronal function was identified. Our results provide an initial glimpse at the molecular mechanisms underlying the neuroprotective actions of IGF-I in the aging brain.

  17. development of ontological knowledge representation

    African Journals Online (AJOL)

    Preferred Customer

    ABSTRACT. This paper presents the development of an ontological knowledge organization and .... intelligence in order to facilitate knowledge sharing and reuse of acquired knowledge (15). Soon, ..... Water Chemistry, AJCE, 1(2), 50-58. 25.

  18. A Mobile Army of Ontologies

    DEFF Research Database (Denmark)

    Juul, Jesper

    2015-01-01

    Presentation at the Ludo-ontologies panel. Do we need ludo-ontologies, and what are they? In this event several scholars of games and videogames discuss these questions from a variety of perspectives. What different game and videogame ontologies exist and could exist, and why they are important...... for game and videogame research? The round table is designed to promote ludo-ontological dialogue in order to make these questions visible and debated. A series of short presentations (approximately 10 minutes each) will be followed by an intense debate through freeform dialogue. After the industrial...... commercialization of games and videogames their study has shifted between approaches focused on players (ludic processes) and artifacts (ludic objects). Some attempts to analyze the relationship between the process and the object have occasionally been done in terms of ‘ontology’ (Zagal 2005; Leino 2010; Gualeni...

  19. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need for molecular diagnosis of maturity-onset diabetes of the young (MODY). However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores of the diabetes mutations by different methods were highly correlated, but that the methods complemented rather than replaced each other. The available in silico methods for the prediction of diabetes mutations had varied performances across different genes. Applying gene-specific thresholds defined by this study may be able to increase the performance of in silico prediction of disease-causing mutations.

  20. Functional modules by relating protein interaction networks and gene expression.

    Science.gov (United States)

    Tornow, Sabine; Mewes, H W

    2003-11-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.
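
    The baseline that this abstract compares against, the mean Pearson correlation of a gene group together with the probability of finding it by chance, can be sketched as follows. This is not the superparamagnetic approach the authors propose. The permutation scheme, the function names and the toy expression profiles are all assumptions made for illustration.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_correlation(expression, genes):
    """Average Pearson correlation over all gene pairs in the group."""
    return mean(pearson(expression[g], expression[h]) for g, h in combinations(genes, 2))


def chance_probability(expression, module, n_random=1000, seed=0):
    """Estimate how often a random group of the same size scores at least as high."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(expression, module)
    all_genes = sorted(expression)
    hits = 0
    for _ in range(n_random):
        group = rng.sample(all_genes, len(module))
        if mean_pairwise_correlation(expression, group) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


# Toy expression profiles (hypothetical); g1-g3 stand in for a module
# identified in another network, e.g. a protein interaction network.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [1.5, 2.4, 3.6, 4.4],
    "g4": [4.0, 1.0, 3.0, 2.0],
    "g5": [2.0, 3.0, 1.0, 4.0],
    "g6": [3.0, 4.0, 2.0, 1.0],
}
observed, p_value = chance_probability(expression, ["g1", "g2", "g3"])
print(f"mean correlation = {observed:.3f}, estimated chance probability = {p_value:.3f}")
```

    With only six genes the estimate is coarse; a realistic run would draw random groups from a genome-scale expression matrix.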

  1. Ontologies and Formation Spaces for Conceptual ReDesign of Systems

    Directory of Open Access Journals (Sweden)

    J. Bíla

    2005-01-01

    Full Text Available This paper discusses ontologies, methods for developing them and languages for representing them. A special ontology for computational support of the Conceptual ReDesign Process (CRDP) is introduced with a simple illustrative example of an application. The ontology denoted as Global context (GLB) combines features of general semantic networks and features of UML language. The ontology is task-oriented and domain-oriented, and contains three basic strata: GLBExpl (stratum of Explanation), GLBFAct (stratum of Fields of Activities) and GLBEnv (stratum of Environment), with their sub-strata. The ontology has been developed to represent functions of systems and their components in CRDP. The main difference between this ontology and ontologies which have been developed to identify functions (the semantic details in those ontologies must be as deep as possible) is in the style of the description of the functions. In the proposed ontology, Formation Spaces were used as lower semantic categories whose semantic depth is variable and depends on the actual solution approach of a specialised Conceptual Designer.

  2. Building a Chemical Ontology using Methontology and the Ontology Design Environment

    OpenAIRE

    Fernández López, Mariano; Gómez-Pérez, A.; Pazos Sierra, Alejandro; Pazos Sierra, Juan

    1999-01-01

    Methontology provides guidelines for specifying ontologies at the knowledge level, as a specification of a conceptualization. ODE enables ontology construction, covering the entire life cycle and automatically implementing ontologies

  3. SITEX 2.0: Projections of protein functional sites on eukaryotic genes. Extension with orthologous genes.

    Science.gov (United States)

    Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

    2017-04-01

    Functional sites define the diversity of protein functions and are the central object of research into the structural and functional organization of proteins. The mechanisms underlying the emergence of protein functional sites and their variability during evolution include duplication, shuffling, insertion and deletion of exons in genes. The study of the correlation between site structure and exon structure serves as the basis for an in-depth understanding of site organization. In this regard, the development of software resources that allow the mutual projection of the exon structure of genes and the primary and tertiary structures of the encoded proteins remains a pressing problem. Previously, we developed the SitEx system, which provides information about protein and gene sequences with mapped exon borders and the amino acid positions of protein functional sites. The database included information on proteins with known 3D structure. However, data on orthologs were not available. Therefore, we added the projection of site positions onto the exon structures of orthologs in SitEx 2.0. We implemented a search of the database using site conservation variability and site discontinuity through exon structure. Including the information on orthologs expanded the possibilities of using SitEx to solve problems in the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .

  4. miRNA-mediated functional changes through co-regulating function related genes.

    Directory of Open Access Journals (Sweden)

    Jie He

    Full Text Available BACKGROUND: MicroRNAs play important roles in various biological processes involving fairly complex mechanisms. Analyses of genome-wide miRNA microarrays demonstrate that a single miRNA can regulate hundreds of genes, but the extent of regulation of most individual genes is surprisingly mild, so that it is difficult to understand how a miRNA provokes detectable functional changes with such mild regulation. RESULTS: To explore the internal mechanism of miRNA-mediated regulation, we re-analyzed the data collected from genome-wide miRNA microarrays with bioinformatics analysis, and found that the transfection of miR-181b and miR-34a in HeLa and HCT-116 tumor cells regulated large numbers of genes, among which the genes related to cell growth and cell death demonstrated high enrichment scores, suggesting that these miRNAs may be important in cell growth and cell death. MiR-181b induced changes in protein expression of most genes that were seemingly related to enhancing cell growth and decreasing cell death, while miR-34a mediated contrary changes of gene expression. Cell growth assays further confirmed this finding. In a further study on miR-20b-mediated osteogenesis in hMSCs, miR-20b was found to enhance osteogenesis by activating the BMPs/Runx2 signaling pathway in several stages by co-repressing PPARγ, Bambi and Crim1. CONCLUSIONS: With their multi-target characteristics, miR-181b, miR-34a and miR-20b provoked detectable functional changes by co-regulating functionally related gene groups or several genes in the same signaling pathway, and thus mild regulation from individual miRNA targeting genes could have contributed to an additive effect. This might also be one of the modes of miRNA-mediated gene regulation.

  5. Huntington's disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database.

    Science.gov (United States)

    Kalathur, Ravi Kiran Reddy; Hernández-Prieto, Miguel A; Futschik, Matthias E

    2012-06-28

    Huntington's disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatical and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways, which have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). 
    For follow-up studies, we provide a regularly updated compendium of molecular mechanisms that are associated with HD at http://hdtt.sysbiolab.eu Additionally

  6. Huntington's Disease and its therapeutic target genes: a global functional profile based on the HD Research Crossroads database

    Directory of Open Access Journals (Sweden)

    Kalathur Ravi Kiran

    2012-06-01

    Full Text Available Abstract Background Huntington’s disease (HD) is a fatal progressive neurodegenerative disorder caused by the expansion of the polyglutamine repeat region in the huntingtin gene. Although the disease is triggered by the mutation of a single gene, intensive research has linked numerous other genes to its pathogenesis. To obtain a systematic overview of these genes, which may serve as therapeutic targets, CHDI Foundation has recently established the HD Research Crossroads database. With currently over 800 cataloged genes, this web-based resource constitutes the most extensive curation of genes relevant to HD. It provides us with an unprecedented opportunity to survey molecular mechanisms involved in HD in a holistic manner. Methods To gain a synoptic view of therapeutic targets for HD, we have carried out a variety of bioinformatical and statistical analyses to scrutinize the functional association of genes curated in the HD Research Crossroads database. In particular, enrichment analyses were performed with respect to Gene Ontology categories, KEGG signaling pathways, and Pfam protein families. For selected processes, we also analyzed differential expression, using published microarray data. Additionally, we generated a candidate set of novel genetic modifiers of HD by combining information from the HD Research Crossroads database with previous genome-wide linkage studies. Results Our analyses led to a comprehensive identification of molecular mechanisms associated with HD. Remarkably, we not only recovered processes and pathways, which have frequently been linked to HD (such as cytotoxicity, apoptosis, and calcium signaling), but also found strong indications for other potentially disease-relevant mechanisms that have been less intensively studied in the context of HD (such as the cell cycle and RNA splicing, as well as Wnt and ErbB signaling). For follow-up studies, we provide a regularly updated compendium of molecular mechanism, that are

  7. Functional analyses of cellulose synthase genes in flax (Linum usitatissimum) by virus-induced gene silencing.

    Science.gov (United States)

    Chantreau, Maxime; Chabbert, Brigitte; Billiard, Sylvain; Hawkins, Simon; Neutelings, Godfrey

    2015-12-01

    Flax (Linum usitatissimum) bast fibres are located in the stem cortex where they play an important role in mechanical support. They contain high amounts of cellulose and so are used for linen textiles and in the composite industry. In this study, we screened the annotated flax genome and identified 14 distinct cellulose synthase (CESA) genes using orthologous sequences previously identified. Transcriptomics of 'primary cell wall' and 'secondary cell wall' flax CESA genes showed that some were preferentially expressed in different organs and stem tissues providing clues as to their biological role(s) in planta. The development for the first time in flax of a virus-induced gene silencing (VIGS) approach was used to functionally evaluate the biological role of different CESA genes in stem tissues. Quantification of transcript accumulation showed that in many cases, silencing not only affected targeted CESA clades, but also had an impact on other CESA genes. Whatever the targeted clade, inactivation by VIGS affected plant growth. In contrast, only clade 1- and clade 6-targeted plants showed modifications in outer-stem tissue organization and secondary cell wall formation. In these plants, bast fibre number and structure were severely impacted, suggesting that the targeted genes may play an important role in the establishment of the fibre cell wall. Our results provide new fundamental information about cellulose biosynthesis in flax that should facilitate future plant improvement/engineering. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  8. Microarray analysis of differentially expressed genes and their functions in omental visceral adipose tissues of pregnant women with vs. without gestational diabetes mellitus

    Science.gov (United States)

    Qian, Yuan; Sun, Hao; Xiao, Hongli; Ma, Meirun; Xiao, Xue; Qu, Qinzai

    2017-01-01

    Increasing evidence has shown that insulin resistance in omental visceral adipose tissue (OVAT) is a characteristic of gestational diabetes mellitus (GDM). The present study aimed to identify differentially expressed genes (DEGs) and their associated functions and pathways involved in the pathogenesis of GDM by comparing the expression profiles of OVATs obtained from pregnant Chinese women with and without GDM during caesarian section. A total of 935 DEGs were identified, including 450 downregulated and 485 upregulated genes. In the gene ontology category cellular components, the DEGs were predominantly associated with functions of the extracellular region, while receptor binding was predominant in the molecular function category and biological process terms included antigen processing and presentation, extracellular matrix organization, positive regulation of cell-substrate adhesion, response to nutrients and response to dietary excess. Functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were performed and a functional interaction network was constructed. Functions of downregulated genes included antigen processing and presentation as well as cell adhesion molecules, while those of upregulated genes included transforming growth factor (TGF)-β-signaling, focal adhesion, phosphoinositide-3 kinase-Akt-signaling, P53 signaling, extracellular matrix-receptor interaction and regulation of actin cytoskeleton pathway. The five main pathways associated with GDM were antigen processing and presentation, cell adhesion molecules, Type 1 diabetes mellitus, natural killer cell-mediated cytotoxicity and TGF-β signaling. These pathways were included in the KEGG pathway categories of ‘signaling molecules and interaction’, ‘immune system’ and ‘inflammatory response’, suggesting that these processes are involved in GDM. The results of the present study enhanced the present understanding of the mechanisms associated with insulin

  9. Knowledge Enrichment Analysis for Human Tissue-Specific Genes Uncover New Biological Insights

    Directory of Open Access Journals (Sweden)

    Gong Xiu-Jun

    2012-06-01

    Full Text Available The expression and regulation of genes in different tissues are fundamental questions to be answered in biology. Knowledge enrichment analysis for tissue-specific (TS) and housekeeping (HK) genes may help identify their roles in biological processes or diseases and yield new biological insights. In this paper, we performed knowledge enrichment analysis for 17,343 genes in 84 human tissues using Gene Set Enrichment Analysis (GSEA) and Hypergeometric Analysis (HA) against three biological ontologies: Gene Ontology (GO), KEGG pathways and Disease Ontology (DO), respectively. The results demonstrated that the functions of most gene groups are consistent with their tissue origins. Meanwhile, three interesting new associations for HK genes and skeletal muscle tissue genes were found. Firstly, hypergeometric analysis against the KEGG database for HK genes disclosed that three disease terms (Parkinson’s disease, Huntington’s disease, Alzheimer’s disease) are intensively enriched. Secondly, hypergeometric analysis against the KEGG database for skeletal muscle tissue genes shows that two cardiac diseases, “Hypertrophic cardiomyopathy (HCM)” and “Arrhythmogenic right ventricular cardiomyopathy (ARVC)”, are heavily enriched, although these are considered to have no relationship with skeletal functions. Thirdly, “Prostate cancer” is intensively enriched in hypergeometric analysis against the Disease Ontology (DO) for the skeletal muscle tissue genes, which is a highly unexpected finding.
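
    A hypergeometric enrichment test of the kind named in this abstract can be sketched as a one-sided tail probability. The function name is an assumption, and every number below except the 17,343 genes quoted above is hypothetical; the paper's own implementation is not described here.

```python
from math import comb


def hypergeometric_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Small case that can be checked by hand: C(5,2) * C(5,0) / C(10,2) = 10/45.
p_small = hypergeometric_enrichment_p(population=10, annotated=5, sample=2, overlap=2)
print(round(p_small, 4))  # 0.2222

# Hypothetical tissue gene set: 200 genes, 8 of them annotated with a term
# that covers 120 of the 17,343 genes in the background.
p_term = hypergeometric_enrichment_p(population=17343, annotated=120, sample=200, overlap=8)
print(f"{p_term:.3g}")
```

    In practice the p-values for all tested terms would also need a multiple-testing correction, which this sketch leaves out.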

  10. Gene expression profiling for human iPS-derived motor neurons from sporadic ALS patients reveals a strong association between mitochondrial functions and neurodegeneration

    Science.gov (United States)

    Alves, Chrystian J.; Dariolli, Rafael; Jorge, Frederico M.; Monteiro, Matheus R.; Maximino, Jessica R.; Martins, Roberto S.; Strauss, Bryan E.; Krieger, José E.; Callegaro, Dagoberto; Chadi, Gerson

    2015-01-01

    Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease that leads to widespread motor neuron death, general palsy and respiratory failure. The most prevalent sporadic ALS form is not genetically inherited. Attempts to translate therapeutic strategies have failed because the described mechanisms of disease are based on animal models carrying specific gene mutations and thus do not address sporadic ALS. In order to achieve a better approach to study the human disease, human induced pluripotent stem cell (hiPSC)-differentiated motor neurons were obtained from motor nerve fibroblasts of sporadic ALS and non-ALS subjects using the STEMCCA Cre-Excisable Constitutive Polycistronic Lentivirus system and submitted to microarray analyses using a whole human genome platform. DAVID analyses of differentially expressed genes identified molecular function and biological process-related genes through Gene Ontology. REVIGO highlighted the related functions mRNA and DNA binding, GTP binding, transcription (co)-repressor activity, lipoprotein receptor binding, synapse organization, intracellular transport, mitotic cell cycle and cell death. KEGG showed pathways associated with Parkinson's disease and oxidative phosphorylation, highlighting iron homeostasis, neurotrophic functions, endosomal trafficking and ERK signaling. The analysis of most dysregulated genes and those representative of the majority of categorized genes indicates a strong association between mitochondrial function and cellular processes possibly related to motor neuron degeneration. In conclusion, iPSC-derived motor neurons from motor nerve fibroblasts of sporadic ALS patients may recapitulate key mechanisms of neurodegeneration and may offer an opportunity for translational investigation of sporadic ALS. 
Large gene profiling of differentiated motor neurons from sporadic ALS patients highlights mitochondrial participation in the establishment of autonomous mechanisms associated with sporadic ALS

  11. Protein-protein networks construction and their relevance measurement based on multi-epitope-ligand-kartographie and gene ontology data of T-cell surface proteins for polymyositis.

    Science.gov (United States)

    Li, Fang-Zhen; Gao, Feng

    2012-08-01

    Polymyositis is an inflammatory myopathy characterized by muscle invasion of T-cells penetrating the basal lamina and displacing the plasma membrane of normal muscle fibers. In order to understand the different adhesive mechanisms at the T-cell surface, Schubert randomly selected 19 proteins expressed at the T-cell surface and studied them using the MELK technique [4]; we selected 15 of these proteins for further study. Two types of functional similarity networks are constructed for these proteins. The first type is the MELK similarity network, which is constructed from their MELK data using McNemar's test [24]. The second type is the GO similarity network, which is constructed from their GO annotation data using the RSS method to measure functional similarity. The subset surprisology theory is then employed to measure the degree of similarity between the two networks. Our computational results show that these two types of networks are highly related. This conclusion adds new value to the MELK technique and greatly expands its applications.
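
    McNemar's test, mentioned above, compares two paired binary variables through their discordant counts. The abstract does not say how the MELK data were encoded or how the test result was turned into network edges, so the presence/absence vectors and the function name below are hypothetical; only the test statistic itself is standard.

```python
from math import erfc, sqrt


def mcnemar(x, y):
    """McNemar chi-square statistic (no continuity correction) and p-value
    for two paired binary sequences."""
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # only x present
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # only y present
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical presence/absence calls for two proteins at eight positions.
protein_a = [1, 1, 1, 0, 0, 1, 0, 1]
protein_b = [1, 0, 1, 0, 1, 1, 0, 0]
statistic, p_value = mcnemar(protein_a, protein_b)
print(f"statistic = {statistic:.3f}, p = {p_value:.3f}")
```

    A large p-value means the two proteins do not differ detectably in how often they are present, which is one plausible basis for linking them in a similarity network.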

  12. Functional characterization of a Penicillium chrysogenum mutanase gene induced upon co-cultivation with Bacillus subtilis

    NARCIS (Netherlands)

    Bajaj, I.; Veiga, T.; Van Dissel, D.; Pronk, J.T.; Daran, J.M.

    2014-01-01

    Background Microbial gene expression is strongly influenced by environmental growth conditions. Comparison of gene expression under different conditions is frequently used for functional analysis and to unravel regulatory networks, however, gene expression responses to co-cultivation with other

  13. Aligning ontologies and integrating textual evidence for pathway analysis of microarray data

    Energy Technology Data Exchange (ETDEWEB)

    Gopalan, Banu; Posse, Christian; Sanfilippo, Antonio P.; Stenzel-Poore, Mary; Stevens, S.L.; Castano, Jose; Beagley, Nathaniel; Riensche, Roderick M.; Baddeley, Bob; Simon, R.P.; Pustejovsky, James

    2006-10-08

    Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single gene studies to genome-level analysis, monitoring the expression levels of several thousands of genes in parallel. The massive amounts of data obtained from microarrays need to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontologies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major disabler in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neuroprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative responses in the recovery process following ischemia and the molecular mechanisms that lead to these processes still need to be uncovered. Improved annotation of genomic and proteomic data, including annotation of pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident with the way these databases analyze functional groupings of genes from the Gene Ontology.

  14. Use of the CIM Ontology

    Energy Technology Data Exchange (ETDEWEB)

    Neumann, Scott; Britton, Jay; Devos, Arnold N.; Widergren, Steven E.

    2006-02-08

    There are many uses for the Common Information Model (CIM), an ontology that is being standardized through Technical Committee 57 of the International Electrotechnical Commission (IEC TC57). The most common uses to date have included application modeling, information exchanges, information management and systems integration. As one should expect, there are many issues that become apparent when the CIM ontology is applied to any one use. Some of these issues are shortcomings within the current draft of the CIM, and others are a consequence of the different ways in which the CIM can be applied using different technologies. As the CIM ontology will and should evolve, there are several dangers that need to be recognized. One is overall consistency and impact upon applications when extending the CIM for a specific need. Another is that a tight coupling of the CIM to specific technologies could limit the value of the CIM in the longer term as an ontology, which becomes a larger issue over time as new technologies emerge. The integration of systems is one specific area of interest for application of the CIM ontology. This is an area dominated by the use of XML for the definition of messages. While this is certainly true when using Enterprise Application Integration (EAI) products, it is even more true with the movement towards the use of Web Services (WS), Service-Oriented Architectures (SOA) and Enterprise Service Buses (ESB) for integration. This general IT industry trend is consistent with trends seen within the IEC TC57 scope of power system management and associated information exchange. The challenge for TC57 is how to best leverage the CIM ontology using the various XML technologies and standards for integration. This paper will provide examples of how the CIM ontology is used and describe some specific issues that should be addressed within the CIM in order to increase its usefulness as an ontology. 
It will also describe some of the issues and challenges that will

  15. Assessment of community-submitted ontology annotations from a novel database-journal partnership.

    Science.gov (United States)

    Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

    2012-01-01

    As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.

  16. Sponge Microbiota are a Reservoir of Functional Antibiotic Resistance Genes

    DEFF Research Database (Denmark)

    Versluis, Dennis; de Evgrafov, Mari Cristina Rodriguez; Sommer, Morten Otto Alexander

    2016-01-01

    examined sponges as a reservoir of antibiotic resistance. Sponges could be important in this respect because they often contain diverse microbial communities that have the capacity to produce bioactive metabolites. Here, we applied functional metagenomics to study the presence and diversity of functional...... resistance genes in the sponges Aplysina aerophoba, Petrosia ficiformis, and Corticium candelabrum. We obtained 37 insert sequences facilitating resistance to D-cycloserine (n = 6), gentamicin (n = 1), amikacin (n = 7), trimethoprim (n = 17), chloramphenicol (n = 1), rifampicin (n = 2) and ampicillin (n = 3......-resistance-conferring β-lactamase was identified in the genus Pseudovibrio with 41% global amino acid identity to the closest β-lactamase with demonstrated functionality, and subsequently classified into a new family termed PSV. Taken together, our results show that sponge microbiota host diverse and novel resistance...

  17. Transcriptome analysis by GeneTrail revealed regulation of functional categories in response to alterations of iron homeostasis in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Lenhof Hans-Peter

    2011-05-01

    Full Text Available Abstract Background High-throughput technologies have opened new avenues to study biological processes and pathways. The interpretation of the immense number of data sets generated nowadays needs to be facilitated in order to enable biologists to identify complex gene networks and functional pathways. To cope with this task, multiple computer-based programs have been developed. GeneTrail is a freely available online tool that screens comparative transcriptomic data for differentially regulated functional categories and biological pathways extracted from common databases like KEGG, Gene Ontology (GO), TRANSPATH and TRANSFAC. Additionally, GeneTrail offers a feature that allows screening of individually defined biological categories that are relevant for the respective research topic. Results We have set up GeneTrail for the use of Arabidopsis thaliana. To test the functionality of this tool for plant analysis, we generated transcriptome data of root and leaf responses to Fe deficiency and the Arabidopsis metal homeostasis mutant nas4x-1. We performed Gene Set Enrichment Analysis (GSEA) with eight meaningful pairwise comparisons of transcriptome data sets. We were able to uncover several functional pathways including metal homeostasis that were affected in our experimental situations. Representation of the differentially regulated functional categories in Venn diagrams uncovered regulatory networks at the level of whole functional pathways. Over-Representation Analysis (ORA) of differentially regulated genes identified in pairwise comparisons revealed specific functional plant physiological categories as major targets upon Fe deficiency and in nas4x-1. Conclusion Here, we obtained supporting evidence that the nas4x-1 mutant was defective in metal homeostasis. It was confirmed that nas4x-1 showed Fe deficiency in roots and signs of Fe deficiency and Fe sufficiency in leaves. Besides metal homeostasis, biotic stress, root carbohydrate, leaf

  18. Toward semantic interoperability with linked foundational ontologies in ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-06-01

    Full Text Available A purpose of a foundational ontology is to solve interoperability issues among ontologies. Many foundational ontologies have been developed, reintroducing the ontology interoperability problem. We address this with the new online foundational...

  19. Memory functions reveal structural properties of gene regulatory networks

    Science.gov (United States)

    Perez-Carrasco, Ruben

    2018-01-01

    Gene regulatory networks (GRNs) control cellular function and decision making during tissue development and homeostasis. Mathematical tools based on dynamical systems theory are often used to model these networks, but the size and complexity of these models mean that their behaviour is not always intuitive and the underlying mechanisms can be difficult to decipher. For this reason, methods that simplify and aid exploration of complex networks are necessary. To this end we develop a broadly applicable form of the Zwanzig-Mori projection. By first converting a thermodynamic state ensemble model of gene regulation into mass action reactions we derive a general method that produces a set of time evolution equations for a subset of components of a network. The influence of the rest of the network, the bulk, is captured by memory functions that describe how the subnetwork reacts to its own past state via components in the bulk. These memory functions provide probes of near-steady state dynamics, revealing information not easily accessible otherwise. We illustrate the method on a simple cross-repressive transcriptional motif to show that memory functions not only simplify the analysis of the subnetwork but also have a natural interpretation. We then apply the approach to a GRN from the vertebrate neural tube, a well characterised developmental transcriptional network composed of four interacting transcription factors. The memory functions reveal the function of specific links within the neural tube network and identify features of the regulatory structure that specifically increase the robustness of the network to initial conditions. Taken together, the study provides evidence that Zwanzig-Mori projections offer powerful and effective tools for simplifying and exploring the behaviour of GRNs. PMID:29470492
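
    As a purely illustrative simplification of the projection idea (not the authors' derivation, which starts from mass action reactions), consider dynamics linearised about a steady state, with the deviations split into subnetwork components s and bulk components b. The block matrices A_ss, A_sb, A_bs and A_bb and the kernel symbol M are notation assumed here for the sketch. Solving the bulk equation and substituting it back yields a closed equation for s in which the bulk appears only through a memory function:

```latex
\begin{aligned}
\dot{s}(t) &= A_{ss}\,s(t) + A_{sb}\,b(t), \qquad
\dot{b}(t) = A_{bs}\,s(t) + A_{bb}\,b(t),\\
b(t) &= e^{A_{bb}t}\,b(0) + \int_0^t e^{A_{bb}(t-\tau)}\,A_{bs}\,s(\tau)\,d\tau,\\
\dot{s}(t) &= A_{ss}\,s(t) + \int_0^t M(t-\tau)\,s(\tau)\,d\tau + A_{sb}\,e^{A_{bb}t}\,b(0),
\qquad M(t) = A_{sb}\,e^{A_{bb}t}\,A_{bs}.
\end{aligned}
```

    In this linear setting the kernel M(t) describes how the subnetwork reacts to its own past state via the bulk, and the last term carries the bulk's initial condition.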

  20. Functional imaging: monitoring heme oxygenase-1 gene expression in vivo

    Science.gov (United States)

    Zhang, Weisheng; Reilly-Contag, Pamela; Stevenson, David K.; Contag, Christopher H.

    1999-07-01

    The regulation of genetic elements can be monitored in living animals using photoproteins as reporters. Heme oxygenase (HO) is the key catabolic enzyme in the heme degradation pathway. Here, HO expression serves as a model for in vivo functional imaging of transcriptional regulation of a clinically relevant gene. HO enzymatic activity is inhibited by heme analogs, metalloporphyrins, but many members of this family of compounds also activate transcription of the HO-1 promoter. The degree of transcriptional activation by twelve metalloporphyrins, differing at the central metal and porphyrin ring substituents, was evaluated in both NIH 3T3 stable lines and transgenic animals containing HO-1 promoter-luciferase gene fusions. In the correlative cell culture assays, the metalloporphyrins increased transcription from the full-length HO promoter fusion to varying degrees, but none increased transcription from a truncated HO-1 promoter. These results suggested that one or both of the two distal enhancer elements located at -4 and -10 Kb upstream from transcriptional start are required for HO-1 induction by heme and its analogs. The full-length HO-1-luc fusion was then evaluated as a transgene in mice. It was possible to monitor the effects of the metalloporphyrins, SnMP and ZnPP, in living animals over time. These spatiotemporal analyses of gene expression in vivo implied that alterations in porphyrin ring substituents and the central metal may affect the extent of gene activation. These data further indicate that using photoprotein reporters, subtle differences in gene expression can be monitored in living animals.

  1. Complex Topographic Feature Ontology Patterns

    Science.gov (United States)

    Varanka, Dalia E.; Jerris, Thomas J.

    2015-01-01

    Semantic ontologies are examined as effective data models for the representation of complex topographic feature types. Complex feature types are viewed as integrated relations between basic features for a basic purpose. In the context of topographic science, such component assemblages are supported by resource systems and found on the local landscape. Ontologies are organized within six thematic modules of a domain ontology called Topography that includes within its sphere basic feature types, resource systems, and landscape types. Context is constructed not only as a spatial and temporal setting, but a setting also based on environmental processes. Types of spatial relations that exist between components include location, generative processes, and description. An example is offered in a complex feature type ‘mine.’ The identification and extraction of complex feature types are an area for future research.

  2. Geographic Ontologies, Gazetteers and Multilingualism

    Directory of Open Access Journals (Sweden)

    Robert Laurini

    2015-01-01

    Full Text Available Different languages imply different visions of space, so that terminologies are different in geographic ontologies. In addition to their geometric shapes, geographic features have names, sometimes different in diverse languages. In addition, the role of gazetteers, as dictionaries of place names (toponyms), is to maintain relations between place names and location. The scope of geographic information retrieval is to search for geographic information not against a database, but against the whole Internet: but the Internet stores information in different languages, and it is of paramount importance not to remain stuck to a unique language. In this paper, our first step is to clarify the links between geographic objects as computer representations of geographic features, ontologies and gazetteers designed in various languages. Then, we propose some inference rules for matching not only types, but also relations in geographic ontologies with the assistance of gazetteers.

  3. Ontology Matching with Semantic Verification.

    Science.gov (United States)

    Jean-Mary, Yves R; Shironoshita, E Patrick; Kabuka, Mansur R

    2009-09-01

    ASMOV (Automated Semantic Matching of Ontologies with Verification) is a novel algorithm that uses lexical and structural characteristics of two ontologies to iteratively calculate a similarity measure between them, derives an alignment, and then verifies it to ensure that it does not contain semantic inconsistencies. In this paper, we describe the ASMOV algorithm, and then present experimental results that measure its accuracy using the OAEI 2008 tests, and that evaluate its use with two different thesauri: WordNet, and the Unified Medical Language System (UMLS). These results show the increased accuracy obtained by combining lexical, structural and extensional matchers with semantic verification, and demonstrate the advantage of using a domain-specific thesaurus for the alignment of specialized ontologies.

  4. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  5. Terminological Ontologies for Risk and Vulnerability Analysis

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2014-01-01

    Risk and vulnerability analyses are an important preliminary stage in civil contingency planning. The Danish Emergency Management Agency has developed a generic model and a set of tools that may be used in the preparedness planning, i.e. for identifying and describing society’s critical functions......, for formulating threat scenarios and for assessing consequences. Terminological ontologies, which are systems of domain specific concepts comprising concept relations and characteristics, are useful, both when describing the central concepts of risk and vulnerability analysis (meta concepts), and for further...

  6. Functional analysis of mating type genes and transcriptome analysis during fruiting body development of botrytis cinerea

    NARCIS (Netherlands)

    Rodenburg, Sander Y.A.; Terhem, Razak B.; Veloso, Javier; Stassen, Joost H.M.; Kan, van Jan A.L.

    2018-01-01

    Botrytis cinerea is a plant-pathogenic fungus producing apothecia as sexual fruiting bodies. To study the function of mating type (MAT) genes, single-gene deletion mutants were generated in both genes of the MAT1-1 locus and both genes of the MAT1-2 locus. Deletion mutants in two MAT genes were

  7. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.

    Science.gov (United States)

    Wang, Xiao; Zhang, Jun; Li, Guo-Zheng

    2015-01-01

    Predicting bacterial protein subcellular locations using computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, the majority of these methods can only deal with single-location proteins. Unfortunately, many multi-location proteins exist in bacterial cells. Moreover, multi-location proteins have special biological functions capable of helping the development of new drugs. So it is necessary to develop new computational methods for accurately predicting subcellular locations of multi-location bacterial proteins. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively. The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible
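
    The difference between 0/1 GO vectors and frequency-based GO vectors mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the predictors' actual feature pipeline: the function names, the normalisation by the number of homologs, and the example GO identifiers are all assumptions made for the sketch.

```python
from collections import Counter


def go_frequency_vector(homolog_go_terms, vocabulary):
    """Fraction of homologs annotated with each GO term (assumed normalisation)."""
    n = len(homolog_go_terms)
    if n == 0:
        return [0.0] * len(vocabulary)
    # Count each term at most once per homolog.
    counts = Counter(term for terms in homolog_go_terms for term in set(terms))
    return [counts[term] / n for term in vocabulary]


def go_binary_vector(homolog_go_terms, vocabulary):
    """Conventional 0/1 encoding: 1 if any homolog carries the term."""
    seen = {term for terms in homolog_go_terms for term in terms}
    return [1 if term in seen else 0 for term in vocabulary]


# Illustrative input: GO terms of three hypothetical homologs of a query protein.
homologs = [["GO:0005737", "GO:0005886"], ["GO:0005737"], ["GO:0005576"]]
vocabulary = ["GO:0005576", "GO:0005737", "GO:0005886"]

freq_vec = go_frequency_vector(homologs, vocabulary)
bin_vec = go_binary_vector(homologs, vocabulary)
```

    Here the binary vector marks all three terms identically, while the frequency vector keeps the information that the second term is shared by two of the three homologs.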

  8. Induction of Protective Genes Leads to Islet Survival and Function

    Directory of Open Access Journals (Sweden)

    Hongjun Wang

    2011-01-01

    Full Text Available Islet transplantation is the most valid approach to the treatment of type 1 diabetes. However, the function of transplanted islets is often compromised since a large number of β cells undergo apoptosis induced by stress and the immune rejection response elicited by the recipient after transplantation. Conventional treatment for islet transplantation is to administer immunosuppressive drugs to the recipient to suppress the immune rejection response mounted against transplanted islets. Induction of protective genes in the recipient (e.g., heme oxygenase-1 (HO-1), A20/tumor necrosis factor alpha inducible protein3 (tnfaip3), biliverdin reductase (BVR), Bcl2, and others) or administration of one or more of the products of HO-1 to the donor, the islets themselves, and/or the recipient offers an alternative or synergistic approach to improve islet graft survival and function. In this perspective, we summarize studies describing the protective effects of these genes on islet survival and function in rodent allogeneic and xenogeneic transplantation models and the prevention of onset of diabetes, with emphasis on HO-1, A20, and BVR. Such approaches are also appealing to islet autotransplantation in patients with chronic pancreatitis after total pancreatectomy, a procedure that currently only leads to 1/3 of transplanted patients being diabetes-free.

  9. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify

  10. Inferring ontology graph structures using OWL reasoning

    KAUST Repository

    Rodriguez-Garcia, Miguel Angel; Hoehndorf, Robert

    2018-01-01

    ' semantic content remains a challenge.We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies

  11. Ontologies, Knowledge Bases and Knowledge Management

    National Research Council Canada - National Science Library

    Chalupsky, Hans

    2002-01-01

    ...) an application called Strategy Development Assistant (SDA) that uses that ontology. The JFACC ontology served as a basis for knowledge sharing among several applications in the domain of air campaign planning...

  12. Addressing issues in foundational ontology mediation

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    Full Text Available An approach in achieving semantic interoperability among heterogeneous systems is to offer infrastructure to assist with linking and integration using a foundational ontology. Due to the creation of multiple foundational ontologies, this also means...

  13. Technique for designing a domain ontology

    OpenAIRE

    Palagin, A. V.; Petrenko, N. G.; Malakhov, K. S.

    2018-01-01

    The article describes a technique for designing a domain ontology, shows a flowchart of the design algorithm, and considers an example of constructing a fragment of an ontology for the subject area of Computer Science.

  14. Platonic wholes and quantum ontology

    CERN Document Server

    Woszczek, Marek

    2015-01-01

    The subject of the book is a reconsideration of the internalistic model of composition of the Platonic type, more radical than traditional, post-Aristotelian externalistic compositionism, and its application in the field of the ontology of quantum theory. At the centre of quantum ontology is nonseparability. Quantum wholes are atemporal wholes governed by internalistic logic and they are primitive, global physical entities, requiring an extreme relativization of the fundamental notions of mechanics. This ensures that quantum theory is fully consistent with the relativistic causal structure, with

  15. Multimedia ontology representation and applications

    CERN Document Server

    Chaudhury, Santanu; Ghosh, Hiranmay

    2015-01-01

    The result of more than 15 years of collective research, Multimedia Ontology: Representation and Applications provides a theoretical foundation for understanding the nature of media data and the principles involved in its interpretation. The book presents a unified approach to recent advances in multimedia and explains how a multimedia ontology can fill the semantic gap between concepts and the media world. It relays real-life examples of implementations in different domains to illustrate how this gap can be filled. The book contains information that helps with building semantic, content-based

  16. Root justifications for ontology repair

    CSIR Research Space (South Africa)

    Moodley, K

    2011-08-01

    Full Text Available Root Justifications... the ontology, based on the notion of root justifications [8, 9]. In Section 5, we discuss the implementation of a Protégé plugin which demonstrates our approach to ontology repair. In this section we also discuss some experimental results comparing...

  17. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Mundy, J.; Willenbrock, Hanni

    2007-01-01

    The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental facto...

  18. Effects of traditional Japanese massage therapy on gene expression: preliminary study.

    Science.gov (United States)

    Donoyama, Nozomi; Ohkoshi, Norio

    2011-06-01

    Changes in gene expression after traditional Japanese massage therapy were investigated to clarify the mechanisms of the clinical effects of traditional Japanese massage therapy. This was a pilot experimental study. The study was conducted in a laboratory at Tsukuba University of Technology. The subjects were 2 healthy female volunteers (58-year-old Participant A, 55-year-old Participant B). The intervention consisted of a 40-minute full-body massage using standard traditional Japanese massage techniques through the clothing and a 40-minute rest as a control, in which participants lay on the massage table without being massaged. Before and after an intervention, blood was taken and analyzed by microarray: (1) The number of genes whose expression more than doubled after the intervention relative to before was examined; (2) For those genes, gene ontology analysis identified statistically significant gene ontology terms. The gene expression count in the total of 41,000 genes was 1256 genes for Participant A and 1778 for Participant B after traditional Japanese massage, and was 157 and 82 after the control, respectively. The significant gene ontology terms selected by both Participants A and B after massage were "immune response" and "immune system," whereas no gene ontology terms were selected by them in the control. It is implied that traditional Japanese massage therapy may affect the immune function. Further studies with more samples are necessary.
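
    The first analysis step described above, counting genes whose expression more than doubled after the intervention, can be sketched as a simple fold-change filter. This is only an illustration under stated assumptions: the function name, the gene labels and the intensity values are hypothetical, and the study's actual microarray normalisation is not described in the abstract.

```python
def more_than_doubled(before, after, fold=2.0):
    """Return genes whose after/before expression ratio exceeds `fold`."""
    selected = []
    for gene in before.keys() & after.keys():
        # Skip genes with no measurable baseline to avoid division by zero.
        if before[gene] > 0 and after[gene] / before[gene] > fold:
            selected.append(gene)
    return sorted(selected)


# Hypothetical expression intensities for one participant.
before = {"gene_a": 10.0, "gene_b": 5.0, "gene_c": 8.0}
after = {"gene_a": 25.0, "gene_b": 10.0, "gene_c": 4.0}

doubled = more_than_doubled(before, after)
```

    With these made-up values only gene_a passes, since gene_b is exactly doubled rather than more than doubled.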

  19. Towards ontology based search and knowledge sharing using domain ontologies

    DEFF Research Database (Denmark)

    Zambach, Sine

    verbs for relations in the ontology modeling. For this work we use frequency lists from a biomedical text corpus of different genres as well as a study of the relations used in other biomedical text mining tools. In addition, we discuss how these relations can be used in a broader perspective....

  20. An Ontology for Knowledge Representation and Applications

    OpenAIRE

    Nhon Do

    2008-01-01

    Ontology is a term used in artificial intelligence with different meanings. Ontology research plays an important role in computer science and practical applications, especially distributed knowledge systems. In this paper we present an ontology called Computational Object Knowledge Base Ontology. It has been used in designing some knowledge base systems for solving problems such as the system that supports studying knowledge and solving analytic ...

  1. On Algebraic Spectrum of Ontology Evaluation

    OpenAIRE

    Adekoya Adebayo Felix; Akinwale Adio Taofiki; Sofoluwe Adetokunbo

    2011-01-01

    Ontology evaluation remains an important open problem in the area of its application. The ontology structure evaluation framework for benchmarking the internal graph structures was proposed. The framework was used in transport and biochemical ontology. The corresponding adjacency, incidence matrices and other structural properties due to the class hierarchical structure of the transport and biochemical ontology were computed using MATLAB. The results showed that the choice of suitable choice ...

  2. Zebrafish Lacking Circadian Gene per2 Exhibit Visual Function Deficiency

    Directory of Open Access Journals (Sweden)

    Deng-feng Huang

    2018-03-01

    Full Text Available The retina has an intrinsic circadian clock, but the importance of this clock for vision is unknown. Zebrafish offer many advantages for studying vertebrate vision and circadian rhythm. Here, we explored the role of zebrafish per2, a light-regulated gene, in visual behavior and the underlying mechanisms. We observed that per2 mutant zebrafish larvae showed decreased contrast sensitivity and visual acuity using optokinetic response (OKR) assays. Using a visual motor response (VMR) assay, we observed normal OFF responses but abnormal ON responses in mutant zebrafish larvae. Immunofluorescence showed that mutants had a normal morphology of cone photoreceptor cells and retinal organization. However, electron microscopy showed that per2 mutants displayed abnormal and decreased photoreceptor ribbon synapses with arciform density, which resulted in a retinal ON pathway defect. We also examined the expression of three cone opsins by quantitative real-time PCR (qRT-PCR), and the expression of long-wave-sensitive opsin (opn1lw) and short-wave-sensitive opsin (opn1sw) was reduced in mutant zebrafish larvae. qRT-PCR analyses also showed a down-regulation of the clock genes cry1ba and bmal1b in the adult eye of per2 mutant zebrafish. This study identified a mechanism by which a clock gene affects visual function and defined important roles of per2 in retinal information processing.

  3. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.

    Science.gov (United States)

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-02-21

    Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.
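
    The over-representation question described above (are more entities annotated to an ontology class than expected by chance?) is commonly framed as a hypergeometric tail probability. The sketch below illustrates that general idea only; the abstract does not state which statistical test BiNChE itself uses, and the function name and the example numbers are assumptions.

```python
from math import comb


def over_representation_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: entities in the background set
    K: background entities annotated to the ontology class
    n: entities in the study set
    k: study-set entities annotated to the ontology class
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated entities.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 10 background molecules, 4 in the class, 3 sampled, 2 of them in the class.
p_value = over_representation_p(k=2, n=3, K=4, N=10)
```

    For the toy numbers the tail is (36 + 4) / 120, so the p-value is one third, far from significant, as expected for such a small set.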

  4. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Science.gov (United States)

    Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg

    2014-01-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  5. Mining a database of single amplified genomes from Red Sea brine pool extremophiles – Improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA)

    Directory of Open Access Journals (Sweden)

    Stefan Wolfgang Grötzinger

    2014-04-01

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile’s genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the INDIGO data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile & Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2,577 E.C. numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from 6 different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
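
    The reliability rule stated in this abstract (a hit counts as reliable if at least two relevant descriptors, GO-terms and/or consensus patterns, are present) can be sketched as follows. This is not the authors' script; the function name, the data layout and the identifiers used are placeholders assumed purely for illustration.

```python
def is_reliable(gene_go_terms, gene_prosite_ids, relevant_go, relevant_prosite,
                min_descriptors=2):
    """Return True if a gene carries at least `min_descriptors` relevant descriptors.

    Profile filter: GO terms of the gene that are in the relevant GO set.
    Pattern filter: PROSITE IDs of the gene that are in the relevant pattern set.
    Descriptors from both filters are counted together ("and/or").
    """
    profile_hits = set(gene_go_terms) & set(relevant_go)
    pattern_hits = set(gene_prosite_ids) & set(relevant_prosite)
    return len(profile_hits) + len(pattern_hits) >= min_descriptors


# Placeholder descriptor sets for one enzyme family (not taken from the paper).
relevant_go = {"GO:A", "GO:B"}
relevant_prosite = {"PS:X"}

# Hypothetical genes: (GO terms, PROSITE IDs) found by the annotation step.
genes = {
    "gene1": ({"GO:A"}, {"PS:X"}),        # one profile hit + one pattern hit
    "gene2": ({"GO:A", "GO:B"}, set()),   # two profile hits
    "gene3": ({"GO:A"}, set()),           # a single descriptor only
    "gene4": ({"GO:Z"}, {"PS:Y"}),        # descriptors present but not relevant
}

reliable = sorted(
    name for name, (go, ps) in genes.items()
    if is_reliable(go, ps, relevant_go, relevant_prosite)
)
```

    Counting descriptors from both filters in a single total is one reading of "GO-terms and/or consensus patterns"; the published scripts may apply additional filtering that this sketch does not attempt to reproduce.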

  6. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA).

    KAUST Repository

    Grötzinger, Stefan W.

    2014-04-07

    Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available

  7. An ontological approach to domain engineering

    NARCIS (Netherlands)

    Falbo, R.A.; Guizzardi, G.; Duarte, K.

    2002-01-01

    Domain engineering aims to support systematic reuse, focusing on modeling common knowledge in a problem domain. Ontologies have also been pointed as holding great promise for software reuse. In this paper, we present ODE (Ontology-based Domain Engineering), an ontological approach for domain

  8. Aspects of ontology visualization and integration

    NARCIS (Netherlands)

    Dmitrieva, Joelia Borisovna

    2011-01-01

    In this thesis we will describe and discuss methodologies for ontology visualization and integration. Two visualization methods will be elaborated. In one method the ontology is visualized with the node-link technique, and with the other method the ontology is visualized with the containment

  9. Gene-environment interaction and male reproductive function

    DEFF Research Database (Denmark)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L

    2010-01-01

    As genetic factors can hardly explain the changes taking place during short time spans, environmental and lifestyle-related factors have been suggested as the causes of time-related deterioration of male reproductive function. However, considering the strong heterogeneity of male fecundity between...... that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism...... of reproduction, namely environmental and lifestyle factors as the cause of sperm DNA damage. It remains to be investigated to what extent such genetic changes, by natural conception or through the use of assisted reproductive techniques, are transmitted to the next generation, thereby causing increased morbidity...

  10. Gene-environment interaction and male reproductive function

    DEFF Research Database (Denmark)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L

    2010-01-01

    As genetic factors can hardly explain the changes taking place during short time spans, environmental and lifestyle-related factors have been suggested as the causes of time-related deterioration of male reproductive function. However, considering the strong heterogeneity of male fecundity between...... and within populations, genetic variants might be important determinants of the individual susceptibility to the adverse effects of environment or lifestyle. Although the possible mechanisms of such interplay in relation to the reproductive system are largely unknown, some recent studies have indicated...... that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism...

  11. Spaceflight effects on T lymphocyte distribution, function and gene expression

    Science.gov (United States)

    Gridley, Daila S.; Slater, James M.; Luo-Owen, Xian; Rizvi, Asma; Chapes, Stephen K.; Stodieck, Louis S.; Ferguson, Virginia L.; Pecaut, Michael J.

    2009-01-01

    The immune system is highly sensitive to stressors present during spaceflight. The major emphasis of this study was on the T lymphocytes in C57BL/6NTac mice after return from a 13-day space shuttle mission (STS-118). Spleens and thymuses from flight animals (FLT) and ground controls similarly housed in animal enclosure modules (AEM) were evaluated within 3–6 h after landing. Phytohemagglutinin-induced splenocyte DNA synthesis was significantly reduced in FLT mice when based on both counts per minute and stimulation indexes (P < 0.05). Flow cytometry showed that CD3+ T and CD19+ B cell counts were low in spleens from the FLT group, whereas the number of NK1.1+ natural killer (NK) cells was increased (P < 0.01 for all three populations vs. AEM). The numerical changes resulted in a low percentage of T cells and high percentage of NK cells in FLT animals (P < 0.05). After activation of spleen cells with anti-CD3 monoclonal antibody, interleukin-2 (IL-2) was decreased, but IL-10, interferon-γ, and macrophage inflammatory protein-1α were increased in FLT mice (P < 0.05). Analysis of cancer-related genes in the thymus showed that the expression of 30 of 84 genes was significantly affected by flight (P < 0.05). Genes that differed from AEM controls by at least 1.5-fold were Birc5, Figf, Grb2, and Tert (upregulated) and Fos, Ifnb1, Itgb3, Mmp9, Myc, Pdgfb, S100a4, Thbs, and Tnf (downregulated). Collectively, the data show that T cell distribution, function, and gene expression are significantly modified shortly after return from the spaceflight environment. PMID:18988762

  12. Gene-environment interaction and male reproductive function

    Science.gov (United States)

    Axelsson, Jonatan; Bonde, Jens Peter; Giwercman, Yvonne L.; Rylander, Lars; Giwercman, Aleksander

    2010-01-01

    As genetic factors can hardly explain the changes taking place during short time spans, environmental and lifestyle-related factors have been suggested as the causes of time-related deterioration of male reproductive function. However, considering the strong heterogeneity of male fecundity between and within populations, genetic variants might be important determinants of the individual susceptibility to the adverse effects of environment or lifestyle. Although the possible mechanisms of such interplay in relation to the reproductive system are largely unknown, some recent studies have indicated that specific genotypes may confer a larger risk of male reproductive disorders following certain exposures. This paper presents a critical review of animal and human evidence on how genes may modify environmental effects on male reproductive function. Some examples have been found that support this mechanism, but the number of studies is still limited. This type of interaction studies may improve our understanding of normal physiology and help us to identify the risk factors to male reproductive malfunction. We also shortly discuss other aspects of gene-environment interaction specifically associated with the issue of reproduction, namely environmental and lifestyle factors as the cause of sperm DNA damage. It remains to be investigated to what extent such genetic changes, by natural conception or through the use of assisted reproductive techniques, are transmitted to the next generation, thereby causing increased morbidity in the offspring. PMID:20348940

  13. Clock gene evolution: seasonal timing, phylogenetic signal, or functional constraint?

    Science.gov (United States)

    Krabbenhoft, Trevor J; Turner, Thomas F

    2014-01-01

    Genetic determinants of seasonal reproduction are not fully understood but may be important predictors of organism responses to climate change. We used a comparative approach to study the evolution of seasonal timing within a fish community in a natural common garden setting. We tested the hypothesis that allelic length variation in the PolyQ domain of a circadian rhythm gene, Clock1a, corresponded to interspecific differences in seasonal reproductive timing across 5 native and 1 introduced cyprinid fishes (n = 425 individuals) that co-occur in the Rio Grande, NM, USA. Most common allele lengths were longer in native species that initiated reproduction earlier (Spearman's r = -0.70, P = 0.23). Clock1a allele length exhibited strong phylogenetic signal and earlier spawners were evolutionarily derived. Aside from length variation in Clock1a, all other amino acids were identical across native species, suggesting functional constraint over evolutionary time. Interestingly, the endangered Rio Grande silvery minnow (Hybognathus amarus) exhibited less allelic variation in Clock1a and observed heterozygosity was 2- to 6-fold lower than the 5 other (nonimperiled) species. Reduced genetic variation in this functionally important gene may impede this species' capacity to respond to ongoing environmental change.

  14. Alignment of ICNP® 2.0 Ontology and a proposed INCP® Brazilian Ontology

    OpenAIRE

    Carvalho, Carina Maris Gaspar; Cubas, Marcia Regina; Malucelli, Andreia; da Nóbrega, Maria Miriam Lima

    2014-01-01

    OBJECTIVE: to align the International Classification for Nursing Practice (ICNP®) Version 2.0 ontology and a proposed INCP® Brazilian Ontology. METHOD: document-based, exploratory and descriptive study, the empirical basis of which was provided by the ICNP® 2.0 Ontology and the INCP® Brazilian Ontology. The ontology alignment was performed using a computer tool with algorithms to identify correspondences between concepts, which were organized and analyzed according to their presence or absence...

  15. Functional Associations by Response Overlap (FARO), a functional genomics approach matching gene expression phenotypes.

    Directory of Open Access Journals (Sweden)

    Henrik Bjørn Nielsen

    2007-08-01

    The systematic comparison of transcriptional responses of organisms is a powerful tool in functional genomics. For example, mutants may be characterized by comparing their transcript profiles to those obtained in other experiments querying the effects on gene expression of many experimental factors including treatments, mutations and pathogen infections. Similarly, drugs may be discovered by the relationship between the transcript profiles effectuated or impacted by a candidate drug and by the target disease. The integration of such data enables systems biology to predict the interplay between experimental factors affecting a biological system. Unfortunately, direct comparisons of gene expression profiles obtained in independent, publicly available microarray experiments are typically compromised by substantial, experiment-specific biases. Here we suggest a novel yet conceptually simple approach for deriving 'Functional Association(s) by Response Overlap' (FARO) between microarray gene expression studies. The transcriptional response is defined by the set of differentially expressed genes independent from the magnitude or direction of the change. This approach overcomes the limited comparability between studies that is typical for methods that rely on correlation in gene expression. We apply FARO to a compendium of 242 diverse Arabidopsis microarray experimental factors, including phyto-hormones, stresses and pathogens, growth conditions/stages, tissue types and mutants. We also use FARO to confirm and further delineate the functions of Arabidopsis MAP kinase 4 in disease and stress responses. Furthermore, we find that a large, well-defined set of genes responds in opposing directions to different stress conditions and predict the effects of different stress combinations. This demonstrates the usefulness of our approach for exploiting public microarray data to derive biologically meaningful associations between experimental factors. Finally, our
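
    The core idea described in this abstract, treating each response as a plain set of differentially expressed genes and comparing factors by how many genes they share, can be sketched in a few lines. This is only an illustration under assumed names and made-up gene identifiers; it is not the FARO implementation and it omits any significance testing the authors may apply.

```python
def rank_by_response_overlap(responses, query):
    """Rank experimental factors by the number of responsive genes shared with `query`.

    `responses` maps a factor name to the set of genes that respond to it.
    Direction and magnitude of change are deliberately not stored, so only
    membership in the set matters.
    """
    query_genes = responses[query]
    ranked = [
        (factor, len(query_genes & genes))
        for factor, genes in responses.items()
        if factor != query
    ]
    # Largest overlap first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical compendium of three factors plus one query mutant.
responses = {
    "mutant_q": {"g1", "g2", "g3", "g4"},
    "cold": {"g1", "g2", "g3", "g9"},
    "drought": {"g2", "g7"},
    "pathogen": {"g8"},
}

ranking = rank_by_response_overlap(responses, "mutant_q")
```

    Because only set membership is compared, two studies with different platforms or scales can still be matched, which is the comparability advantage the abstract describes.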

  16. [Gene deletion and functional analysis of the heptyl glycosyltransferase (waaF) gene in Vibrio parahemolyticus O-antigen cluster].

    Science.gov (United States)

    Zhao, Feng; Meng, Songsong; Zhou, Deqing

    2016-02-04

    The aim was to construct a heptyl glycosyltransferase II gene (waaF) deletion mutant of Vibrio parahaemolyticus and to explore the function of the waaF gene in Vibrio parahaemolyticus. The waaF gene deletion mutant was constructed from clinical isolates by chitin-based transformation, and its growth rate, morphology and serotype were then characterized. Complementation strains carrying the waaF gene from different sources (O3, O5 and O10) were constructed by conjugative transfer from E. coli S17λpir strains into Vibrio parahaemolyticus, and the function of the waaF gene was further verified by serotyping. The waaF gene deletion mutant strain was successfully constructed and grew normally. The growth rate and morphology of the mutant were similar to those of the wild-type (WT) strains, but the mutant did not agglutinate with O antisera. The complementation strains carrying the waaF gene from the O3 and O5 sources agglutinated with O antisera, whereas the strain carrying the waaF gene from the O10 source did not. The waaF gene is involved in O-antigen synthesis and is a key gene of the O-antigen synthesis pathway in Vibrio parahaemolyticus. The waaF genes from different sources do not have the same function.

  17. CLO : The cell line ontology

    NARCIS (Netherlands)

    Sarntivijai, Sirarat; Lin, Yu; Xiang, Zuoshuang; Meehan, Terrence F.; Diehl, Alexander D.; Vempati, Uma D.; Schuerer, Stephan C.; Pang, Chao; Malone, James; Parkinson, Helen; Liu, Yue; Takatsuki, Terue; Saijo, Kaoru; Masuya, Hiroshi; Nakamura, Yukio; Brush, Matthew H.; Haendel, Melissa A.; Zheng, Jie; Stoeckert, Christian J.; Peters, Bjoern; Mungall, Christopher J.; Carey, Thomas E.; States, David J.; Athey, Brian D.; He, Yongqun

    2014-01-01

    Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO

  18. Emotion Education without Ontological Commitment?

    Science.gov (United States)

    Kristjansson, Kristjan

    2010-01-01

    Emotion education is enjoying new-found popularity. This paper explores the "cosy consensus" that seems to have developed in education circles, according to which approaches to emotion education are immune from metaethical considerations such as contrasting rationalist and sentimentalist views about the moral ontology of emotions. I spell out five…

  19. Quantum physics and relational ontology

    Energy Technology Data Exchange (ETDEWEB)

    Cordovil, Joao [Center of Philosophy of Sciences of University of Lisbon (Portugal)]

    2013-07-01

    The discovery of the quantum domain of reality posed a serious ontological challenge, one that is still present in recent developments of Quantum Physics. Physics was conceived from an atomistic conception of the world, reducing it, in all its diversity, to two types of entities: simple, individual and immutable entities (atoms, in the metaphysical sense) and composite entities, resulting solely from combinations, that is, linear, additive combinations, indifferent to structure or context. However, the discovery of wave-particle dualism and the developments in Quantum Field Theories and in Quantum Nonlinear Physics showed that quantum entities are, in the metaphysical sense, neither simple nor merely the result of linear (or additive) combinations. In other words, the ontological foundations of Physics proved inadequate to account for the nature of quantum entities. A fundamental challenge then arises: how should we think about the ontic nature of these entities? In my view, this challenge calls for a relational and dynamist ontology of physical entities. This is the central hypothesis of this communication, which accordingly has two main aims: 1) to characterize this relational and dynamist ontology positively; 2) to show some elements of its metaphysical suitability to contemporary Quantum Physics.

  20. Ontological problems of contemporary linguistics

    Directory of Open Access Journals (Sweden)

    А В Бондаренко

    2009-03-01

    The article studies problems of linguistic ontology, such as the evolution of essential-existential views of language, interrelations within the Being-Language-Man triad, the gnosiological principles of linguistics, the localization of the essence of language, «expression» as a metalinguistic unit of language, and the architectonics of the language personality, among others.

  1. An ontological approach to logistics

    NARCIS (Netherlands)

    Daniele, L.M.; Ferreira Pires, Luis; Zelm, M.; van Sinderen, Marten J.; Doumeingts, G.

    2013-01-01

    In today’s global market, the competitiveness of enterprises is strongly dictated by their ability to collaborate with other enterprises. Ontologies enable common understanding of concepts and have been acknowledged as a powerful means to foster collaboration, both within the boundaries of an

  2. Gradient Learning Algorithms for Ontology Computing

    Science.gov (United States)

    Gao, Wei; Zhu, Linli

    2014-01-01

    The gradient learning model has attracted great attention in view of its promising applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measuring and ontology mapping in the multidividing setting. The sample error in this setting is given by virtue of the hypothesis space and the trick of the ontology dividing operator. Finally, two experiments in the plant and humanoid robotics fields verify the efficiency of the new computation model for ontology similarity measure and ontology mapping applications in the multidividing setting. PMID:25530752

  3. Gradient Learning Algorithms for Ontology Computing

    Directory of Open Access Journals (Sweden)

    Wei Gao

    2014-01-01

    The gradient learning model has attracted great attention in view of its promising applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measuring and ontology mapping in the multidividing setting. The sample error in this setting is given by virtue of the hypothesis space and the trick of the ontology dividing operator. Finally, two experiments in the plant and humanoid robotics fields verify the efficiency of the new computation model for ontology similarity measure and ontology mapping applications in the multidividing setting.

  4. History Matters: Incremental Ontology Reasoning Using Modules

    Science.gov (United States)

    Cuenca Grau, Bernardo; Halaschek-Wiener, Christian; Kazakov, Yevgeny

    The development of ontologies involves continuous but relatively small modifications. Existing ontology reasoners, however, do not take advantage of the similarities between different versions of an ontology. In this paper, we propose a technique for incremental reasoning—that is, reasoning that reuses information obtained from previous versions of an ontology—based on the notion of a module. Our technique does not depend on a particular reasoning calculus and thus can be used in combination with any reasoner. We have applied our results to incremental classification of OWL DL ontologies and found significant improvement over regular classification time on a set of real-world ontologies.

  5. The epistemology and ontology of human-computer interaction

    NARCIS (Netherlands)

    Brey, Philip A.E.

    2005-01-01

    This paper analyzes epistemological and ontological dimensions of Human-Computer Interaction (HCI) through an analysis of the functions of computer systems in relation to their users. It is argued that the primary relation between humans and computer systems has historically been epistemic:

  6. Utility and Limitations of Using Gene Expression Data to Identify Functional Associations.

    Directory of Open Access Journals (Sweden)

    Sahra Uygun

    2016-12-01

    Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impacts the ability to recover functional associations between genes, using Arabidopsis thaliana as an example. Some datasets are more informative in capturing coordinated expression profiles and larger data sets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.
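
    The measure mentioned in this abstract, the percentage of gene pairs in a pathway whose expression is correlated, can be sketched as below. The sketch assumes Pearson correlation and uses a fixed correlation cutoff as a stand-in for a significance test; the function names, the cutoff and the toy expression values are all assumptions for illustration and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fraction_coexpressed_pairs(expression, pathway_genes, cutoff):
    """Fraction of gene pairs in a pathway whose correlation is at least `cutoff`."""
    pairs = list(combinations(sorted(pathway_genes), 2))
    hits = sum(
        1 for g1, g2 in pairs
        if pearson(expression[g1], expression[g2]) >= cutoff
    )
    return hits / len(pairs)


# Toy expression profiles over four hypothetical samples.
expression = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
}

fraction = fraction_coexpressed_pairs(expression, ["a", "b", "c", "d"], cutoff=0.75)
```

    Swapping `pearson` for another similarity measure, or changing the dataset behind `expression`, changes the recovered fraction, which is the sensitivity to analysis choices that the abstract reports.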

  7. Integration of biological networks and gene expression data using Cytoscape

    DEFF Research Database (Denmark)

    Cline, M.S.; Smoot, M.; Cerami, E.

    2007-01-01

    Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.

  8. MicroRNA-124-3p expression and its prospective functional pathways in hepatocellular carcinoma: A quantitative polymerase chain reaction, gene expression omnibus and bioinformatics study.

    Science.gov (United States)

    He, Rong-Quan; Yang, Xia; Liang, Liang; Chen, Gang; Ma, Jie

    2018-04-01

    The present study aimed to explore the potential clinical significance of microRNA (miR)-124-3p expression in the hepatocarcinogenesis and development of hepatocellular carcinoma (HCC), as well as the potential target genes of functional HCC pathways. Reverse transcription-quantitative polymerase chain reaction was performed to evaluate the expression of miR-124-3p in 101 HCC and adjacent non-cancerous tissue samples. Additionally, the association between miR-124-3p expression and clinical parameters was also analyzed. Differentially