WorldWideScience

Sample records for gene ontology pathways

  1. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network.

    Science.gov (United States)

    Qin, Tingting; Matmati, Nabil; Tsoi, Lam C; Mohanty, Bidyut K; Gao, Nan; Tang, Jijun; Lawson, Andrew B; Hannun, Yusuf A; Zheng, W Jim

    2014-10-01

    To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways and these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes were provided in this study, which were predicted to be essential genes by our prediction model. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
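    The abstract above reports its result as a Matthews correlation coefficient (MCC). As a reading aid, the following is a minimal sketch of how that coefficient is computed from the four counts of a binary confusion matrix. The function name and the example counts are illustrative assumptions; they are not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not data from the study).
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)    # every gene classified correctly
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)   # no better than guessing
print(perfect, chance)
```

    A value near 1 indicates nearly perfect agreement between predicted and true classes, while a value near 0 indicates chance-level prediction.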

  3. Gene Ontology

    Directory of Open Access Journals (Sweden)

    Gaston K. Mazandu

    2012-01-01

    The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to the growth in its popularity. In order to exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on the selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and identifying cases where annotation is inconsistent between orthologues.

  4. Analysis of tumor suppressor genes based on gene ontology and the KEGG pathway.

    Science.gov (United States)

    Yang, Jing; Chen, Lei; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby promote the discovery of effective cancer treatments.
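    Incremental feature selection, as named in the abstract above, is usually described as scoring ever-larger prefixes of a ranked feature list and keeping the best-scoring prefix. The sketch below illustrates that idea only. The feature names and the `toy_score` function are hypothetical stand-ins; in practice the ranking would come from a method such as mRMR and the score from cross-validating a classifier, neither of which is reproduced here.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score each top-k prefix of a ranked feature list; return the best prefix and its score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical ranked features and scoring function, for illustration only.
ranked = ["term_1", "pathway_2", "term_3"]
informative = {"term_1", "pathway_2"}


def toy_score(subset: Sequence[str]) -> float:
    # Reward informative features, lightly penalize subset size.
    return sum(f in informative for f in subset) - 0.1 * len(subset)


selected, score = incremental_feature_selection(ranked, toy_score)
print(selected, score)
```

    Passing the evaluation step as a callable keeps the loop independent of any particular classifier.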

  5. Reconstruction of phylogenetic relationships from metabolic pathways based on the enzyme hierarchy and the gene ontology.

    Science.gov (United States)

    Clemente, José C; Satou, Kenji; Valiente, Gabriel

    2005-01-01

    There has been much interest in the structural comparison and alignment of metabolic pathways. Several techniques have been conceived to assess the similarity of metabolic pathways of different organisms. In this paper, we show that the combination of a new heuristic algorithm for the comparison of metabolic pathways together with any of three enzyme similarity measures (hierarchical, information content, and gene ontology) can be used to derive a metabolic pathway similarity measure that is suitable for reconstructing phylogenetic relationships from metabolic pathways. Experimental results on the Glycolysis pathway of 73 organisms representing the three domains of life show that our method outperforms previous techniques.
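    The abstract above names a "hierarchical" enzyme similarity measure without defining it. One plausible reading, assumed here and not confirmed by the source, is to compare two Enzyme Commission (EC) numbers by how many leading levels of the four-level hierarchy they share. The function name is illustrative.

```python
def ec_hierarchical_similarity(ec_a: str, ec_b: str) -> float:
    """Fraction of the four EC levels that two EC numbers share as a common prefix."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first mismatch; '-' marks an unspecified level.
        if level_a != level_b or level_a == "-":
            break
        shared += 1
    return shared / 4


# Hexokinase (2.7.1.1) and glucokinase (2.7.1.2) differ only at the last level.
close = ec_hierarchical_similarity("2.7.1.1", "2.7.1.2")
# Hexokinase and glucose-6-phosphate isomerase (5.3.1.9) differ at the first level.
far = ec_hierarchical_similarity("2.7.1.1", "5.3.1.9")
print(close, far)
```

    A pathway-level score could then be built by aggregating such enzyme-level scores over aligned reactions, although how the paper does this is not stated in the abstract.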

  6. A measure of semantic similarity between gene ontology terms based on semantic pathway covering

    Institute of Scientific and Technical Information of China (English)

    LI Rong; CAO Shunliang; LI Yuanyuan; TAN Hao; ZHU Yangyong; ZHONG Yang; LI Yixue

    2006-01-01

    Semantic similarity between Gene Ontology (GO) terms is critical in resolving semantic heterogeneity when integrating heterogeneous biological databases. Traditionally, distance based and information content based measures are the two major methods. In this paper, a new method based on semantic pathway covering is proposed and an algorithm, the COMBINE algorithm, is presented, which considers the information contents of two given nodes and those of all nodes included in the two nodes' pathways. Experiments show that the COMBINE algorithm obtains the highest correlation index compared with those distance based and information content based algorithms.
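    For orientation, the sketch below shows a standard information content based baseline of the kind the abstract compares against (Resnik similarity), not the COMBINE algorithm itself, whose details are not given here. The toy ontology, term names and annotation counts are all hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "sugar_metabolism": {"metabolism"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "transport": 4,
    "lipid_metabolism": 1,
    "sugar_metabolism": 1,
}


def ancestors(term: str) -> set:
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term: str) -> float:
    """IC(t) = -log p(t), where p(t) counts annotations to t and to its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def resnik_similarity(term_a: str, term_b: str) -> float:
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


siblings = resnik_similarity("lipid_metabolism", "sugar_metabolism")
unrelated = resnik_similarity("lipid_metabolism", "transport")
print(siblings, unrelated)
```

    Terms that share only the root score zero, because the root covers every annotation and therefore carries no information.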

  7. The Functional Genetics of Handedness and Language Lateralization: Insights from Gene Ontology, Pathway and Disease Association Analyses.

    Science.gov (United States)

    Schmitz, Judith; Lor, Stephanie; Klose, Rena; Güntürkün, Onur; Ocklenburg, Sebastian

    2017-01-01

    Handedness and language lateralization are partially determined by genetic influences. It has been estimated that at least 40 (and potentially more) possibly interacting genes may influence the ontogenesis of hemispheric asymmetries. Recently, it has been suggested that analyzing the genetics of hemispheric asymmetries on the level of gene ontology sets, rather than at the level of individual genes, might be more informative for understanding the underlying functional cascades. Here, we performed gene ontology, pathway and disease association analyses on genes that have previously been associated with handedness and language lateralization. Significant gene ontology sets for handedness were anatomical structure development, pattern specification (especially asymmetry formation) and biological regulation. Pathway analysis highlighted the importance of the TGF-beta signaling pathway for handedness ontogenesis. Significant gene ontology sets for language lateralization were responses to different stimuli, nervous system development, transport, signaling, and biological regulation. Despite the fact that some authors assume that handedness and language lateralization share a common ontogenetic basis, gene ontology sets barely overlap between phenotypes. Compared to genes involved in handedness, which mostly contribute to structural development, genes involved in language lateralization rather contribute to activity-dependent cognitive processes. Disease association analysis revealed associations of genes involved in handedness with diseases affecting the whole body, while genes involved in language lateralization were specifically engaged in mental and neurological diseases. These findings further support the idea that handedness and language lateralization are ontogenetically independent, complex phenotypes.

  8. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes.

    Science.gov (United States)

    Xing, Zhihao; Chu, Chen; Chen, Lei; Kong, Xiangyin

    2016-11-01

    Oncogenes are a type of gene that has the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide a new insight for the investigation of oncogenes.
This article is part of a Special Issue entitled "System Genetics" Guest Editor

  9. Database for exchangeable gene trap clones: pathway and gene ontology analysis of exchangeable gene trap clone mouse lines.

    Science.gov (United States)

    Araki, Masatake; Nakahara, Mai; Muta, Mayumi; Itou, Miharu; Yanai, Chika; Yamazoe, Fumika; Miyake, Mikiko; Morita, Ayaka; Araki, Miyuki; Okamoto, Yoshiyuki; Nakagata, Naomi; Yoshinobu, Kumiko; Yamamura, Ken-ichi; Araki, Kimi

    2014-02-01

    Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes. © 2014 The Authors Development, Growth & Differentiation © 2014 Japanese Society of Developmental Biologists.

  10. The pathway ontology - updates and applications.

    Science.gov (United States)

    Petri, Victoria; Jayaraman, Pushkala; Tutaj, Marek; Hayman, G Thomas; Smith, Jennifer R; De Pons, Jeff; Laulederkind, Stanley Jf; Lowry, Timothy F; Nigam, Rajni; Wang, Shur-Jen; Shimoyama, Mary; Dwinell, Melinda R; Munzenmaier, Diane H; Worthey, Elizabeth A; Jacob, Howard J

    2014-02-05

    The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. The two released pipelines - the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline - make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD "Immune and Inflammatory Disease Portal" at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the 'infectious disease pathway' parent term category. The 'drug pathway' node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by over 75%. Ongoing development of

  11. The pathway ontology – updates and applications

    Science.gov (United States)

    2014-01-01

    Background: The Pathway Ontology (PW), developed at the Rat Genome Database (RGD), covers all types of biological pathways, including altered and disease pathways, and captures the relationships between them within the hierarchical structure of a directed acyclic graph. The ontology allows for the standardized annotation of rat, and of human and mouse genes to pathway terms. It also constitutes a vehicle for easy navigation between gene and ontology report pages, between reports and interactive pathway diagrams, between pathways directly connected within a diagram and between those that are globally related in pathway suites and suite networks. Surveys of the literature and the development of the Pathway and Disease Portals are important sources for the ongoing development of the ontology. User requests and mapping of pathways in other databases to terms in the ontology further contribute to increasing its content. Recently built automated pipelines use the mapped terms to make available the annotations generated by other groups. Results: The two released pipelines – the Pathway Interaction Database (PID) Annotation Import Pipeline and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Annotation Import Pipeline – make available over 7,400 and 31,000 pathway gene annotations, respectively. Building the PID pipeline led to the addition of new terms within the signaling node, also augmented by the release of the RGD “Immune and Inflammatory Disease Portal” at that time. Building the KEGG pipeline led to a substantial increase in the number of disease pathway terms, such as those within the ‘infectious disease pathway’ parent term category. The ‘drug pathway’ node has also seen increases in the number of terms as well as a restructuring of the node. Literature surveys, disease portal deployments and user requests have contributed and continue to contribute additional new terms across the ontology. Since first presented, the content of PW has increased by

  12. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    Science.gov (United States)

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is the infection of Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. In-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery.

  13. Identification of oral cancer related candidate genes by integrating protein-protein interactions, gene ontology, pathway analysis and immunohistochemistry.

    Science.gov (United States)

    Kumar, Ravindra; Samal, Sabindra K; Routray, Samapika; Dash, Rupesh; Dixit, Anshuman

    2017-05-30

    In recent years, bioinformatics methods have been reported with a high degree of success for candidate gene identification. In this milieu, we have used an integrated bioinformatics approach assimilating information from gene ontologies (GO), protein-protein interaction (PPI) and network analysis to predict candidate genes related to oral squamous cell carcinoma (OSCC). A total of 40973 PPIs were considered for 4704 cancer-related genes to construct a human cancer gene network (HCGN). The importance of each node was measured in HCGN by ten different centrality measures. We have shown that the top ranking genes are related to a significantly higher number of diseases as compared to other genes in HCGN. A total of 39 candidate oral cancer target genes were predicted by combining top ranked genes and the genes corresponding to significantly enriched oral cancer related GO terms. Initial verification using literature and available experimental data indicated that 29 genes were related with OSCC. A detailed pathway analysis led us to propose a role for the selected candidate genes in the invasion and metastasis in OSCC. We further validated our predictions using immunohistochemistry (IHC) and found that the gene FLNA was upregulated while the genes ARRB1 and HTT were downregulated in the OSCC tissue samples.
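    The abstract above ranks network nodes by ten centrality measures without listing them. As an illustration of the simplest such measure, the sketch below computes normalized degree centrality on a small undirected network. Whether degree centrality was among the ten measures used is an assumption, and the node labels and edges are hypothetical placeholders rather than real interaction data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected network given as node pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    # Divide by the maximum possible degree so that scores lie in [0, 1].
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy network; labels are placeholders, not real interaction data.
toy_edges = [("gene_A", "gene_B"), ("gene_A", "gene_C"),
             ("gene_A", "gene_D"), ("gene_B", "gene_C")]
scores = degree_centrality(toy_edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores)
```

    Storing neighbors in sets means a pair listed twice is counted once, which matters when interaction records are merged from several sources.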

  14. An Ontology of Gene

    OpenAIRE

    Masuya, Hiroshi; Mizoguchi, Riichiro

    2012-01-01

    The concept of a gene was established in the era of classical genetics and is now essential for life science for elucidating the molecular basis of the coding of genetic information necessary to realize the body of an organism and its biological functions. However, an ontology fully representing multiple aspects of a gene is still not available. In this study, we dissected the biological and ontological definitions of bearers of genetic information, including genes and alleles. Based on this ...

  15. Extending the Interpretation of Gene Profiling Microarray Experiments to Pathway Analysis Through the Use of Gene Ontology Terms

    Science.gov (United States)

    Chatziioannou, Aristotelis; Moulos, Panagiotis

    Microarray technology allows the survey of gene expression at a global level by measuring mRNA abundance. However, the great complexity of a microarray experiment calls for computationally powerful tools suited to probing the biological problem studied. Here we propose a flexible suite for data-driven interpretation of microarray experiments, adaptable to a wide range of possible needs of the biological end-user. The suite is implemented in MATLAB and makes use of two modules that perform all steps of a typical microarray data analysis, from data standardization and normalization up to statistical selection and pathway analysis utilizing Gene Ontology term annotations for the species genomes interrogated. Owing to its modular structure, it is scalable, enabling the incorporation of, or its seamless assembly with, other existing tools.

  16. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution.
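    Gene list enrichment analysis of the kind mentioned above is often framed as an over-representation test: how likely is it to see at least the observed number of genes from one category in a list drawn at random from the background? The sketch below uses the hypergeometric distribution for this. It is a generic illustration, not PANTHER's implementation, whose statistical test is not specified in the abstract, and the function name and example numbers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the category of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 3 in the category,
# a list of 3 genes, all 3 of which fall in the category.
p_value = hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=3)
print(p_value)
```

    When many categories are tested at once, the resulting p-values would normally be adjusted for multiple testing, a step omitted here.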

  17. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D.

    2017-01-01

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new ‘hierarchical view’ of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. PMID:27899595

  18. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome.

    Science.gov (United States)

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2016-01-04

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide and ≈ 7-10% of women of reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, efforts have not been taken to collate and link these data. Our group, for the first time, has compiled PCOS-related information available through scientific literature, cross-linked it with molecular, biochemical and clinical databases, and presented it as a user-friendly, web-based online knowledgebase for the benefit of the scientific and clinical community. Manually curated information on associated genes, single nucleotide polymorphisms, diseases, gene ontology terms and pathways, along with supporting reference literature, has been collated and included in PCOSKB (http://pcoskb.bicnirrh.res.in).

  19. The Ontology of the Gene Ontology

    Science.gov (United States)

    Smith, Barry; Williams, Jennifer; Schulze-Kremer, Steffen

    2003-01-01

    The rapidly increasing wealth of genomic data has driven the development of tools to assist in the task of representing and processing information about genes, their products and their functions. One of the most important of these tools is the Gene Ontology (GO), which is being developed in tandem with work on a variety of bioinformatics databases. An examination of the structure of GO, however, reveals a number of problems, which we believe can be resolved by taking account of certain organizing principles drawn from philosophical ontology. We shall explore the results of applying such principles to GO with a view to improving GO’s consistency and coherence and thus its future applicability in the automated processing of biological data. PMID:14728245

  20. Gene ontology study of methyl jasmonate-treated and non-treated hairy roots of Panax ginseng to identify genes involved in secondary metabolic pathway.

    Science.gov (United States)

    Sathiyamoorthy, S; In, J G; Gayathri, S; Kim, Y Ju; Yang, D Ch

    2010-07-01

    The roots of Panax ginseng C.A. Meyer, known as Korean ginseng, have been a valuable and important folk medicine in East Asian countries. They are mainly used to maintain the homeostasis of the human body, with the presence of ginsenosides and non-saponin compounds such as phenol compounds, acidic polysaccharides and polyethylene compounds. Functional genomics aids annotation based on gene ontology. In this study, we focused on the genes involved in secondary metabolic pathways and visualized temporal changes of gene expression in ginseng hairy roots treated with the methyl ester methyl jasmonate (MeJA) alongside non-treated hairy roots. A total of 5,774 EST clones were clustered and assembled into 501 contigs and 2,955 singletons. Annotations were categorized by the molecular function, biological process and cellular component gene ontology terms, and sequences with biochemical functions and enzyme commission numbers were assigned to metabolic pathways of the Kyoto Encyclopedia of Genes and Genomes database. Comparatively, EST sequences assigned to cellular process, metabolic process, biotic and abiotic stress stimuli, developmental and biological regulation, and transport were up-regulated 2-3 fold in MeJA-treated hairy roots. Forty-six different subgroups of enzymes were found in the MeJA-treated plants. These annotated ESTs represent a significant proportion of the P. ginseng and provide a molecular resource for the development of microarrays for gene expression studies concerning development, metabolism and reproduction.

  1. Gene Ontology Consortium: going forward.

    Science.gov (United States)

    2015-01-01

    The Gene Ontology (GO; http://www.geneontology.org) is a community-based bioinformatics resource that supplies information about gene product function using ontologies to represent biological knowledge. Here we describe improvements and expansions to several branches of the ontology, as well as updates that have allowed us to more efficiently disseminate the GO and capture feedback from the research community. The Gene Ontology Consortium (GOC) has expanded areas of the ontology such as cilia-related terms, cell-cycle terms and multicellular organism processes. We have also implemented new tools for generating ontology terms based on a set of logical rules making use of templates, and we have made efforts to increase our use of logical definitions. The GOC has a new and improved web site summarizing new developments and documentation, serving as a portal to GO data. Users can perform GO enrichment analysis, and search the GO for terms, annotations to gene products, and associated metadata across multiple species using the all-new AmiGO 2 browser. We encourage and welcome the input of the research community in all biological areas in our continued effort to improve the Gene Ontology. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. PMID:25428369

  3. How the gene ontology evolves.

    Science.gov (United States)

    Leonelli, Sabina; Diehl, Alexander D; Christie, Karen R; Harris, Midori A; Lomax, Jane

    2011-08-05

    Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource. Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms. This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.

  4. Improvements to cardiovascular gene ontology.

    Science.gov (United States)

    Lovering, Ruth C; Dimmer, Emily C; Talmud, Philippa J

    2009-07-01

    Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. Although one might initially wonder what relevance a 'controlled vocabulary' might have for cardiovascular science, such a resource is proving highly useful for researchers investigating complex cardiovascular disease phenotypes as well as those interpreting results from high-throughput methodologies. GO enables the current functional knowledge of individual genes to be used to annotate genomic or proteomic datasets. In this way, the GO data provides a very effective way of linking biological knowledge with the analysis of the large datasets of post-genomics research. Consequently, users of high-throughput methodologies such as expression arrays or proteomics will be the main beneficiaries of such annotation sets. However, as GO annotations increase in quality and quantity, groups using small-scale approaches will gradually begin to benefit too. For example, genome wide association scans for coronary heart disease are identifying novel genes, with previously unknown connections to cardiovascular processes, and the comprehensive annotation of these novel genes might provide clues to their cardiovascular link. At least 4000 genes, to date, have been implicated in cardiovascular processes and an initiative is underway to focus on annotating these genes for the benefit of the cardiovascular community. In this article we review the current uses of Gene Ontology annotation to highlight why Gene Ontology should be of interest to all those involved in cardiovascular research.

  5. Gene Ontology annotations and resources.

    Science.gov (United States)

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

  6. Witnessing stressful events induces glutamatergic synapse pathway alterations and gene set enrichment of positive EPSP regulation within the VTA of adult mice: An ontology based approach

    Science.gov (United States)

    Brewer, Jacob S.

    It is well known that exposure to severe stress increases the risk for developing mood disorders. Currently, the neurobiological and genetic mechanisms underlying the functional effects of psychological stress are poorly understood. A major obstacle to the study of psychological stress is the inability of current animal models of stress to distinguish between physical and psychological stressors. A novel paradigm recently developed by Warren et al. is able to tease apart the effects of physical and psychological stress in adult mice by allowing these mice to "witness" the social defeat of another mouse, thus removing confounding variables associated with physical stressors. Using this "witness" model of stress and RNA-Seq technology, the current study aims to study the genetic effects of psychological stress. After the mice witnessed the social defeat of another mouse, VTA tissue was extracted, sequenced, and analyzed for differential expression. Since genes often work together in complex networks, a pathway and gene ontology (GO) analysis was performed using data from the differential expression analysis. The pathway and GO analyses revealed a perturbation of the glutamatergic synapse pathway and an enrichment of positive excitatory post-synaptic potential regulation. This is consistent with the excitatory synapse theory of depression. Together these findings demonstrate a dysregulation of the mesolimbic reward pathway at the gene level as a result of psychological stress, potentially contributing to depressive-like behaviors.

  7. The Gene Ontology (GO) project in 2006

    National Research Council Canada - National Science Library

    2006-01-01

    The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net...

  8. The Gene Ontology project in 2008

    National Research Council Canada - National Science Library

    The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org...

  9. Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL.

    Science.gov (United States)

    Jupp, Simon; Stevens, Robert; Hoehndorf, Robert

    2012-04-24

    Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of

  10. Ontology modeling for generation of clinical pathways

    Directory of Open Access Journals (Sweden)

    Jasmine Tehrani

    2012-12-01

    Purpose: Increasing costs of health care, fuelled by demand for high quality, cost-effective healthcare, have driven hospitals to streamline their patient care delivery systems. One such systematic approach is the adoption of Clinical Pathways (CP) as a tool to increase the quality of healthcare delivery. However, most organizations still rely on paper-based pathway guidelines or specifications, which have limitations in process management and as a result can influence patient safety outcomes. In this paper, we present a method for generating clinical pathways based on organizational semiotics by capturing knowledge from the syntactic, semantic and pragmatic to the social level. Design/methodology/approach: The proposed modeling approach to generation of CPs adopts organizational semiotics and enables the generation of a semantically rich representation of CP knowledge. The Semantic Analysis Method (SAM) is applied to explicitly represent the semantics of the concepts, their relationships and patterns of behavior in terms of an ontology chart. The Norm Analysis Method (NAM) is adopted to identify and formally specify patterns of behavior and rules that govern the actions identified on the ontology chart. Information collected during semantic and norm analysis is integrated to guide the generation of CPs using best practice represented in BPMN, thus enabling the automation of CP. Findings: This research confirms the necessity of taking into consideration social aspects in designing information systems and automating CP. The complexity of healthcare processes can be best tackled by analyzing stakeholders, which we treat as social agents, their goals and patterns of action within the agent network. Originality/value: The current modeling methods describe CPs from a structural aspect comprising activities, properties and interrelationships. However, these methods lack a mechanism to describe possible patterns of human behavior and the conditions under which the

  11. Practical Applications of the Gene Ontology Resource

    Science.gov (United States)

    Huntley, Rachael P.; Dimmer, Emily C.; Apweiler, Rolf

    The Gene Ontology (GO) is a controlled vocabulary that represents knowledge about the functional attributes of gene products in a structured manner and can be used in both computational and human analyses. This vocabulary has been used by diverse curation groups to associate functional information to individual gene products in the form of annotations. GO has proven an invaluable resource for evaluating and interpreting the biological significance of large data sets, enabling researchers to create hypotheses to direct their future research. This chapter provides an overview of the Gene Ontology, how it can be used, and tips on getting the most out of GO analyses.

  12. PCOSKB: A KnowledgeBase on genes, diseases, ontology terms and biochemical pathways associated with PolyCystic Ovary Syndrome

    OpenAIRE

    Joseph, Shaini; Barai, Ram Shankar; Bhujbalrao, Rasika; Idicula-Thomas, Susan

    2015-01-01

    Polycystic ovary syndrome (PCOS) is one of the major causes of female subfertility worldwide and ≈7–10% of women in reproductive age are affected by it. The affected individuals exhibit varying types and levels of comorbid conditions, along with the classical PCOS symptoms. Extensive studies on PCOS across diverse ethnic populations have resulted in a plethora of information on dysregulated genes, gene polymorphisms and diseases linked to PCOS. However, efforts have not been taken to collate ...

  13. Gene function prediction based on the Gene Ontology hierarchical structure.

    Science.gov (United States)

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
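
    The reported F-value can be checked against the stated precision and recall. The sketch below assumes the F-value is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly but which matches the reported figures; the function name is illustrative.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
reported = f_value(0.527, 0.489)
print(f"{reported:.1%}")  # 50.7%
```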

  14. Gene ontology and KEGG enrichment analyses of genes related to age-related macular degeneration.

    Science.gov (United States)

    Zhang, Jian; Xing, ZhiHao; Ma, Mingming; Wang, Ning; Cai, Yu-Dong; Chen, Lei; Xu, Xun

    2014-01-01

    Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.

  15. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community. PMID:24093723

  16. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. Copyright © 2014 Elsevier B.V. All
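
    The fixed-gene-count window described above can be illustrated with a small sketch. It only counts how many genes in each window of consecutive genes share a term and applies a plain count threshold; the abstract does not specify the statistical test used, so the threshold, the function names and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def window_term_counts(gene_terms, window_size):
    """Yield (start_index, term counts) for each window of consecutive genes.

    gene_terms is a list of term sets, one per gene, in genomic order.
    """
    for start in range(len(gene_terms) - window_size + 1):
        counts = Counter()
        for terms in gene_terms[start:start + window_size]:
            counts.update(set(terms))  # each gene counts once per term
        yield start, counts


def clustered_terms(gene_terms, window_size, min_genes):
    """Map each term to the window starts where >= min_genes genes carry it."""
    hits = {}
    for start, counts in window_term_counts(gene_terms, window_size):
        for term, n in counts.items():
            if n >= min_genes:
                hits.setdefault(term, []).append(start)
    return hits


# Hypothetical toy chromosome: six genes in positional order, made-up terms.
genes = [{"T1"}, {"T1", "T2"}, {"T1"}, {"T3"}, {"T2"}, {"T3"}]
print(clustered_terms(genes, window_size=3, min_genes=3))  # {'T1': [0]}
```

    Because the window is defined by a number of genes rather than a number of base pairs, it is unaffected by varying gene lengths and densities, which is the motivation given in the abstract.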

  17. Correlating Expression Data with Gene Function Using Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    LIU,Qi; DENG,Yong; WANG,Chuan; SHI,Tie-Liu; LI,Yi-Xue

    2006-01-01

    Clustering is perhaps one of the most widely used tools for microarray data analysis. Proposed roles for genes of unknown function are inferred from clusters of genes similarly expressed across many biological conditions. However, whether function annotation by similarity metrics is reliable, and to what extent the similarity in gene expression patterns is useful for annotation of gene functions, has not been evaluated. This paper presents a comprehensive study of the correlation between the similarity of expression data and of gene functions using Gene Ontology. It has been found that although the similarity in expression patterns and the similarity in gene functions are significantly dependent on each other, this association is rather weak. In addition, among the three categories of Gene Ontology, the similarity of expression data is more useful for cellular component annotation than for biological process and molecular function. The results presented are of interest for research on gene function prediction.

  18. Representing Kidney Development Using the Gene Ontology

    Science.gov (United States)

    Alam-Faruque, Yasmin; Hill, David P.; Dimmer, Emily C.; Harris, Midori A.; Foulger, Rebecca E.; Tweedie, Susan; Attrill, Helen; Howe, Douglas G.; Thomas, Stephen Randall; Davidson, Duncan; Woolf, Adrian S.; Blake, Judith A.; Mungall, Christopher J.; O’Donovan, Claire; Apweiler, Rolf; Huntley, Rachael P.

    2014-01-01

    Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be utilized in the interpretation of data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development and thereby help towards alleviating renal disease. PMID:24941002

  19. Gene Ontology-Based Analysis of Zebrafish Omics Data Using the Web Tool Comparative Gene Ontology.

    Science.gov (United States)

    Ebrahimie, Esmaeil; Fruzangohar, Mario; Moussavi Nik, Seyyed Hani; Newman, Morgan

    2017-09-05

    Gene Ontology (GO) analysis is a powerful tool in systems biology, which uses a defined nomenclature to annotate genes/proteins within three categories: "Molecular Function," "Biological Process," and "Cellular Component." GO analysis can assist in revealing functional mechanisms underlying observed patterns in transcriptomic, genomic, and proteomic data. The already extensive and increasing use of zebrafish for modeling genetic and other diseases highlights the need to develop a GO analytical tool for this organism. The web tool Comparative GO was originally developed for GO analysis of bacterial data in 2013 ( www.comparativego.com ). We have now upgraded and elaborated this web tool for analysis of zebrafish genetic data using GOs and annotations from the Gene Ontology Consortium.
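
    GO analysis of a gene list often asks whether a term is over-represented in the list relative to a background. The abstract does not say which test Comparative GO applies; the sketch below uses a one-sided hypergeometric test, a common choice for this purpose, and its function name and numbers are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background
    annotated:  background genes carrying the term
    sample:     number of genes in the list of interest
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 annotated; a list of 4 genes has 3 hits.
print(round(enrichment_p_value(20, 5, 4, 3), 4))  # 0.032
```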

  20. The Neural/Immune Gene Ontology: clipping the Gene Ontology for neurological and immunological systems

    Directory of Open Access Journals (Sweden)

    Rubin Eitan

    2010-09-01

    Background: The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher level terms remain. This practice is designed to improve the summarizing of experimental results by grouping high level terms and the statistical power of GO term enrichment analysis. Here, we propose a new approach to editing the gene ontology, clipping, which is the editing of GO according to biological relevance. Creation of a GO subset by clipping is achieved by removing terms (from all hierarchical levels) if they are not functionally relevant to a given domain of interest. Terms located at levels higher than relevant terms are kept; thus, biologically irrelevant terms are only removed if they are not parental to terms that are relevant. Results: Using this approach, we have created the Neural-Immune Gene Ontology (NIGO) subset of GO directed for neurological and immunological systems. We tested the performance of NIGO in extracting knowledge from microarray experiments by conducting functional analysis and comparing the results to those obtained using the full GO and a generic GO slim. NIGO not only improved the statistical scores given to relevant terms, but was also able to retrieve functionally relevant terms that did not pass statistical cutoffs when using the full GO or the slim subset. Conclusions: Our results validate the pipeline used to generate NIGO, suggesting it is indeed enriched with terms that are specific to the neural/immune domains. The results suggest that NIGO can enhance the analysis of microarray experiments involving neural and/or immune related systems. They also directly demonstrate the potential such a domain-specific GO has in generating meaningful hypotheses.
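
    The clipping rule described above (keep relevant terms and everything parental to them, drop the rest) can be sketched on a toy hierarchy. This is a minimal illustration under assumed data structures, not the NIGO pipeline itself: the `parents` mapping, the term names and the function name are hypothetical.

```python
def clip_ontology(parents, relevant):
    """Keep the relevant terms and all of their ancestors; drop everything else.

    parents maps each term to the set of its direct parent terms.
    Returns the clipped term -> parents mapping.
    """
    keep = set()
    stack = list(relevant)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents.get(term, ()))  # walk upward to the root
    return {term: set(parents.get(term, ())) for term in keep}


# Hypothetical toy hierarchy; only "synapse" is relevant to the domain.
toy_parents = {
    "root": set(),
    "neural": {"root"},
    "synapse": {"neural"},
    "metabolism": {"root"},
    "glycolysis": {"metabolism"},
}
clipped = clip_ontology(toy_parents, {"synapse"})
print(sorted(clipped))  # ['neural', 'root', 'synapse']
```

    Unlike slimming, which keeps only high-level terms, this retains specific terms at any depth as long as they are relevant, together with the ancestors needed to keep the hierarchy connected.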

  1. Cross-Ontology multi-level association rule mining in the Gene Ontology.

    Directory of Open Access Journals (Sweden)

    Prashanti Manda

    The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that our approach discovers more and higher quality association rules from the GO, as evaluated by biologists, in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
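
    The idea of generalized transactions and cross-ontology rules can be illustrated with a toy sketch. It is not the COLL procedure: it simply adds every ancestor to each gene product's annotation set and then computes the standard support and confidence of a rule whose two sides come from different sub-ontologies. All term names, the `parents` mapping and the function names are hypothetical.

```python
def generalize(terms, parents):
    """Return the terms plus all their ancestors (bottom-up generalization)."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    if not transactions:
        return 0.0, 0.0
    n_a = sum(1 for t in transactions if antecedent in t)
    n_ab = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_ab / len(transactions)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


# Hypothetical hierarchy: MF = molecular function, BP = biological process.
toy_parents = {
    "MF:kinase": {"MF:catalytic"},
    "BP:phosphorylation": {"BP:metabolic"},
}
# One annotation set per (made-up) gene product.
annotations = [
    {"MF:kinase", "BP:phosphorylation"},
    {"MF:kinase", "BP:phosphorylation"},
    {"MF:binding", "BP:metabolic"},
    {"MF:kinase"},
]
transactions = [generalize(a, toy_parents) for a in annotations]

# A rule at the most specific level, and one at a more general level.
print(rule_stats(transactions, "MF:kinase", "BP:phosphorylation"))
print(rule_stats(transactions, "MF:binding", "BP:metabolic"))  # (0.25, 1.0)
```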

  2. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO), which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers so that they can compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in a PostgreSQL database server and developed a Java application to extract data from source files and load it into the database automatically. We developed a PHP web application based on Model-View-Control architecture, and used a specific data structure as well as current and novel algorithms to estimate GO graph parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria) from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways and mechanism(s) of infection. It can also aid in the discovery of genes associated with specific function(s) for investigation as novel vaccine or therapeutic targets. http://turing.ersa.edu.au/BacteriaGO.

  3. The Representation of Heart Development in the Gene Ontology

    Science.gov (United States)

    Khodiyar, Varsha K.; Hill, David P.; Howe, Doug; Berardini, Tanya Z.; Tweedie, Susan; Talmud, Philippa J.; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C.

    2012-01-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area. PMID:21419760

  4. The representation of heart development in the gene ontology.

    Science.gov (United States)

    Khodiyar, Varsha K; Hill, David P; Howe, Doug; Berardini, Tanya Z; Tweedie, Susan; Talmud, Philippa J; Breckenridge, Ross; Bhattarcharya, Shoumo; Riley, Paul; Scambler, Peter; Lovering, Ruth C

    2011-06-01

    An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

  5. A task-based approach for Gene Ontology evaluation.

    Science.gov (United States)

    Clarke, Erik L; Loguercio, Salvatore; Good, Benjamin M; Su, Andrew I

    2013-04-15

    The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test has been improving in its ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.

  6. Gene Ontology and KEGG Enrichment Analyses of Genes Related to Age-Related Macular Degeneration

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    Full Text Available Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector of enrichment scores computed between the gene set consisting of the gene and its direct neighbors in STRING, and each gene ontology term or KEGG pathway. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.
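
    The abstract does not spell out how an enrichment score is computed. One common definition, assumed here for illustration only, is the negative base-10 logarithm of a hypergeometric tail probability; a gene's feature vector would then hold one such score per GO term or KEGG pathway. The function names and the numbers below are hypothetical.

```python
from math import comb, log10


def hypergeom_pvalue(N, M, n, m):
    """P(X >= m) when n genes are drawn from N genes, M of which carry the term.

    N: genes in the background, M: background genes annotated to the term,
    n: size of the gene set, m: genes in the set annotated to the term.
    """
    upper = min(M, n)
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, upper + 1))
    return tail / comb(N, n)


def enrichment_score(N, M, n, m):
    """Assumed score: -log10 of the hypergeometric p-value."""
    return -log10(hypergeom_pvalue(N, M, n, m))


# Toy numbers: 20 background genes, 5 carry the term; the gene set
# (a gene plus its network neighbours) has 4 genes, 2 carry the term.
p = hypergeom_pvalue(20, 5, 4, 2)
score = enrichment_score(20, 5, 4, 2)
print(round(p, 4), round(score, 4))
```

    Larger scores indicate stronger over-representation of the term in the gene set; a feature-selection step such as the one named in the abstract would then operate on vectors of these scores.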

  7. Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Tratz, Stephen C.; Gregory, Michelle L.

    2006-06-08

    With the rising influence of the Gene Ontology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associated with them. So far, these approaches have solely relied on the knowledge encoded in the Gene Ontology and the gene annotations associated with the Gene Ontology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  8. A method for increasing expressivity of Gene Ontology annotations using a compositional approach.

    Science.gov (United States)

    Huntley, Rachael P; Harris, Midori A; Alam-Faruque, Yasmin; Blake, Judith A; Carbon, Seth; Dietze, Heiko; Dimmer, Emily C; Foulger, Rebecca E; Hill, David P; Khodiyar, Varsha K; Lock, Antonia; Lomax, Jane; Lovering, Ruth C; Mutowo-Meullenet, Prudence; Sawford, Tony; Van Auken, Kimberly; Wood, Valerie; Mungall, Christopher J

    2014-05-21

    The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.

  9. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help to explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The abundant availability of protein interaction data, coupled with gene annotation, eases the accurate determination of disease-specific candidate genes. In our work we have prioritized cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and his co-workers, which identifies candidate genes through graph-theoretical centrality measures and gene ontology. Using human protein interaction data, cervical cancer gene sets and ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the predicted candidate genes was corroborated through a literature survey. The presence of drugs for these candidates was also detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which affirms that they may be potential drug targets for cervical cancer.

  10. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science.

    Science.gov (United States)

    Klie, Sebastian; Nikoloski, Zoran

    2012-01-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis, in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.
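
    The structural difference mentioned above (tree versus directed acyclic graph) affects information content because, in a DAG, an annotation propagates to every ancestor along all parent links. The sketch below assumes the widely used definition IC(t) = -log2 of the fraction of genes annotated to t or any of its descendants; the abstract does not state which definition the authors used, and the function names and toy data are hypothetical.

```python
from math import log2


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for p in parents.get(current, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(annotations, parents):
    """annotations: gene -> set of directly annotated terms."""
    counts = {}
    for terms in annotations.values():
        closure = set(terms)
        for t in terms:
            closure |= ancestors(t, parents)  # propagate to all ancestors
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -log2(c / total) for t, c in counts.items()}


# Toy DAG: term "C" has two parents, which a tree could not express.
dag_parents = {"root": set(), "A": {"root"}, "B": {"root"}, "C": {"A", "B"}}
toy_annotations = {"g1": {"C"}, "g2": {"A"}, "g3": {"B"}, "g4": {"root"}}
ic = information_content(toy_annotations, dag_parents)
print(ic["A"], ic["B"], ic["C"])
```

    Because "C" has two parents, gene g1 counts towards both "A" and "B"; in a tree it could contribute to only one of them, which shifts the resulting information content distribution.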

  11. The choice between MapMan and Gene Ontology for automated gene function prediction in plant science

    Directory of Open Access Journals (Sweden)

    Sebastian Klie

    2012-06-01

    Full Text Available Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of coexpression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis, in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  12. Detecting Inconsistencies in the Gene Ontology Using Ontology Databases with Not-gadgets

    Science.gov (United States)

    Lependu, Paea; Dou, Dejing; Howe, Doug

    We present ontology databases with not-gadgets, a method for detecting inconsistencies in an ontology with large numbers of annotated instances by using triggers and exclusion dependencies in a unique way. What makes this work relevant is the use of the database itself, rather than an external reasoner, to detect logical inconsistencies given large numbers of annotated instances. What distinguishes this work is the use of event-driven triggers together with the introduction of explicit negations. We applied this approach toward the serotonin example, an open problem in biomedical informatics which aims to use annotations to help identify inconsistencies in the Gene Ontology. We discovered 75 inconsistencies that have important implications in biology, which include: (1) methods for refining transfer rules used for inferring electronic annotations, and (2) highlighting possible biological differences across species worth investigating.
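
    The core idea of letting the database itself reject contradictions, using explicit negations plus event-driven triggers, can be sketched with SQLite from Python. This is a deliberately simplified illustration, not the authors' not-gadget construction: the table names, the trigger name and the gene and term labels are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotated (gene TEXT, term TEXT);
-- Explicit negations: facts of the form "gene is NOT annotated to term".
CREATE TABLE not_annotated (gene TEXT, term TEXT);
-- Event-driven check: abort any insert that contradicts a stored negation.
CREATE TRIGGER check_contradiction
BEFORE INSERT ON annotated
WHEN EXISTS (SELECT 1 FROM not_annotated
             WHERE gene = NEW.gene AND term = NEW.term)
BEGIN
    SELECT RAISE(ABORT, 'inconsistent annotation');
END;
""")


def add_annotation(gene, term):
    """Try to store an annotation; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO annotated VALUES (?, ?)", (gene, term))
        return True
    except sqlite3.DatabaseError:
        return False


conn.execute(
    "INSERT INTO not_annotated VALUES ('geneA', 'serotonin secretion')"
)
ok = add_annotation("geneB", "serotonin secretion")
rejected = add_annotation("geneA", "serotonin secretion")
stored = conn.execute("SELECT COUNT(*) FROM annotated").fetchone()[0]
print(ok, rejected, stored)
```

    No external reasoner is involved here: the inconsistency is detected at insert time by the database engine, which is the property the abstract emphasizes.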

  13. Aligning ontologies and integrating textual evidence for pathway analysis of microarray data

    Energy Technology Data Exchange (ETDEWEB)

    Gopalan, Banu; Posse, Christian; Sanfilippo, Antonio P.; Stenzel-Poore, Mary; Stevens, S.L.; Castano, Jose; Beagley, Nathaniel; Riensche, Roderick M.; Baddeley, Bob; Simon, R.P.; Pustejovsky, James

    2006-10-08

    Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single gene studies to genome-level analysis, monitoring the expression levels of several thousands of genes in parallel. The massive amounts of data obtained from microarrays need to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontologies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major disabler in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neuroprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative responses in the recovery process following ischemia and the molecular mechanisms that lead to these processes still need to be uncovered. Improved annotation of genomic and proteomic data, including annotation of pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident with the way these databases analyze functional groupings of genes from the Gene Ontology.

  14. Integrating Gene Ontology and Blast to predict gene functions

    Institute of Scientific and Technical Information of China (English)

    WANG Cheng-gang; MO Zhi-hong

    2007-01-01

    A GoBlast system was built to predict gene function by integrating Blast search and Gene Ontology (GO) annotations. The operating system was Debian Linux 3.1, with Apache as the web server and MySQL as the data storage system. FASTA files with GO annotations were taken as the sequence source for Blast alignment and were formatted with the wu-formatdb program. The GoBlast system includes three Bioperl modules in Perl: a data input module, a data processing module and a data output module. A GoBlast query starts with an amino acid or nucleotide sequence. It ends with an output in an HTML page, presenting high-scoring gene products that are highly homologous to the queried sequence and listing the associated GO terms beside the respective gene products. A simple click on a GO term leads to a detailed explanation of the specific gene function. This enables gene function prediction by Blast. GoBlast can be a very useful tool for functional genome research and is available for free at http://bioq.org/goblast.

  15. Guidelines for the functional annotation of microRNAs using the Gene Ontology.

    Science.gov (United States)

    Huntley, Rachael P; Sitnikov, Dmitry; Orlic-Milacic, Marija; Balakrishnan, Rama; D'Eustachio, Peter; Gillespie, Marc E; Howe, Doug; Kalea, Anastasia Z; Maegdefessel, Lars; Osumi-Sutherland, David; Petri, Victoria; Smith, Jennifer R; Van Auken, Kimberly; Wood, Valerie; Zampetaki, Anna; Mayr, Manuel; Lovering, Ruth C

    2016-05-01

    MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual). © 2016 Huntley et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  16. Measuring the evolution of ontology complexity: the gene ontology case study.

    Science.gov (United States)

    Dameron, Olivier; Bettembourg, Charles; Le Meur, Nolwenn

    2013-01-01

    Ontologies support automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of an ontology's evolution on its structural complexity. As a case study we used the sixty monthly releases between January 2008 and December 2012 of the Gene Ontology and its three independent branches, i.e. biological processes (BP), cellular components (CC) and molecular functions (MF). For each case, we measured complexity by computing metrics related to the size, the node connectivity and the hierarchical structure. The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, superior to that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF, with a marked increase for the three branches in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late 2012 increase of connectivity resulted in an increase of the average depth and average height and a decrease of the proportion of leaves, indicating that a major enrichment effort of the intermediate-level hierarchy occurred. The variation of the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values as well as of evolution for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased, CC was refined with the addition of leaves providing a finer level of annotations but decreasing slightly its complexity, and MF complexity remained stable.
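
    Metrics of the kind named in this abstract can be computed directly from a parent map. The sketch below is an assumption-laden illustration: it takes connectivity to be relations per class and depth to be the longest path to a root, which may differ from the definitions used in the study, and all function and term names are hypothetical.

```python
def connectivity(parents):
    """Assumed metric: number of parent relations per class."""
    edges = sum(len(ps) for ps in parents.values())
    return edges / len(parents)


def leaf_proportion(parents):
    """Fraction of classes that are not the parent of any other class."""
    has_child = {p for ps in parents.values() for p in ps}
    leaves = [t for t in parents if t not in has_child]
    return len(leaves) / len(parents)


def depth(term, parents, cache):
    """Assumed definition: length of the longest path from term to a root."""
    if term not in cache:
        ps = parents[term]
        if not ps:
            cache[term] = 0
        else:
            cache[term] = 1 + max(depth(p, parents, cache) for p in ps)
    return cache[term]


def average_depth(parents):
    cache = {}
    return sum(depth(t, parents, cache) for t in parents) / len(parents)


# Toy hierarchy with hypothetical class names.
toy = {"root": set(), "a": {"root"}, "b": {"root"}, "c": {"a", "b"}, "d": {"c"}}
print(connectivity(toy), leaf_proportion(toy), average_depth(toy))
```

    Tracking these three numbers across successive releases would show whether growth adds leaves (finer annotation) or intermediate classes (deeper hierarchy), which is the distinction the abstract draws between the branches.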

  17. The Vision and Challenges of the Gene Ontology.

    Science.gov (United States)

    Lewis, Suzanna E

    2017-01-01

    The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s, surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done, for the Gene Ontology (GO) Consortium to realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.

  18. Fast Gene Ontology based clustering for microarray experiments

    OpenAIRE

    Ovaska Kristian; Laakso Marko; Hautaniemi Sampsa

    2008-01-01

    Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fa...

  19. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  20. Automatic, context-specific generation of Gene Ontology slims

    Directory of Open Access Journals (Sweden)

    Sehgal Muhammad

    2010-10-01

    Full Text Available Abstract Background: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual. Results: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power. Conclusions: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
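
    The "collapsing terms upward" step that any GO slim relies on can be sketched as follows. This shows only how an annotation is mapped onto an already chosen slim; it does not reproduce the data-driven selection of slim terms that the abstract describes. The function name, the `parents` mapping and the term labels are hypothetical.

```python
def map_to_slim(term, parents, slim):
    """Return the first slim term met on each upward path from `term`."""
    found = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)  # stop climbing this path at a slim term
        else:
            stack.extend(parents.get(current, ()))
    return found


# Toy hierarchy with hypothetical term names (not real GO identifiers).
slim_parents = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid metabolism": {"metabolism"},
    "sphingolipid biosynthesis": {"lipid metabolism"},
    "lipid export": {"lipid metabolism", "transport"},
}
slim_terms = {"metabolism", "transport"}
single = map_to_slim("sphingolipid biosynthesis", slim_parents, slim_terms)
double = map_to_slim("lipid export", slim_parents, slim_terms)
print(sorted(single), sorted(double))
```

    A term with parents in two branches maps to two slim terms, so gene counts per slim term can overlap; this is worth keeping in mind when the collapsed counts feed an enrichment test.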

  1. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

    Science.gov (United States)

    2013-01-01

    Background The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. Conclusions The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl. PMID:23895341

  2. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology.

    Science.gov (United States)

    Hill, David P; Adams, Nico; Bada, Mike; Batchelor, Colin; Berardini, Tanya Z; Dietze, Heiko; Drabkin, Harold J; Ennis, Marcus; Foulger, Rebecca E; Harris, Midori A; Hastings, Janna; Kale, Namrata S; de Matos, Paula; Mungall, Christopher J; Owen, Gareth; Roncaglia, Paola; Steinbeck, Christoph; Turner, Steve; Lomax, Jane

    2013-07-29

    The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from http://purl.obolibrary.org/obo/go/extensions/go-plus.owl.

  3. Quality assurance of the gene ontology using abstraction networks.

    Science.gov (United States)

    Ochs, Christopher; Perl, Yehoshua; Halper, Michael; Geller, James; Lomax, Jane

    2016-06-01

    The gene ontology (GO) is used extensively in the field of genomics. Like other large and complex ontologies, quality assurance (QA) efforts for GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics, based on the identification of groups of anomalous terms which exhibit certain taxonomy-defined characteristics, are introduced. Such groups are expected to have higher error rates when compared to other terms. Thus, by focusing QA efforts on anomalous terms one would expect to find relatively more erroneous content. By automatically identifying these potential problem areas within an ontology, time and effort will be saved during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. From the results of this QA effort, it is observed that different kinds of inconsistencies in the modeling of GO can be exposed with the use of the proposed heuristics. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.

  4. A robust data-driven approach for gene ontology annotation

    OpenAIRE

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For th...

  5. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

    Science.gov (United States)

    Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

    2016-03-11

    Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

  6. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    Science.gov (United States)

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.
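
    The abstract does not give the formula for the enrichment scores that make up each gene's vector. One common choice, assumed here purely for illustration, is the negative log10 of a hypergeometric tail probability. The function names and the example numbers below are hypothetical.

```python
import math


def hypergeom_tail(N, M, n, m):
    """P(X >= m) when n genes are drawn from N, of which M carry the annotation."""
    upper = min(n, M)
    hits = sum(
        math.comb(M, k) * math.comb(N - M, n - k) for k in range(m, upper + 1)
    )
    return hits / math.comb(N, n)


def enrichment_score(N, M, n, m):
    """Assumed score: -log10 of the tail probability (inf if the probability is 0)."""
    p = hypergeom_tail(N, M, n, m)
    return math.inf if p == 0 else -math.log10(p)


# Hypothetical example: 10 genes overall, 4 annotated to a term,
# and a gene set of 3 in which all 3 carry the annotation.
score = enrichment_score(10, 4, 3, 3)
```

    Computing one such score per GO term and per KEGG pathway would yield a fixed-length vector of the kind the abstract describes (12,887 + 239 = 13,126 components).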

  7. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.

  8. Expansion of the Gene Ontology knowledgebase and resources

    Science.gov (United States)

    2017-01-01

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. PMID:27899567

  9. Expansion of the Gene Ontology knowledgebase and resources.

    Science.gov (United States)

    2017-01-04

    The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Representing Ontogeny Through Ontology: A Developmental Biologist’s Guide to The Gene Ontology

    Science.gov (United States)

    Hill, David P.; Berardini, Tanya Z.; Howe, Douglas G.; Van Auken, Kimberly M.

    2010-01-01

    Developmental biology, like many other areas of biology, has undergone a dramatic shift in the perspective from which developmental processes are viewed. Instead of focusing on the actions of a handful of genes or functional RNAs, we now consider the interactions of large functional gene networks and study how these complex systems orchestrate the unfolding of an organism, from gametes to adult. Developmental biologists are beginning to realize that understanding ontogeny on this scale requires the utilization of computational methods to capture, store and represent the knowledge we have about the underlying processes. Here we review the use of the Gene Ontology (GO) to study developmental biology. We describe the organization and structure of the GO and illustrate some of the ways we use it to capture the current understanding of many common developmental processes. We also discuss ways in which gene product annotations using the GO have been used to ask and answer developmental questions in a variety of model developmental systems. We provide suggestions as to how the GO might be used in more powerful ways to address questions about development. Our goal is to provide developmental biologists with enough background about the GO that they can begin to think about how they might use the ontology efficiently and in the most powerful ways possible. PMID:19921742

  11. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    OpenAIRE

    Tsatsoulis Costas; Amthauer Heather A

    2010-01-01

    Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces ce...

  12. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Very many and valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations are reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations proved the effectiveness of our pipeline and prioritization of predicted annotations, by selecting as most likely manifold predicted annotations that were later confirmed.

  13. Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL

    Directory of Open Access Journals (Sweden)

    Aranguren Mikel

    2007-02-01

    Abstract The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.

  14. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  15. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning.

    Science.gov (United States)

    Amthauer, Heather A; Tsatsoulis, Costas

    2010-05-28

    There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  16. Brief isoflurane anaesthesia affects differential gene expression, gene ontology and gene networks in rat brain.

    Science.gov (United States)

    Lowes, Damon A; Galley, Helen F; Moura, Alessandro P S; Webster, Nigel R

    2017-01-15

    Much is still unknown about the mechanisms of effects of even brief anaesthesia on the brain and previous studies have simply compared differential expression profiles with and without anaesthesia. We hypothesised that network analysis, in addition to the traditional differential gene expression and ontology analysis, would enable identification of the effects of anaesthesia on interactions between genes. Rats (n=10 per group) were randomised to anaesthesia with isoflurane in oxygen or oxygen only for 15 min, and 6 h later brains were removed. Differential gene expression and gene ontology analysis of microarray data were performed. Standard clustering techniques and principal component analysis with Bayesian rules were used along with social network analysis methods, to quantitatively model and describe the gene networks. Anaesthesia had marked effects on genes in the brain with differential regulation of 416 probe sets by at least 2-fold. Gene ontology analysis showed 23 genes were functionally related to the anaesthesia and of these, 12 were involved with neurotransmitter release, transport and secretion. Gene network analysis revealed much greater connectivity in genes from brains from anaesthetised rats compared to controls. Other importance measures were also altered after anaesthesia; median [range] closeness centrality (shortest path) was lower in anaesthetized animals (0.07 [0-0.30]) than controls (0.39 [0.30-0.53], p [...] genes after anaesthesia and suggests future targets for investigation. Copyright © 2016 Elsevier B.V. All rights reserved.
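
    Closeness centrality, the shortest-path measure reported above, can be sketched in Python with a breadth-first search. The abstract does not state which normalisation the authors used; the version below (number of reachable nodes divided by the sum of shortest-path lengths, 0.0 for an isolated node) is an assumption, and the toy network GRAPH is hypothetical.

```python
from collections import deque


def closeness_centrality(graph, node):
    """Reachable-node count divided by the sum of shortest-path lengths from `node`."""
    dist, queue = {node: 0}, deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical undirected gene network: a path g1 - g2 - g3 plus an isolated gene g4.
GRAPH = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"], "g4": []}
```

    In this toy network the middle gene g2 has the highest closeness and the isolated gene g4 the lowest, so a drop in median closeness indicates that genes sit further from one another in the network.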

  17. GOseek: a gene ontology search engine using enhanced keywords.

    Science.gov (United States)

    Taha, Kamal

    2013-01-01

    We propose in this paper a biological search engine called GOseek, which overcomes the limitation of current gene similarity tools. Given a set of genes, GOseek returns the most significant genes that are semantically related to the given genes. These returned genes are usually annotated to one of the Lowest Common Ancestors (LCA) of the Gene Ontology (GO) terms annotating the given genes. Most genes have several annotation GO terms. Therefore, there may be more than one LCA for the GO terms annotating the given genes. The LCA annotating the genes that are most semantically related to the given gene is the one that receives the most aggregate semantic contribution from the GO terms annotating the given genes. To identify this LCA, GOseek quantifies the contribution of the GO terms annotating the given genes to the semantics of their LCAs. That is, it encodes the semantic contribution into a numeric format. GOseek uses microarray experiment data to rank result genes based on their significance. We evaluated GOseek experimentally and compared it with a comparable gene prediction tool. Results showed marked improvement over the tool.
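
    The notion of Lowest Common Ancestors used above can be sketched in Python on a toy directed acyclic graph. This shows only how two terms can share more than one LCA; it does not reproduce GOseek's semantic-contribution scoring, which the abstract does not detail. The term names and function names are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of `term`, including the term itself."""
    result, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def lowest_common_ancestors(term_a, term_b, parents):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    return {
        c
        for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


# Hypothetical toy DAG: child -> parents. Terms d and e both have parents b and c.
TOY_PARENTS = {"d": ["b", "c"], "e": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
```

    Here d and e share the two lowest common ancestors b and c, while the root a is excluded because it lies above both.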

  18. [Key effect genes responding to nerve injury identified by gene ontology and computer pattern recognition].

    Science.gov (United States)

    Pan, Qian; Peng, Jin; Zhou, Xue; Yang, Hao; Zhang, Wei

    2012-07-01

    To screen important genes from large gene microarray datasets obtained after nerve injury, we combined the gene ontology (GO) method with computer pattern recognition to find key genes responding to nerve injury, and then verified one of the identified genes. Data mining and gene ontology analysis of the gene chip dataset GSE26350 were carried out in MATLAB. Cd44 was selected from the spectrum of identified key genes by comparing the genes' GO terms and their positions on the principal component score map. Functional interference was used to disrupt the normal binding of Cd44 to one of its ligands, chondroitin sulfate C (CSC), in order to observe neurite extension. Gene ontology analysis showed that the first genes on the score map (marked by red *) were mainly distributed in molecular function GO terms such as molecular transducer activity, receptor activity and protein binding. Cd44 is one of six effector protein genes and attracted our attention because of its functional diversity. After different reagents were added to the medium to interfere with the normal binding of CSC and Cd44, CSC's inhibition of neurite extension was relieved to varying degrees. CSC can inhibit neurite extension by binding Cd44 on the neuron membrane. This verifies that important genes in given physiological processes can be identified by gene ontology analysis of gene chip data.

  19. Bi-directional semantic similarity for gene ontology to optimize biological and clinical analyses.

    Science.gov (United States)

    Bien, Sang Jay; Park, Chan Hee; Shim, Hae Jin; Yang, Woongcheol; Kim, Jihun; Kim, Ju Han

    2012-01-01

    Semantic similarity analysis facilitates automated semantic explanations of biological and clinical data annotated by biomedical ontologies. Gene ontology (GO) has become one of the most important biomedical ontologies with a set of controlled vocabularies, providing rich semantic annotations for genes and molecular phenotypes for diseases. Current methods for measuring GO semantic similarities are limited to considering only the ancestor terms while neglecting the descendants. One can find many GO term pairs whose ancestors are identical but whose descendants are very different and vice versa. Moreover, the lower parts of GO trees are full of terms with more specific semantics. This study proposed a method of measuring semantic similarities between GO terms using the entire GO tree structure, including both the upper (ancestral) and the lower (descendant) parts. Comprehensive comparison studies were performed with well-known information content-based and graph structure-based semantic similarity measures with protein sequence similarities, gene expression-profile correlations, protein-protein interactions, and biological pathway analyses. The proposed bidirectional measure of semantic similarity outperformed other graph-based and information content-based methods.

  20. Gene ontology based transfer learning for protein subcellular localization

    Directory of Open Access Journals (Sweden)

    Zhou Shuigeng

    2011-02-01

    Abstract Background Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which cannot estimate the individual discriminative abilities of the three aspects of gene ontology. Results In this paper, we propose a Gene Ontology Based Transfer Learning Model (GO-TLM) for large-scale protein subcellular localization. The model transfers the signature-based homologous GO terms to the target proteins, and further constructs a reliable learning system to reduce the adverse effect of the potential false GO terms that result from evolutionary divergence. We derive three GO kernels from the three aspects of gene ontology to measure the GO similarity of two proteins, and derive two other spectrum kernels to measure the similarity of two protein sequences. We use simple non-parametric cross validation to explicitly weigh the discriminative abilities of the five kernels, such that the time and space computational complexities are greatly reduced when compared to the complicated semi-definite programming and semi-indefinite linear programming. The five kernels are then linearly merged into one single kernel for

  1. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided and the corresponding entries in heterogeneous biological databases in semantic terms can be expediently searched.

  2. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...... can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence derived protein features such as predicted post translational modifications (PTMs), protein sorting signals and physical/chemical properties...

  3. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
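
    The abstract names MOSupport and MOConfidence but does not define them. As background only, the classical support and confidence of an association rule over GO annotation sets can be sketched in Python as below; these are the textbook definitions, not the MOAL measures, and the annotation data and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of annotation sets that contain every term in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical gene annotations mixing terms from the three sub-ontologies
# (BP = biological process, MF = molecular function, CC = cellular component).
ANNOTATIONS = [
    {"BP:transport", "MF:transporter", "CC:membrane"},
    {"BP:transport", "MF:transporter"},
    {"BP:transport", "CC:membrane"},
    {"BP:signaling", "CC:membrane"},
]
```

    For the cross-ontology rule BP:transport -> MF:transporter, the toy data give a support of 2/4 for the combined term set and a confidence of 2/3.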

  4. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225 % in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  5. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
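
    The vector representation described in this abstract can be sketched as follows. This is only an illustration, not the author's implementation: the term identifiers and information-content values are made up, and the abstract does not say how the Jaccard index is applied to real-valued vectors, so the weighted (min/max) form used here is an assumption.

```python
import math

def ic_vector(annotated_terms, ic, term_order):
    """Build a vector with the term's information content where the gene
    product is annotated with that term, and 0.0 elsewhere."""
    return [ic[t] if t in annotated_terms else 0.0 for t in term_order]

def pearson(u, v):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_jaccard(u, v):
    """Weighted Jaccard index for non-negative vectors (assumed variant)."""
    den = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / den if den else 0.0

# Hypothetical terms and information contents, for illustration only.
TERM_ORDER = ["GO:A", "GO:B", "GO:C", "GO:D"]
IC = {"GO:A": 1.0, "GO:B": 2.0, "GO:C": 3.0, "GO:D": 4.0}

v1 = ic_vector({"GO:A", "GO:B"}, IC, TERM_ORDER)
v2 = ic_vector({"GO:A", "GO:B"}, IC, TERM_ORDER)
v3 = ic_vector({"GO:C", "GO:D"}, IC, TERM_ORDER)
```

    Gene products with identical annotations (v1, v2) score close to 1 under all three measures, while products with disjoint annotations (v1, v3) score 0 under cosine and weighted Jaccard and negative under Pearson.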

  6. Correlating information contents of gene ontology terms to infer semantic similarity of gene products.

    Science.gov (United States)

    Gan, Mingxin

    2014-01-01

    Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.

  7. Ontological Discovery Environment: a system for integrating gene-phenotype associations.

    Science.gov (United States)

    Baker, Erich J; Jay, Jeremy J; Philip, Vivek M; Zhang, Yun; Li, Zuopan; Kirova, Roumyana; Langston, Michael A; Chesler, Elissa J

    2009-12-01

    The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental

  8. Identification of genes involved in radioresistance of nasopharyngeal carcinoma by integrating gene ontology and protein-protein interaction networks.

    Science.gov (United States)

    Guo, Ya; Zhu, Xiao-Dong; Qu, Song; Li, Ling; Su, Fang; Li, Ye; Huang, Shi-Ting; Li, Dan-Rong

    2012-01-01

    Radioresistance remains one of the important factors in relapse and metastasis of nasopharyngeal carcinoma. Thus, it is imperative to identify genes involved in radioresistance and explore the underlying biological processes in the development of radioresistance. In this study, we used cDNA microarrays to select differential genes between radioresistant CNE-2R and parental CNE-2 cell lines. One hundred and eighty-three significantly differentially expressed genes (pgenes were upregulated and 45 genes were downregulated in CNE-2R. We further employed publicly available bioinformatics related software, such as GOEAST and STRING to examine the relationship among differentially expressed genes. The results show that these genes were involved in type I interferon-mediated signaling pathway biological processes; the nodes tended to have high connectivity with the EGFR pathway, IFN-related pathways, NF-κB. The node STAT1 has high connectivity with other nodes in the protein-protein interaction (PPI) networks. Finally, the reliability of microarray data was validated for selected genes by semi-quantitative RT-PCR and Western blotting. The results were consistent with the microarray data. Our study suggests that microarrays combined with gene ontology and protein interaction networks have great value in the identification of genes of radioresistance in nasopharyngeal carcinoma; genes involved in several biological processes and protein interaction networks may be relevant to NPC radioresistance; in particular, the verified genes CCL5, STAT1-α, STAT2 and GSTP1 may become potential biomarkers for predicting NPC response to radiotherapy.

  9. Gene-based and semantic structure of the Gene Ontology as a complex network

    Science.gov (United States)

    Coronnello, Claudia; Tumminello, Michele; Miccichè, Salvatore

    2016-09-01

    The last decade has seen the advent and consolidation of ontology based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The Gene Ontology (GO) is constantly evolving over time. The information accumulated time-by-time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. Here we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium. Moreover, the GO is a natural example of bipartite network of terms and genes. Here we are interested in studying the properties of the projected network of terms, i.e. a gene-based weighted network of GO terms, in which a link between any two terms is set if at least one gene is annotated in both terms. One aim of the present paper is to compare the structural properties of the semantic and the gene-based network. The relative importance of terms is very similar in the two networks, but the community structure changes. We show that in some cases GO terms that appear to be distinct from a semantic point of view are instead connected, and appear in the same community when considering their gene content. The identification of such gene-based communities of terms might therefore be the basis of a simple protocol aiming at improving the semantic structure of GO. Information about terms that share large gene content might also be important from a biomedical point of view, as it might reveal how genes over-expressed in a certain term also affect other biological processes, molecular functions and cellular components not directly linked according to GO semantics.
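
    The projected, gene-based network of terms described in this abstract can be illustrated with a short sketch. The gene and term names are hypothetical, and using the number of shared genes as the link weight is an assumption; the abstract states only that the network is weighted and that a link exists when at least one gene is annotated to both terms.

```python
from collections import defaultdict
from itertools import combinations

def project_terms(annotations):
    """Project a bipartite gene-term network onto terms.

    annotations maps each gene to the set of terms it is annotated with.
    Returns a dict mapping a sorted (term1, term2) pair to the number of
    genes annotated to both terms (assumed weighting).
    """
    weights = defaultdict(int)
    for terms in annotations.values():
        for t1, t2 in combinations(sorted(terms), 2):
            weights[(t1, t2)] += 1
    return dict(weights)

# Hypothetical annotations, for illustration only.
annotations = {
    "g1": {"T1", "T2"},
    "g2": {"T1", "T2", "T3"},
    "g3": {"T3"},
}
term_network = project_terms(annotations)
```

    Here T1 and T2 share two genes, so their link carries weight 2, while g3 contributes no link because it is annotated to a single term.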

  10. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Abstract Background Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  11. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  12. Identifying redundant and missing relations in the gene ontology.

    Science.gov (United States)

    Mougin, Fleur

    2015-01-01

    Significant efforts have been undertaken for providing the Gene Ontology (GO) in a computable format as well as for enriching it with logical definitions. Automated approaches can thus be applied to GO for assisting its maintenance and for checking its internal coherence. However, inconsistencies may still remain within GO. In this frame, the objective of this work was to audit GO relationships. First, reasoning over relationships was exploited for detecting redundant relations existing between GO concepts. Missing necessary and sufficient conditions were then identified based on the compositional structure of the preferred names of GO concepts. More than one thousand redundant relations and 500 missing necessary and sufficient conditions were found. The proposed approach was thus successful for detecting inconsistencies within GO relations. The application of lexical approaches as well as the exploitation of synonyms and textual definitions could be useful for identifying additional necessary and sufficient conditions. Multiple necessary and sufficient conditions for a given GO concept may be indicative of inconsistencies.
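
    One simple way to picture the detection of redundant relations is to flag a direct link that is already implied by a longer chain. The sketch below is a simplification under stated assumptions: it handles a single transitive relation type (such as is_a), uses hypothetical concept names, and is not the reasoning procedure used in the work itself.

```python
def ancestors(term, parents):
    """Return every concept reachable from term by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def redundant_edges(parents):
    """Find direct (child, parent) links implied by another direct parent.

    A link child -> p is flagged when p is also an ancestor of a different
    direct parent q of the same child, so that child -> q -> ... -> p
    already entails it.
    """
    redundant = []
    for child, direct in parents.items():
        for p in direct:
            if any(p in ancestors(q, parents) for q in direct if q != p):
                redundant.append((child, p))
    return sorted(redundant)

# Hypothetical hierarchy: C is_a B, B is_a A, and a redundant C is_a A.
parents = {"C": {"A", "B"}, "B": {"A"}, "A": set()}
flagged = redundant_edges(parents)
```

    In this toy hierarchy only the link from C to A is flagged, because it follows from C is_a B and B is_a A.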

  13. Codon bias and gene ontology in holometabolous and hemimetabolous insects.

    Science.gov (United States)

    Carlini, David B; Makowski, Matthew

    2015-12-01

    The relationship between preferred codon use (PCU), developmental mode, and gene ontology (GO) was investigated in a sample of nine insect species with sequenced genomes. These species were selected to represent two distinct modes of insect development, holometabolism and hemimetabolism, with an aim toward determining whether the differences in developmental timing concomitant with developmental mode would be mirrored by differences in PCU in their developmental genes. We hypothesized that the developmental genes of holometabolous insects should be under greater selective pressure for efficient translation, manifest as increased PCU, than those of hemimetabolous insects because holometabolism requires abundant protein expression over shorter time intervals than hemimetabolism, where proteins are required more uniformly in time. Preferred codon sets were defined for each species, from which the frequency of PCU for each gene was obtained. Although there were substantial differences in the genomic base composition of holometabolous and hemimetabolous insects, both groups exhibited a general preference for GC-ending codons, with the former group having higher PCU averaged across all genes. For each species, the biological process GO term for each gene was assigned that of its Drosophila homolog(s), and PCU was calculated for each GO term category. The top two GO term categories for PCU enrichment in the holometabolous insects were anatomical structure development and cell differentiation. The increased PCU in the developmental genes of holometabolous insects may reflect a general strategy to maximize the protein production of genes expressed in bursts over short time periods, e.g., heat shock proteins. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 686-698, 2015. © 2015 Wiley Periodicals, Inc.
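
    A per-gene frequency of preferred codon use can be pictured with the sketch below. It rests on assumptions not stated in the abstract: the frequency is taken over all complete codons in the coding sequence (the study may have restricted it to codons with synonymous alternatives), and the preferred codon set and sequence are invented for the example.

```python
def pcu_frequency(cds, preferred):
    """Fraction of complete codons in a coding sequence that are preferred.

    cds is a nucleotide string read in frame from its first base; any
    trailing partial codon is ignored. preferred is a set of codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon in preferred for codon in codons) / len(codons)

# Hypothetical preferred codon set and sequence, for illustration only.
preferred = {"GCC", "CTG"}
freq = pcu_frequency("ATGGCCCTGAAA", preferred)
```

    The example sequence splits into ATG, GCC, CTG and AAA, of which two are in the preferred set, giving a frequency of 0.5.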

  14. Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network.

    Science.gov (United States)

    Karadeniz, İlknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan

    2015-01-01

    Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty four of these interactions were identified from sentence-level processing. Twenty two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types at a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur at experimental conditions which can be ontologically represented. Our

  15. Integration of the Gene Ontology into an object-oriented architecture

    Directory of Open Access Journals (Sweden)

    Zheng W Jim

    2005-05-01

    Abstract Background To standardize gene product descriptions, a formal vocabulary defined as the Gene Ontology (GO) has been developed. GO terms have been categorized into biological processes, molecular functions, and cellular components. However, there is no single representation that integrates all the terms into one cohesive model. Furthermore, GO definitions have little information explaining the underlying architecture that forms these terms, such as the dynamic and static events occurring in a process. In contrast, object-oriented models have been developed to show dynamic and static events. A portion of the TGF-beta signaling pathway, which is involved in numerous cellular events including cancer, differentiation and development, was used to demonstrate the feasibility of integrating the Gene Ontology into an object-oriented model. Results Using object-oriented models we have captured the static and dynamic events that occur during a representative GO process, "transforming growth factor-beta (TGF-beta) receptor complex assembly" (GO:0007181). Conclusion We demonstrate that the utility of GO terms can be enhanced by object-oriented technology, and that the GO terms can be integrated into an object-oriented model by serving as a basis for the generation of object functions and attributes.

  16. goSTAG: gene ontology subtrees to tag and annotate genes within a set.

    Science.gov (United States)

    Bennett, Brian D; Bushel, Pierre R

    2017-01-01

    Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. goSTAG converts gene lists from genomic analyses into biological themes
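
    The tagging step, choosing the term with the most paths to the root within a subtree, can be illustrated with a minimal sketch. This is not the goSTAG code: the hierarchy and term names are hypothetical, the subtree is given directly as a child-to-parents mapping, and ties are broken alphabetically as an arbitrary choice.

```python
from functools import lru_cache

def path_counter(parents, root):
    """Return a function that counts distinct upward paths from a term to root."""
    @lru_cache(maxsize=None)
    def paths(term):
        if term == root:
            return 1
        return sum(paths(p) for p in parents.get(term, ()))
    return paths

def tag_cluster(parents, root, cluster_terms):
    """Pick the cluster term with the most paths to the root."""
    paths = path_counter(parents, root)
    # Sorting first makes the tie-break deterministic (alphabetical).
    return max(sorted(cluster_terms), key=paths)

# Hypothetical subtree: child -> direct parents.
parents = {
    "A": ["R"],
    "B": ["R"],
    "C": ["A", "B"],
    "D": ["C", "A"],
}
paths = path_counter(parents, "R")
label = tag_cluster(parents, "R", {"A", "C", "D"})
```

    In this toy subtree D reaches the root by three distinct paths (via C and A, via C and B, and via A directly), so it is chosen as the label for the cluster.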

  17. A new gene ontology-based measure for the functional similarity of gene products

    Institute of Scientific and Technical Information of China (English)

    QI Guo-long; QIAN Shi-yu; FANG Ji-qian

    2013-01-01

    Background Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure used with the gene co-expression and the protein-protein interactions. Results The results showed a considerable association between the semantic similarity and the expression correlation and between the semantic similarity and the protein-protein interactions, and our measure performed the best overall. Conclusion These results revealed the potential value of our newly proposed semantic similarity measure in studying the functional relevance of gene products.

  18. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data.

    Science.gov (United States)

    Lewin, Alex; Grieve, Ian C

    2006-10-03

    Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
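
    The per-term test that this abstract takes as its baseline, a one-sided Fisher's exact test for over-representation, can be sketched with the hypergeometric tail probability. The function name and the counts in the example are hypothetical, and the sketch covers only the individual-term test, not the grouping method or the POSOC package.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    N: genes in the background
    K: background genes annotated with the GO term
    n: genes declared differentially expressed
    k: differentially expressed genes annotated with the term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 20 genes, 5 annotated, 5 differentially expressed,
# 3 of which carry the annotation.
p_value = fisher_enrichment_p(k=3, n=5, K=5, N=20)
```

    With these counts the tail sums to 1126 favourable draws out of 15504, a p-value of roughly 0.073. Running one such test per GO term is what produces the many dependent tests the abstract refers to.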

  19. Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    Directory of Open Access Journals (Sweden)

    Grieve Ian C

    2006-10-01

    Abstract Background Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. Results We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. Conclusion Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.

  20. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    Science.gov (United States)

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  1. OAHG: an integrated resource for annotating human genes with multi-level ontologies

    Science.gov (United States)

    Cheng, Liang; Sun, Jie; Xu, Wanying; Dong, Lixiang; Hu, Yang; Zhou, Meng

    2016-01-01

    OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which can support further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e–16). PMID:27703231

  2. Semantic particularity measure for functional characterization of gene sets using gene ontology.

    Science.gov (United States)

    Bettembourg, Charles; Diot, Christian; Dameron, Olivier

    2014-01-01

    Genetic and genomic data analyses are outputting large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions, and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing the genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.
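
    The frequency-based notion of informativeness mentioned in this abstract is commonly computed as the negative logarithm of a term's relative frequency in an annotation corpus. The sketch below uses that standard definition with made-up counts; it assumes the counts already include annotations to descendant terms, so the root has the largest count, and it does not reproduce the particularity measure itself.

```python
import math

def information_content(term_counts, root):
    """Information content of each term: -log(count(term) / count(root)).

    term_counts maps a term to its annotation count, assumed to include
    annotations to its descendants. Terms with a zero count are skipped
    because their information content is undefined.
    """
    total = term_counts[root]
    return {
        term: -math.log(count / total)
        for term, count in term_counts.items()
        if count > 0
    }

# Hypothetical propagated annotation counts, for illustration only.
counts = {"root": 100, "a": 50, "b": 10, "unused": 0}
ic = information_content(counts, "root")
```

    The root carries no information (value 0), and rarer terms such as b score higher than more frequent ones such as a.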

  3. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins are all critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene products, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  4. Combining Hierarchical and Associative Gene Ontology Relations with Textual Evidence in Estimating Gene and Gene Product Similarity

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Riensche, Roderick M.; Beagley, Nathaniel; Baddeley, Bob L.; Tratz, Stephen C.; Gregory, Michelle L.

    2007-03-01

    Gene and gene product similarity is a fundamental diagnostic measure in analyzing biological data and constructing predictive models for functional genomics. With the rising influence of the Gene Ontology, two complementary approaches have emerged where the similarity between two genes or gene products is obtained by comparing Gene Ontology (GO) annotations associated with the genes or gene products. One approach captures GO-based similarity in terms of hierarchical relations within each gene subontology. The other approach identifies GO-based similarity in terms of associative relations across the three gene subontologies. We propose a novel methodology where the two approaches can be merged with ensuing benefits in coverage and accuracy, and demonstrate that further improvements can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  5. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community.
Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  6. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

    Science.gov (United States)

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.

  7. Formal modeling of Gene Ontology annotation predictions based on factor graphs

    Science.gov (United States)

    Spetale, Flavio; Murillo, Javier; Tapia, Elizabeth; Arce, Débora; Ponce, Sergio; Bulacio, Pilar

    2016-04-01

    Gene Ontology (GO) is a hierarchical vocabulary for gene product annotation. Its synergy with machine learning classification methods has been widely used for the prediction of protein functions. Current classification methods rely on heuristic solutions to check the consistency with some aspects of the underlying GO structure. In this work we formalize the GO is-a relationship through predicate logic. Moreover, an ontology model based on Forney Factor Graph (FFG) is shown on a general fragment of Cellular Component GO.
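
    The consistency requirement that this abstract formalizes can be illustrated with a simple check: if a term is predicted for a gene product, every term it is related to by is-a should be predicted as well. The sketch below is an illustration under that assumption only; it is neither the authors' predicate-logic formalization nor their factor graph model, and the term labels and function names are hypothetical.

```python
def is_a_violations(predicted, parents):
    """Return (term, parent) pairs where a term is predicted but its is-a parent is not."""
    return sorted(
        (term, parent)
        for term in predicted
        for parent in parents.get(term, ())
        if parent not in predicted
    )


def close_upward(predicted, parents):
    """Add every is-a ancestor so that the prediction set satisfies the constraint."""
    closed, stack = set(predicted), list(predicted)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed


# Made-up fragment of a cellular-component hierarchy (labels are illustrative).
parents = {"membrane_part": ["membrane"], "membrane": ["cell_part"], "cell_part": []}
predicted = {"membrane_part", "cell_part"}

violations = is_a_violations(predicted, parents)
repaired = close_upward(predicted, parents)
```

    Here the prediction set skips the intermediate term, so one violation is reported, and closing the set upward removes it.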

  8. Protein-Protein Interaction Network and Gene Ontology

    Science.gov (United States)

    Choi, Yunkyu; Kim, Seok; Yi, Gwan-Su; Park, Jinah

    The evolution of computer technologies makes it possible to access large amounts and various kinds of biological data via the internet, such as DNA sequences, proteomics data and information discovered about them. It is expected that the combination of various data could help researchers find further knowledge about them. The roles of a visualization system are to invoke human abilities to integrate information and to recognize certain patterns in the data. Thus, when the various kinds of data are examined and analyzed manually, an effective visualization system is an essential part. One instance of these integrated visualizations can be the combination of protein-protein interaction (PPI) data and Gene Ontology (GO), which could help enhance the analysis of the PPI network. We introduce a simple but comprehensive visualization system that integrates GO and PPI data, where GO and PPI graphs are visualized side-by-side, and supports quick reference functions between them. Furthermore, the proposed system provides several interactive visualization methods for efficiently analyzing the PPI network and the GO directed acyclic graph, such as context-based browsing and common ancestors finding.

  9. Identification of the key regulating genes of diminished ovarian reserve (DOR) by network and gene ontology analysis.

    Science.gov (United States)

    Pashaiasl, Maryam; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2016-09-01

    Diminished ovarian reserve (DOR) is one of the reasons for infertility that affects both older and young women. Ovarian reserve assessment can be used as a new prognostic tool for infertility treatment decision making. Here, up- and down-regulated gene expression profiles of granulosa cells were analysed to generate a putative interaction map of the involved genes. In addition, gene ontology (GO) analysis was used to get insight into the biological processes and molecular functions of involved proteins in DOR. Eleven up-regulated genes and nine down-regulated genes were identified and assessed by constructing interaction networks based on their biological processes. PTGS2, CTGF, LHCGR, CITED, SOCS2, STAR and FSTL3 were the key nodes in the up-regulated networks, while the IGF2, AMH, GREM, and FOXC1 proteins were key in the down-regulated networks. MIRN101-1, MIRN153-1 and MIRN194-1 inhibited the expression of SOCS2, while CSH1 and BMP2 positively regulated IGF1 and IGF2. Ossification, ovarian follicle development, vasculogenesis, sequence-specific DNA binding transcription factor activity, and Golgi apparatus are the major differential groups between up-regulated and down-regulated genes in DOR. Meta-analysis of publicly available transcriptomic data highlighted the high coexpression of CTGF, connective tissue growth factor, with the other key regulators of DOR. CTGF is involved in organ senescence and focal adhesion pathway according to GO analysis. These findings provide a comprehensive systems biology-based insight into the aetiology of DOR through network and gene ontology analyses.

  10. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. The computational results and comparison with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  11. Visualization of mappings between the gene ontology and cluster trees

    Science.gov (United States)

    Jusufi, Ilir; Kerren, Andreas; Aleksakhin, Vladyslav; Schreiber, Falk

    2012-01-01

    Ontologies and hierarchical clustering are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms, giving insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, often a combined view is desired: visualizing a large data set in the context of an ontology under consideration of a clustering of the data. This paper proposes a new visualization method for this task.
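
    The enrichment step described in this abstract is often carried out with a one-sided hypergeometric test, which asks how likely it is to see at least k genes annotated with a term in a cluster of n genes, given that K of the N background genes carry the term. The sketch below assumes that test; the abstract does not say which statistic the authors use, and the function name and numbers are illustrative.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes in the cluster annotated with the term
    n: size of the cluster
    K: genes in the background annotated with the term
    N: size of the background
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Illustrative numbers: all 3 genes of a cluster carry a term that only
# 3 of the 10 background genes carry.
p = enrichment_p_value(k=3, n=3, K=3, N=10)
```

    In practice one such test is run per ontology term, so the resulting P-values are normally corrected for multiple testing before terms are reported as overrepresented.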

  12. Using Ontology Fingerprints to Evaluate Genome-wide Association Results

    OpenAIRE

    Lam Tsoi; Michael Boehnke; Richard Klein; Jim Zheng

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...

  13. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks.

  14. Automatic annotation of protein motif function with Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Gopalakrishnan Vanathi

    2004-09-01

    Full Text Available Abstract Background Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. Results This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
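
    One way to read the mutual information feature mentioned in this abstract is as the mutual information between two binary events over a set of sequences: "the sequence matches the motif" and "the sequence is annotated with the GO term". The sketch below computes that quantity from a 2x2 count table. The exact formulation used by the authors is not given in the abstract, so this definition, the function name and the counts are assumptions.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between two binary events from a 2x2 count table.

    n11: sequences matching the motif and annotated with the term
    n10: sequences matching the motif only
    n01: sequences annotated with the term only
    n00: sequences with neither
    """
    n = n11 + n10 + n01 + n00
    # Each cell is paired with the totals of its row (motif yes/no)
    # and its column (term yes/no).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, row_total, col_total in cells:
        if joint:  # empty cells contribute nothing (0 * log 0 is taken as 0)
            mi += (joint / n) * math.log2(joint * n / (row_total * col_total))
    return mi


independent = mutual_information(25, 25, 25, 25)   # motif and term unrelated
associated = mutual_information(50, 0, 0, 50)      # motif and term always co-occur
```

    A score of this kind could then serve as one input feature, alongside frequency-based features, to a classifier such as the logistic regression model the abstract describes.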

  15. Identification of disease-causing genes using microarray data mining and Gene Ontology.

    Science.gov (United States)

    Mohammadi, Azadeh; Saraee, Mohammad H; Salehi, Mansoor

    2011-01-26

    One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene

  16. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Directory of Open Access Journals (Sweden)

    Saraee Mohammad H

    2011-01-01

    Full Text Available Abstract Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. Conclusions The proposed method addresses the weakness of conventional

  17. Identification of disease-causing genes using microarray data mining and Gene Ontology

    Science.gov (United States)

    2011-01-01

    Background One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions The proposed method addresses the weakness of conventional methods by adding a redundancy

  18. Semantic Mining based on graph theory and ontologies. Case Study: Cell Signaling Pathways

    Directory of Open Access Journals (Sweden)

    Carlos R. Rangel

    2016-08-01

    Full Text Available In this paper we use concepts from graph theory and cellular biology, represented as ontologies, to carry out semantic mining tasks on signaling pathway networks. Specifically, the paper describes the semantic enrichment of signaling pathway networks. A cell signaling network describes the basic cellular activities and their interactions. The main contribution of this paper is in the signaling pathway research area: it proposes a new technique to analyze and understand how changes in these networks may affect the transmission and flow of information, which produce diseases such as cancer and diabetes. Our approach is based on three concepts from graph theory (modularity, clustering and centrality) frequently used in social network analysis. Our approach consists of two phases: the first uses the graph theory concepts to determine the cellular groups in the network, which we call communities; the second uses ontologies for the semantic enrichment of the cellular communities. The measures used from graph theory allow us to determine the set of cells that are close (for example, in a disease), and the main cells in each community. We analyze our approach in two cases: TGF-β and Alzheimer's disease.

  19. Gene-ontology enrichment analysis in two independent family-based samples highlights biologically plausible processes for autism spectrum disorders.

    LENUS (Irish Health Repository)

    Anney, Richard J L

    2012-02-01

    Recent genome-wide association studies (GWAS) have implicated a range of genes from discrete biological pathways in the aetiology of autism. However, despite the strong influence of genetic factors, association studies have yet to identify statistically robust, replicated major effect genes or SNPs. We apply the principle of the SNP ratio test methodology described by O'Dushlaine et al to over 2100 families from the Autism Genome Project (AGP). Using a two-stage design we examine association enrichment in 5955 unique gene-ontology classifications across four groupings based on two phenotypic and two ancestral classifications. Based on estimates from simulation we identify excess of association enrichment across all analyses. We observe enrichment in association for sets of genes involved in diverse biological processes, including pyruvate metabolism, transcription factor activation, cell-signalling and cell-cycle regulation. Both genes and processes that show enrichment have previously been examined in autistic disorders and offer biological plausibility to these findings.

  20. Functional discrimination of gene expression patterns in terms of the gene ontology.

    Science.gov (United States)

    Badea, Liviu

    2003-01-01

    The ever-growing amount of experimental data in molecular biology and genetics requires its automated analysis, by employing sophisticated knowledge discovery tools. We use an Inductive Logic Programming (ILP) learner to induce functional discrimination rules between genes studied using microarrays and found to be differentially expressed in three recently discovered subtypes of adenocarcinoma of the lung. The discrimination rules involve functional annotations from the Proteome HumanPSD database in terms of the Gene Ontology, whose hierarchical structure is essential for this task. While most of the lower levels of gene expression data (pre)processing have been automated, our work can be seen as a step toward automating the higher level functional analysis of the data. We view our application not just as a prototypical example of applying more sophisticated machine learning techniques to the functional analysis of genes, but also as an incentive for developing increasingly more sophisticated functional annotations and ontologies, that can be automatically processed by such learning algorithms.

  1. TopoICSim: a new semantic similarity measure based on gene ontology.

    Science.gov (United States)

    Ehsani, Rezvan; Drabløs, Finn

    2016-07-29

    The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, information content of GO terms, or a combination of both. Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations.
An R script for TopoICSim is available at http://bigr.medisin.ntnu.no/tools/TopoICSim.R .

  2. Multimodal probabilistic generative models for time-course gene expression data and Gene Ontology (GO) tags.

    Science.gov (United States)

    Gabbur, Prasad; Hoying, James; Barnard, Kobus

    2015-10-01

    We propose four probabilistic generative models for simultaneously modeling gene expression levels and Gene Ontology (GO) tags. Unlike previous approaches for using GO tags, the joint modeling framework allows the two sources of information to complement and reinforce each other. We fit our models to three time-course datasets collected to study biological processes, specifically blood vessel growth (angiogenesis) and mitotic cell cycles. The proposed models result in a joint clustering of genes and GO annotations. Different models group genes based on GO tags and their behavior over the entire time-course, within biological stages, or even individual time points. We show how such models can be used for biological stage boundary estimation de novo. We also evaluate our models on biological stage prediction accuracy of held out samples. Our results suggest that the models usually perform better when GO tag information is included. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we presented a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description of GeneRIFs could be annotated by a text mining tool, Open Biomedical Annotator (OBA), and each Entrez gene could be mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all the records about human genes of GeneRIFs, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource of the human genome. High consistency of term frequency of individual gene (Pearson correlation = 0.6401, p = 2.2e-16) and gene frequency of individual term (Pearson correlation = 0.1298, p = 3.686e-14) in GeneRIFs and GOA shows our annotation resource is very reliable.

  4. Aspergillus flavus Blast2GO gene ontology database: elevated growth temperature alters amino acid metabolism

    Science.gov (United States)

    The availability of a representative gene ontology (GO) database is a prerequisite for a successful functional genomics study. Using online Blast2GO resources we constructed a GO database of Aspergillus flavus. Of the predicted total 13,485 A. flavus genes 8,987 were annotated with GO terms. The mea...

  5. Cellular functions of genetically imprinted genes in human and mouse as annotated in the gene ontology.

    Science.gov (United States)

    Hamed, Mohamed; Ismael, Siba; Paulsen, Martina; Helms, Volkhard

    2012-01-01

    By analyzing the cellular functions of genetically imprinted genes as annotated in the Gene Ontology for human and mouse, we found that imprinted genes are often involved in developmental, transport and regulatory processes. In the human, paternally expressed genes are enriched in GO terms related to the development of organs and of anatomical structures. In the mouse, maternally expressed genes regulate cation transport as well as G-protein signaling processes. Furthermore, we investigated if imprinted genes are regulated by common transcription factors. We identified 25 TF families that showed an enrichment of binding sites in the set of imprinted genes in human and 40 TF families in mouse. In general, maternally and paternally expressed genes are not regulated by different transcription factors. The genes Nnat, Klf14, Blcap, Gnas and Ube3a contribute most to the enrichment of TF families. In the mouse, genes that are maternally expressed in placenta are enriched for AP1 binding sites. In the human, we found that these genes possessed binding sites for both AP1 and SP1.

  6. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation: Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomics approaches. Results: We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, the method can generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than the commonly used cross-validation evaluations. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability: The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340

  7. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the premise that two proteins that interact biophysically would be in physical proximity to each other, would possess complementary molecular functions, and would play roles in related biological processes. Predicted GO terms were ranked according to their relative association scores, and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions of proteins, respectively, were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in
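
The evaluation described in this record (precision, recall and F-score at a top-k threshold over a ranked list of predicted GO terms) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy ranked list and the set of known annotations are all assumptions made for the example.

```python
def precision_recall_f(ranked_terms, true_terms, k):
    """Score the top-k predicted GO terms against a set of known annotations."""
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    # F-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical ranked predictions for one protein (best score first).
ranked = ["GO:0005634", "GO:0005737", "GO:0005886", "GO:0005739"]
# Hypothetical known annotations for the same protein.
known = {"GO:0005634", "GO:0005739", "GO:0005829"}

for k in (1, 2, 4):
    p, r, f = precision_recall_f(ranked, known, k)
    print(f"top-{k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Sweeping k over a range of thresholds and plotting the resulting values gives the kind of precision-versus-recall and F-versus-threshold curves the abstract refers to.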

  8. Approaching the axiomatic enrichment of the Gene Ontology from a lexical perspective.

    Science.gov (United States)

    Quesada-Martínez, Manuel; Mikroyannidi, Eleni; Fernández-Breis, Jesualdo Tomás; Stevens, Robert

    2015-09-01

    The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO). In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we showed that the labels can present a high degree of regularity. In this work, we extend our method with a cross-products extension (CPE) metric, which estimates the potential interest of a specific regularity for axiomatic enrichment in the lexical analysis, using information on exact matches in external ontologies. The GO consortium recently enriched the GO by using so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class with classes from the GO or other biomedical ontologies. We apply our method to the GO and study how its lexical analysis can identify and reconstruct the cross-products that are defined by the GO consortium. The labels of the GO classes are highly regular in lexical terms, and the exact matches with labels of external ontologies affect 80% of the GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely, the Cell Ontology and the Chemical Entities of Biological Interest ontology, and 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric permits our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study is completed with an analysis of false positives to explain this precision value. We think that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and that it can provide new insights into the engineering of

  9. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.

    Science.gov (United States)

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; Wang, Yadong; Rhee, Seung Y; Chen, Jin

    2015-02-14

    Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM .

  10. Ontological Enrichment of the Genes-to-Systems Breast Cancer Database

    Science.gov (United States)

    Viti, Federica; Mosca, Ettore; Merelli, Ivan; Calabria, Andrea; Alfieri, Roberta; Milanesi, Luciano

    Breast cancer research needs the development of specific and suitable tools to appropriately manage biomolecular knowledge. The presented work deals with the integrative storage of breast cancer related biological data, in order to promote a systems biology approach to this network disease. To increase data standardization and resource integration, annotations maintained in the Genes-to-Systems Breast Cancer (G2SBC) database are associated with ontological terms, which provide a hierarchical structure to organize data, enabling more effective queries, statistical analysis and semantic web searching. The exploited ontologies, which cover all levels of the molecular environment, from genes to systems, are among the best known and most widely used bioinformatics resources. In the G2SBC database, ontology terms both provide a semantic layer to improve data storage, accessibility and analysis and represent a user-friendly instrument to identify relations among biological components.

  11. Interactome and Gene Ontology provide congruent yet subtly different views of a eukaryotic cell

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2009-07-01

    Background: The characterization of the global functional structure of a cell is a major goal in bioinformatics and systems biology. Gene Ontology (GO) and the protein-protein interaction network offer alternative views of that structure. Results: This study presents a comparison of the global structures of the Gene Ontology and the interactome of Saccharomyces cerevisiae. Sensitive, unsupervised methods of clustering applied to a large fraction of the proteome established a GO-interactome correlation value of +0.47 for a general dataset that contains both high and low-confidence interactions and +0.58 for a smaller, high-confidence dataset. Conclusion: The structures of the yeast cell deduced from GO and interactome are substantially congruent. However, some significant differences were also detected, which may contribute to a better understanding of cell function and also to a refinement of the current ontologies.

  12. The effects of shared information on semantic calculations in the gene ontology.

    Science.gov (United States)

    Bible, Paul W; Sun, Hong-Wei; Morasso, Maria I; Loganantharaj, Rasiah; Wei, Lai

    2017-01-01

    The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk).
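
The three term similarity measures named in this record (Resnik, Lin and Jiang-Conrath) share one ingredient, the shared information (SI) between two terms. A minimal sketch is given below under stated assumptions: the toy ontology, the annotation probabilities and all function names are hypothetical, SI is taken as the information content of the most informative common ancestor (only one of the SI approaches the abstract alludes to), and the code is not taken from GGTK.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A"},
}

# Hypothetical probability that a gene product is annotated to each term
# (or to one of its descendants); the root always has probability 1.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "C": 0.25, "D": 0.125}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: IC(t) = -log p(t)."""
    return -math.log(PROB[term])


def shared_information(a, b):
    """SI as the IC of the most informative common ancestor (an assumption)."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def resnik(a, b):
    return shared_information(a, b)


def lin(a, b):
    denom = ic(a) + ic(b)
    return 2 * shared_information(a, b) / denom if denom > 0 else 0.0


def jiang_conrath_distance(a, b):
    return ic(a) + ic(b) - 2 * shared_information(a, b)


print(resnik("C", "D"), lin("C", "D"), jiang_conrath_distance("C", "D"))
```

Swapping in a different `shared_information` function while leaving the three measures untouched is the kind of substitution the abstract describes when it speaks of alternative SI approaches.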

  13. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorporates interaction terms from the PSI Molecular Interactions (PSI-MI) and Gene Ontology (GO). Using INO-based literature mining results, a modified Fisher's exact test was established to analyze significantly over- and under-represented enriched gene-gene interaction types within a specific area. Such a strategy was applied to study the vaccine-mediated gene-gene interactions using all PubMed abstracts. The Vaccine Ontology (VO) and INO were used to support the retrieval of vaccine terms and interaction keywords from the literature. INO is aligned with the Basic Formal Ontology (BFO) and imports terms from 10 other existing ontologies. Current INO includes 540 terms. In terms of interaction-related terms, INO imports and aligns PSI-MI and GO interaction terms and includes over 100 newly generated ontology terms with 'INO_' prefix. A new annotation property, 'has literature mining keywords', was generated to allow the listing of different keywords mapping to the interaction types in INO. Using all PubMed documents published as of 12/31/2013, approximately 266,000 vaccine-associated documents were identified, and a total of 6,116 gene-pairs were associated with at least one INO term. Out of 78 INO interaction terms associated with at least five gene-pairs of the vaccine-associated sub-network, 14 terms were significantly over-represented (i.e., more frequently used) and 17 under-represented based on our modified Fisher's exact test. These over-represented and under-represented terms share some common top-level terms but are distinct at the bottom levels of the INO hierarchy. The analysis of these
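
The over- and under-representation analysis in this record rests on Fisher's exact test. The sketch below shows the standard one-sided test built on the hypergeometric distribution; it does not reproduce the authors' modification, which the abstract does not specify. The function names and the counts in the example are assumptions for illustration only.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k tagged items in a sample of n, given K tagged among N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(k, N, K, n):
    """One-sided Fisher's exact p-values for over- and under-representation.

    N: gene pairs in the background set
    K: background gene pairs linked to the interaction term
    n: gene pairs in the subset of interest
    k: subset gene pairs linked to the interaction term
    """
    lo = max(0, n - (N - K))  # smallest feasible overlap
    hi = min(K, n)            # largest feasible overlap
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return p_over, p_under


# Hypothetical counts: 4 of 5 subset pairs carry a term held by 5 of 10 pairs.
p_over, p_under = enrichment_p_values(k=4, N=10, K=5, n=5)
print(f"over-representation p = {p_over:.4f}")
print(f"under-representation p = {p_under:.4f}")
```

A term would be called over-represented when `p_over` falls below the chosen significance level and under-represented when `p_under` does; with many terms tested at once, a multiple-testing correction would normally be applied on top.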

  14. Muscle Research and Gene Ontology: New standards for improved data integration

    Directory of Open Access Journals (Sweden)

    Nori Alessandra

    2009-01-01

    Background: The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community. Results: The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature. Conclusion: The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with, these new terms. Please visit the Muscle Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology.

  15. Dictionary and Gene Ontology Based Similarity for Named Entity Relationship Protein-protein Interaction Prediction from Biotext Corpus

    Directory of Open Access Journals (Sweden)

    Smt K. Prabavathy

    2014-12-01

    Protein-protein interactions (PPIs) play a key role in many biological systems. They are involved in complex formation and in many pathways that carry out biological processes. Accurate identification of the set of interacting proteins can shed new light on the functional role of various proteins in the complex environment of the cell. The ability to construct biologically meaningful gene networks and to identify the exact relationships within a gene network is critical for present-day systems biology. Earlier research studied the power of gene modules to shed light on the functioning of complex biological systems. Most modules in these networks have shown little connection with meaningful biological function, because these methods do not accurately calculate the semantic relationship between the entities. To overcome these problems and improve the PPI results in the biotext corpus, a new method is proposed in this research. The proposed method directly incorporates Gene Ontology (GO) annotation in the construction of gene modules and uses dictionary-based text to extract biotext information. The Dictionary-Based Text and Gene Ontology (DBTGO) approach integrates various gene-gene pairwise similarity values and protein-protein interaction relationships obtained from gene expression in order to achieve better biotext information retrieval results. A result analysis has been carried out on the Biotext Project at UC Berkeley. Testing the DBTGO algorithm indicates that it improves PPI relationship identification compared with previously suggested methods in terms of precision, recall, F-measure and Normalized Discounted Cumulative Gain (NDCG). The proposed DBTGO algorithm can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level.

  16. From "glycosyltransferase" to "congenital muscular dystrophy": integrating knowledge from NCBI Entrez Gene and the Gene Ontology.

    Science.gov (United States)

    Sahoo, Satya S; Zeng, Kelly; Bodenreider, Olivier; Sheth, Amit

    2007-01-01

    Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.

  17. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self- contradictory. For example, a protein may be predicted to

  18. Extending gene ontology in the context of extracellular RNA and vesicle communication

    NARCIS (Netherlands)

    Cheung, Kei-Hoi; Keerthikumar, Shivakumar; Roncaglia, Paola; Subramanian, Sai Lakshmi; Roth, Matthew E; Samuel, Monisha; Anand, Sushma; Gangoda, Lahiru; Gould, Stephen; Alexander, Roger; Galas, David; Gerstein, Mark B; Hill, Andrew F; Kitchen, Robert R; Lötvall, Jan; Patel, Tushar; Procaccini, Dena C; Quesenberry, Peter; Rozowsky, Joel; Raffai, Robert L; Shypitsyna, Aleksandra; Su, Andrew I; Théry, Clotilde; Vickers, Kasey; Wauben, Marca H M; Mathivanan, Suresh; Milosavljevic, Aleksandar; Laurent, Louise C

    2016-01-01

    BACKGROUND: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA

  19. Ontology based molecular signatures for immune cell types via gene expression analysis

    Science.gov (United States)

    2013-01-01

    Background: New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results: We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity, from broad classifications to very granular ones. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and genes uniquely expressed in these B cells compared to other B cell types. Conclusions: This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649

  20. Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach.

    Science.gov (United States)

    Patel, Sejal; Park, Min Tae M; Chakravarty, M Mallar; Knight, Jo

    2016-01-01

    Imaging genetics is an emerging field in which associations between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses the pressing need in genetic research to move beyond candidate gene studies without being overburdened by a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease.
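
One common reading of a stratified false discovery rate is to split the tests into strata (for example, SNPs in GO-prioritized genes versus all others) and control the FDR separately within each stratum. The sketch below follows that reading using Benjamini-Hochberg adjusted values; it is an assumption about the general technique, not the authors' pipeline, and the stratum labels and p-values are invented for the example.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stratified_fdr(pvalues, strata):
    """Apply the BH adjustment separately within each stratum."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    q = [None] * len(pvalues)
    for members in groups.values():
        adjusted = bh_qvalues([pvalues[i] for i in members])
        for idx, value in zip(members, adjusted):
            q[idx] = value
    return q


# Hypothetical SNP p-values; "priority" marks SNPs in GO-selected genes.
pvals = [0.01, 0.04, 0.03, 0.5]
labels = ["priority", "priority", "other", "other"]
print(stratified_fdr(pvals, labels))
```

Because each stratum is adjusted against its own number of tests, a small high-priority stratum pays a lighter multiple-testing penalty than it would in a single genome-wide correction, which is the source of the potential power gain mentioned in the abstract.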

  1. Evaluation of clustering algorithms for gene expression data using gene ontology annotations

    Institute of Scientific and Technical Information of China (English)

    MA Ning; ZHANG Zheng-guo

    2012-01-01

    Background: Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods: An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the proposed criterion. Results: The ranking of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions: The proposed criterion has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing the main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.

  2. A multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors for functional gene analysis.

    Science.gov (United States)

    Weber, Kristoffer; Bartsch, Udo; Stocking, Carol; Fehse, Boris

    2008-04-01

    Functional gene analysis requires the possibility of overexpression, as well as downregulation of one, or ideally several, potentially interacting genes. Lentiviral vectors are well suited for this purpose as they ensure stable expression of complementary DNAs (cDNAs), as well as short-hairpin RNAs (shRNAs), and can efficiently transduce a wide spectrum of cell targets when packaged within the coat proteins of other viruses. Here we introduce a multicolor panel of novel lentiviral "gene ontology" (LeGO) vectors designed according to the "building blocks" principle. Using a wide spectrum of different fluorescent markers, including drug-selectable enhanced green fluorescent protein (eGFP)- and dTomato-blasticidin-S resistance fusion proteins, LeGO vectors allow simultaneous analysis of multiple genes and shRNAs of interest within single, easily identifiable cells. Furthermore, each functional module is flanked by unique cloning sites, ensuring flexibility and individual optimization. The efficacy of these vectors for analyzing multiple genes in a single cell was demonstrated in several different cell types, including hematopoietic, endothelial, and neural stem and progenitor cells, as well as hepatocytes. LeGO vectors thus represent a valuable tool for investigating gene networks using conditional ectopic expression and knock-down approaches simultaneously.

  3. A relation based measure of semantic similarity for Gene Ontology annotations

    Directory of Open Access Journals (Sweden)

    Gaudin Benoit

    2008-11-01

    Background: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and group gene products into functional groups. There are various ways to measure semantic similarity, either using the topological structure of the ontology, the instances (gene products) associated with terms, or a mixture of both. We focus on an instance level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and the semantics of relations between terms, to provide constraints on our instance level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. Results: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance based methods of semantic similarity of terms, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provides a set of principles that any improvement on our method should seek to satisfy. Conclusion: We derive a measure of semantic

  4. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Background: It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results: We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method, which makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions: Pathway
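
Two rules stated in this record are simple enough to sketch directly: mapping a SNP to a gene when it lies within 500 kb of the gene's start or end, and calling a GO category replicated when it is significant at α = 0.05 in both studies. The sketch below is illustrative only. The gene coordinates and names are hypothetical, the window is read as extending 500 kb beyond either end of the gene (so SNPs inside the gene body also map to it), and significance is taken as p < α; both readings are assumptions.

```python
WINDOW = 500_000  # 500 kb, as stated in the abstract


def genes_for_snp(chrom, pos, genes, window=WINDOW):
    """Return names of genes whose extended interval contains the SNP."""
    return [
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and start - window <= pos <= end + window
    ]


def replicates(p_detection, p_replication, alpha=0.05):
    """A category replicates if it is significant in both studies."""
    return p_detection < alpha and p_replication < alpha


# Hypothetical gene coordinates: name -> (chromosome, start, end).
GENES = {
    "GENE_A": ("chr1", 1_000_000, 1_050_000),
    "GENE_B": ("chr1", 3_000_000, 3_200_000),
}

print(genes_for_snp("chr1", 1_400_000, GENES))  # within 500 kb of GENE_A
print(replicates(0.010, 0.014))  # p-values reported for the first category
print(replicates(0.040, 0.046))  # p-values reported for the second category
```

A window this wide means a single SNP can map to several neighbouring genes, so gene-level p-values derived from it are not independent of one another.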

  5. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks.

    Science.gov (United States)

    Hur, Junguk; Özgür, Arzucan; He, Yongqun

    2017-03-14

    Pathogenic Escherichia coli infections cause various diseases in humans and many animal species. However, despite extensive E. coli vaccine research, we are still unable to fully protect ourselves against E. coli infections. For more rational development of effective and safe E. coli vaccines, it is important to better understand E. coli vaccine-associated gene interaction networks. In this study, we first extended the Vaccine Ontology (VO) to semantically represent various E. coli vaccines and genes used in the vaccine development. We also normalized E. coli gene names compiled from the annotations of various E. coli strains using a pan-genome-based annotation strategy. The Interaction Network Ontology (INO) includes a hierarchy of various interaction-related keywords useful for literature mining. Using VO, INO, and normalized E. coli gene names, we applied an ontology-based SciMiner literature mining strategy to mine all PubMed abstracts and retrieve E. coli vaccine-associated E. coli gene interactions. Four centrality metrics (i.e., degree, eigenvector, closeness, and betweenness) were calculated for identifying highly ranked genes and interaction types. Using vaccine-related PubMed abstracts, our study identified 11,350 sentences that contain 88 unique INO interaction types and 1,781 unique E. coli genes. Each sentence contained at least one interaction type and two unique E. coli genes. An E. coli gene interaction network of genes and INO interaction types was created. From this big network, a sub-network consisting of 5 E. coli vaccine genes, including carA, carB, fimH, fepA, and vat, and 62 other E. coli genes, and 25 INO interaction types was identified. While many interaction types represent direct interactions between two indicated genes, our study has also shown that many of these retrieved interaction types are indirect in that the two genes participated in the specified interaction process in a required but indirect process. Our centrality analysis of

  6. Evaluating the significance of protein functional similarity based on gene ontology.

    Science.gov (United States)

    Konopka, Bogumil M; Golda, Tomasz; Kotulska, Malgorzata

    2014-11-01

    Gene ontology is among the most successful ontologies in the biomedical domain. It is used to describe, unambiguously, protein molecular functions, cellular localizations, and processes in which proteins participate. The hierarchical structure of gene ontology allows quantifying protein functional similarity by application of algorithms that calculate semantic similarities. The scores, however, are meaningless without a given context. Here, we propose how to evaluate the significance of protein function semantic similarity scores by comparing them to reference distributions calculated for randomly chosen proteins. In the study, thresholds for significant functional semantic similarity, in four representative annotation corpuses, were estimated. We also show that the score significance is influenced by the number and specificity of gene ontology terms that are annotated to compared proteins. While proteins with a greater number of terms tend to yield higher similarity scores, proteins with more specific terms produce lower scores. The estimated significance thresholds were validated using protein sequence-function and structure-function relationships. Taking into account the term number and term specificity improves the distinction between significant and insignificant semantic similarity comparisons.
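
    The idea of judging a similarity score against a reference distribution computed for randomly chosen proteins can be sketched as an empirical p-value. This is a minimal illustration, not the authors' procedure: the function name `empirical_p_value`, the add-one correction, and the toy scores are assumptions.

```python
def empirical_p_value(score, reference_scores):
    """Share of reference scores that are at least as large as `score`.

    The +1 in numerator and denominator is a common add-one correction
    (an assumption here) that keeps the estimate from being exactly zero.
    """
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    at_least_as_large = sum(1 for ref in reference_scores if ref >= score)
    return (at_least_as_large + 1) / (len(reference_scores) + 1)


# Hypothetical semantic similarity scores for randomly chosen protein pairs.
random_pair_scores = [0.10, 0.20, 0.30, 0.95]
p = empirical_p_value(0.90, random_pair_scores)
```

    A score would then be called significant when this value falls below a chosen threshold; in practice the reference distribution would contain many more than four random pairs.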

  7. Globaltest and GOEAST: two different approaches for Gene Ontology analysis

    NARCIS (Netherlands)

    Hulsegge, B.; Kommadath, A.; Smits, M.A.

    2009-01-01

    Background Gene set analysis is a commonly used method for analysing microarray data by considering groups of functionally related genes instead of individual genes. Here we present the use of two gene set analysis approaches: Globaltest and GOEAST. Globaltest is a method for testing whether sets of

  8. Using Ontology Fingerprints to evaluate genome-wide association study results

    OpenAIRE

    Tsoi, Lam C.; Michael Boehnke; Klein, Richard L.; Jim Zheng, W.

    2009-01-01

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among those PubMed abstracts linked to the genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied this approach...

  9. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Science.gov (United States)

    Overall, Rupert W; Paszkowski-Rogacz, Maciej; Kempermann, Gerd

    2012-01-01

    Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. We have created a resource with three components 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already successful 'top-down' approach of the Gene Ontology.

  10. The mammalian adult neurogenesis gene ontology (MANGO) provides a structural framework for published information on genes regulating adult hippocampal neurogenesis.

    Directory of Open Access Journals (Sweden)

    Rupert W Overall

    Full Text Available BACKGROUND: Adult hippocampal neurogenesis is not a single phenotype, but consists of a number of sub-processes, each of which is under complex genetic control. Interpretation of gene expression studies using existing resources often does not lead to results that address the interrelatedness of these processes. Formal structure, such as provided by ontologies, is essential in any field for comprehensive interpretation of existing knowledge but, until now, such a structure has been lacking for adult neurogenesis. METHODOLOGY/PRINCIPAL FINDINGS: We have created a resource with three components 1. A structured ontology describing the key stages in the development of adult hippocampal neural stem cells into functional granule cell neurons. 2. A comprehensive survey of the literature to annotate the results of all published reports on gene function in adult hippocampal neurogenesis (257 manuscripts covering 228 genes) to the appropriate terms in our ontology. 3. An easy-to-use searchable interface to the resulting database made freely available online. The manuscript presents an overview of the database highlighting global trends such as the current bias towards research on early proliferative stages, and an example gene set enrichment analysis. A limitation of the resource is the current scope of the literature which, however, is growing by around 100 publications per year. With the ontology and database in place, new findings can be rapidly annotated and regular updates of the database will be made publicly available. CONCLUSIONS/SIGNIFICANCE: The resource we present allows relevant interpretation of gene expression screens in terms of defined stages of postnatal neuronal development. Annotation of genes by hand from the adult neurogenesis literature ensures the data are directly applicable to the system under study. We believe this approach could also serve as an example to other fields in a 'bottom-up' community effort complementing the already

  11. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 [...] genes in a biological condition increases beyond 50 and

  12. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Science.gov (United States)

    Vashisht, Shikha; Bagler, Ganesh

    2012-01-01

    Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  13. An approach for the identification of targets specific to bone metastasis using cancer genes interactome and gene ontology analysis.

    Directory of Open Access Journals (Sweden)

    Shikha Vashisht

    Full Text Available Metastasis is one of the most enigmatic aspects of cancer pathogenesis and is a major cause of cancer-associated mortality. Secondary bone cancer (SBC) is a complex disease caused by metastasis of tumor cells from their primary site and is characterized by intricate interplay of molecular interactions. Identification of targets for multifactorial diseases such as SBC, the most frequent complication of breast and prostate cancers, is a challenge. Towards achieving our aim of identification of targets specific to SBC, we constructed a 'Cancer Genes Network', a representative protein interactome of cancer genes. Using graph theoretical methods, we obtained a set of key genes that are relevant for generic mechanisms of cancers and have a role in biological essentiality. We also compiled a curated dataset of 391 SBC genes from published literature which serves as a basis of ontological correlates of secondary bone cancer. Building on these results, we implement a strategy based on generic cancer genes, SBC genes and gene ontology enrichment method, to obtain a set of targets that are specific to bone metastasis. Through this study, we present an approach for probing one of the major complications in cancers, namely, metastasis. The results on genes that play generic roles in cancer phenotype, obtained by network analysis of 'Cancer Genes Network', have broader implications in understanding the role of molecular regulators in mechanisms of cancers. Specifically, our study provides a set of potential targets that are of ontological and regulatory relevance to secondary bone cancer.

  14. Systematically characterizing and prioritizing chemosensitivity related gene based on Gene Ontology and protein interaction network

    Directory of Open Access Journals (Sweden)

    Chen Xin

    2012-10-01

    Full Text Available Abstract Background: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs’ GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable

  15. Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology.

    Science.gov (United States)

    Mortensen, Jonathan M; Telis, Natalie; Hughey, Jacob J; Fan-Minogue, Hua; Van Auken, Kimberly; Dumontier, Michel; Musen, Mark A

    2016-04-01

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.
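
    For readers unfamiliar with the metric, the area under the receiver operating characteristic curve quoted above can be computed as the probability that a randomly chosen true error receives a higher score than a randomly chosen non-error, with ties counted as one half. The sketch below is a generic illustration with made-up labels and scores; it is not the study's evaluation code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` are truthy for positives (e.g. relationships experts marked as
    errors); `scores` are the confidence values being evaluated.
    """
    positives = [s for lab, s in zip(labels, scores) if lab]
    negatives = [s for lab, s in zip(labels, scores) if not lab]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two true errors, two correct relationships.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

    Under this definition a value of 0.5 corresponds to chance-level ranking, which gives context for the reported range of 0.44 to 0.73.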

  16. Using Network Extracted Ontologies to Identify Novel Genes with Roles in Appressorium Development in the Rice Blast Fungus Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Ryan M. Ames

    2017-01-01

    Full Text Available Magnaporthe oryzae is the causal agent of rice blast disease, the most important infection of rice worldwide. Half the world’s population depends on rice for its primary caloric intake and, as such, rice blast poses a serious threat to food security. The stages of M. oryzae infection are well defined, with the formation of an appressorium, a cell type that allows penetration of the plant cuticle, particularly well studied. However, many of the key pathways and genes involved in this disease stage are yet to be identified. In this study, I have used network-extracted ontologies (NeXOs), hierarchical structures inferred from RNA-Seq data, to identify pathways involved in appressorium development, which in turn highlights novel genes with potential roles in this process. This study illustrates the use of NeXOs for pathway identification from large-scale genomics data and also identifies novel genes with potential roles in disease. The methods presented here will be useful to study disease processes in other pathogenic species and these data represent predictions of novel targets for intervention in M. oryzae.

  17. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    Science.gov (United States)

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095

  18. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Science.gov (United States)

    Zhang, Shu-Bo; Tang, Qiang-Rong

    2016-07-21

    Identifying protein-protein interactions is important in molecular biology. Experimental methods for this task have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators of protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We propose a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy that combines both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVMs) were employed as discriminative classifiers, and five-fold cross-validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of different kinds of integrated features. The experimental results suggest that the best performance is achieved by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction.

  19. Unifying themes in microbial associations with animal and plant hosts described using the gene ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Collmer, Candace W; Gwinn-Giglio, Michelle; Lindeberg, Magdalen; Meng, Shaowu; Chibucos, Marcus C; Tseng, Tsai-Tien; Lomax, Jane; Biehl, Bryan; Ireland, Amelia; Bird, David; Dean, Ralph A; Glasner, Jeremy D; Perna, Nicole; Setubal, Joao C; Collmer, Alan; Tyler, Brett M

    2010-12-01

    Microbes form intimate relationships with hosts (symbioses) that range from mutualism to parasitism. Common microbial mechanisms involved in a successful host association include adhesion, entry of the microbe or its effector proteins into the host cell, mitigation of host defenses, and nutrient acquisition. Genes associated with these microbial mechanisms are known for a broad range of symbioses, revealing both divergent and convergent strategies. Effective comparisons among these symbioses, however, are hampered by inconsistent descriptive terms in the literature for functionally similar genes. Bioinformatic approaches that use homology-based tools are limited to identifying functionally similar genes based on similarities in their sequences. An effective solution to these limitations is provided by the Gene Ontology (GO), which provides a standardized language to describe gene products from all organisms. The GO comprises three ontologies that enable one to describe the molecular function(s) of gene products, the biological processes to which they contribute, and their cellular locations. Beginning in 2004, the Plant-Associated Microbe Gene Ontology (PAMGO) interest group collaborated with the GO consortium to extend the GO to accommodate terms for describing gene products associated with microbe-host interactions. Currently, over 900 terms that describe biological processes common to diverse plant- and animal-associated microbes are incorporated into the GO database. Here we review some unifying themes common to diverse host-microbe associations and illustrate how the new GO terms facilitate a standardized description of the gene products involved. We also highlight areas where new terms need to be developed, an ongoing process that should involve the whole community.

  20. Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining.

    Science.gov (United States)

    Hur, Junguk; Ozgür, Arzucan; Xiang, Zuoshuang; He, Yongqun

    2012-12-20

    Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Over 170,000 fever-related articles from PubMed abstracts and titles were retrieved and analysed at the sentence level using natural language processing techniques to identify genes and vaccines (including 186 Vaccine Ontology terms) as well as their interactions. This resulted in a generic fever network consisting of 403 genes and 577 gene interactions. A vaccine-specific fever sub-network consisting of 29 genes and 28 gene interactions was extracted from articles that are related to both fever and vaccines. In addition, gene-vaccine interactions were identified. Vaccines (including 4 specific vaccine names) were found to directly interact with 26 genes. Gene set enrichment analysis was performed using the genes in the generated interaction networks. Moreover, the genes in these networks were prioritized using network centrality metrics. Making scientific discoveries and generating new hypotheses were possible by using network centrality and gene set enrichment analyses. For example, our study found that the genes in the generic fever network were more enriched in cell death and responses to wounding, and the vaccine sub-network had more gene enrichment in leukocyte activation and phosphorylation regulation. The most central genes in the vaccine-specific fever network are predicted to be highly relevant to vaccine-induced fever, whereas genes that are central only in the generic fever network are likely to be highly relevant to generic fever responses. Interestingly, no Toll-like receptors (TLRs) were found in the gene-vaccine interaction network. Since

  1. Combining sequence and Gene Ontology for protein module detection in the Weighted Network.

    Science.gov (United States)

    Yu, Yang; Liu, Jie; Feng, Nuan; Song, Bo; Zheng, Zeyu

    2017-01-07

    Studies of protein modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in locating protein modules. In this paper, a new approach combining Gene Ontology and amino acid background frequency is introduced to detect the protein modules in weighted PPI networks. The proposed approach mainly consists of three parts: feature extraction, weighted graph construction and protein complex detection. Firstly, the topology-sequence information is utilized to represent the features of protein complexes. Secondly, six types of weighted graph are constructed by combining PPI network and Gene Ontology information. Lastly, a protein complex detection algorithm is applied to the weighted graph, which locates the clusters based on three conditions: density, network diameter and the included angle cosine. Experiments have been conducted on two protein complex benchmark sets for yeast, and the results show that the approach is more effective than five typical algorithms in terms of F-measure and precision. The combination of protein interaction network with sequence and gene ontology data helps to improve performance and provides an optional method for protein module detection. Copyright © 2016 Elsevier Ltd. All rights reserved.
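
    Two of the three cluster conditions named in this abstract, density and the included angle cosine, have standard textbook definitions that can be sketched as follows. This is only an illustration of those generic quantities on a toy unweighted graph: how the paper weights edges, builds feature vectors, or sets thresholds is not stated here, so the function names and data are hypothetical, and the network diameter condition is omitted.

```python
import math


def density(nodes, edges):
    """Edge density of the subgraph induced by `nodes`: 2E / (n * (n - 1))."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count each undirected edge once and ignore self-loops.
    inside = {
        frozenset((u, v))
        for u, v in edges
        if u in nodes and v in nodes and u != v
    }
    return 2 * len(inside) / (n * (n - 1))


def cosine(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy network: a path A-B-C-D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
candidate_density = density({"A", "B", "C"}, toy_edges)
```

    A candidate cluster would be kept only if such quantities pass chosen cut-offs; the cut-offs themselves are not given in the abstract.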

  2. An improved method for functional similarity analysis of genes based on Gene Ontology.

    Science.gov (United States)

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .

  3. Ontology-Driven Co-clustering of Gene Expression Data

    Science.gov (United States)

    Cordero, Francesca; Pensa, Ruggero G.; Visconti, Alessia; Ienco, Dino; Botta, Marco

    The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail in capturing biologically meaningful clusters.

  4. Genetic resources for advanced biofuel production described with the Gene Ontology.

    Science.gov (United States)

    Torto-Alalibo, Trudy; Purwantini, Endang; Lomax, Jane; Setubal, João C; Mukhopadhyay, Biswarup; Tyler, Brett M

    2014-01-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial ENergy processes Gene Ontology (MENGO) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  5. Genetic Resources for Advanced Biofuel Production Described with the Gene Ontology

    Directory of Open Access Journals (Sweden)

    Trudy Torto-Alalibo

    2014-10-01

    Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial Energy Gene Ontology (MENGO: http://www.mengo.biochem.vt.edu) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.

  6. GOParGenPy: a high throughput method to generate gene ontology data matrices.

    Science.gov (United States)

    Kumar, Ajay Anand; Holm, Liisa; Toronen, Petri

    2013-08-08

    Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
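
    The binary membership matrix described in this abstract can be illustrated with a minimal sketch. This is not GOParGenPy's code or API: the function names, the toy term identifiers (A to D) and the dictionary-based inputs are all assumptions made for illustration. The sketch propagates each gene's direct annotations to every ancestor term and then emits one 0/1 row per gene.

```python
from typing import Dict, List, Set, Tuple


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every ancestor of `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def membership_matrix(
    annotations: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, terms, matrix) with matrix[i][j] == 1 when gene i
    belongs to term j directly or through an ancestor relationship."""
    propagated: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        propagated[gene] = full

    genes = sorted(propagated)
    columns = sorted(set().union(*propagated.values()))
    matrix = [[1 if t in propagated[g] else 0 for t in columns] for g in genes]
    return genes, columns, matrix


# Toy hierarchy (hypothetical identifiers): C -> B -> A and D -> A.
toy_parents = {"C": {"B"}, "B": {"A"}, "D": {"A"}}
toy_annotations = {"gene1": {"C"}, "gene2": {"D"}}

genes, columns, matrix = membership_matrix(toy_annotations, toy_parents)
print(columns)  # ['A', 'B', 'C', 'D']
for gene, row in zip(genes, matrix):
    print(gene, row)  # gene1 [1, 1, 1, 0] then gene2 [1, 0, 0, 1]
```

    A dense list of lists is used here only for readability; for the large gene sets the abstract mentions, a sparse representation would be the more natural choice.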

  7. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

    Science.gov (United States)

    Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin M

    2016-09-09

    Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms. We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88 %) based on random sampling of annotations. In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.

  8. Onto-CC: a web server for identifying Gene Ontology conceptual clusters

    Science.gov (United States)

    Romero-Zaliz, R.; del Val, C.; Cobb, J. P.; Zwir, I.

    2008-01-01

    The Gene Ontology (GO) vocabulary has been extensively explored to analyze the functions of coexpressed genes. However, despite its extended use in Biology and Medical Sciences, there are still high levels of uncertainty about which ontology (i.e. Biological Process, Cellular Component or Molecular Function) should be used, and at which level of specificity. Moreover, the GO database can contain incomplete information resulting from human annotations, or information highly influenced by the available knowledge about a specific branch in an ontology. In spite of these drawbacks, there is a trend to ignore these problems and even use GO terms to conduct searches of gene expression profiles (i.e. expression + GO) instead of more cautious approaches that just consider them as an independent source of validation (i.e. expression versus GO). Consequently, the uncertainty is propagated, producing biased analyses of the required gene grouping hypotheses. We proposed a web tool, Onto-CC, as an automatic method specially suited for independent explanation/validation of gene grouping hypotheses (e.g. coexpressed genes) based on GO clusters (i.e. expression versus GO). The Onto-CC approach reduces the uncertainty of the queries by identifying optimal conceptual clusters that combine terms from different ontologies simultaneously, as well as terms defined at different levels of specificity in the GO hierarchy. To do so, we implemented the EMO-CC methodology to find clusters in structural databases [GO Directed acyclic Graph (DAG) tree], inspired by Conceptual Clustering algorithms. This approach allows the management of optimal cluster sets as potential parallel hypotheses, guided by multiobjective/multimodal optimization techniques. Therefore, we can generate alternative and, still, optimal explanations of queries that can provide new insights for a given problem.
Onto-CC has been successfully used to test different medical and biological hypotheses including the explanation and prediction of

  9. Ontology design patterns to disambiguate relations between genes and gene products in GENIA.

    Science.gov (United States)

    Hoehndorf, Robert; Ngonga Ngomo, Axel-Cyrille; Pyysalo, Sampo; Ohta, Tomoko; Oellrich, Anika; Rebholz-Schuhmann, Dietrich

    2011-10-06

    Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  10. Ontology design patterns to disambiguate relations between genes and gene products in GENIA

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2011-10-01

    Motivation: Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results: We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.

  11. Gene expression, signal transduction pathways and functional networks associated with growth of sporadic vestibular schwannomas

    DEFF Research Database (Denmark)

    Sass, Hjalte Christian Reeberg; Borup, Rehannah; Alanin, Mikkel

    2017-01-01

    The objective of this study was to determine global gene expression in relation to Vestibular schwannomas (VS) growth rate and to identify signal transduction pathways and functional molecular networks associated with growth. Repeated magnetic resonance imaging (MRI) prior to surgery determined...... of signal transduction pathways and functional molecular networks associated with tumor growth. In total 109 genes were deregulated in relation to tumor growth rate. Genes associated with apoptosis, growth and cell proliferation were deregulated. Gene ontology included regulation of the cell cycle, cell...... differentiation and proliferation, among other functions. Fourteen pathways were associated with tumor growth. Five functional molecular networks were generated. This first study on global gene expression in relation to vestibular schwannoma growth rate identified several genes, signal transduction pathways...

  12. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

    Science.gov (United States)

    Diehl, Alexander D; Meehan, Terrence F; Bradford, Yvonne M; Brush, Matthew H; Dahdul, Wasila M; Dougall, David S; He, Yongqun; Osumi-Sutherland, David; Ruttenberg, Alan; Sarntivijai, Sirarat; Van Slyke, Ceri E; Vasilevsky, Nicole A; Haendel, Melissa A; Blake, Judith A; Mungall, Christopher J

    2016-07-04

    The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies-for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. 
The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the

  13. PPDB: A Tool for Investigation of Plants Physiology Based on Gene Ontology.

    Science.gov (United States)

    Sharma, Ajay Shiv; Gupta, Hari Om; Prasad, Rajendra

    2015-09-01

    Representing the way forward, from functional genomics and its ontology to functional understanding and physiological model, in a computationally tractable fashion is one of the ongoing challenges faced by computational biology. To address this challenge, we herein feature the applications of contemporary database management to the development of PPDB, a searching and browsing tool for the Plants Physiology Database that is based upon the mining of a large amount of gene ontology data currently available. The working principles and search options associated with the PPDB are publicly available and freely accessible online ( http://www.iitr.ac.in/ajayshiv/ ) through a user-friendly environment generated by means of Drupal-6.24. Because genes are expressed in temporally and spatially characteristic patterns, and their functionally distinct products often reside in specific cellular compartments and may be part of one or more multicomponent complexes, this sort of work is intended to be relevant for investigating the functional relationships of gene products at a system level and thus helps us approach the full physiology.

  14. Visualization and analysis of microarray and gene ontology data with treemaps

    Directory of Open Access Journals (Sweden)

    Babaria Ketan

    2004-06-01

    Background: The increasing complexity of genomic data presents several challenges for biologists. Limited computer monitor views of data complexity and the dynamic nature of data in the midst of discovery increase the challenge of integrating experimental results with information resources. The use of Gene Ontology enables researchers to summarize results of quantitative analyses in this framework, but the limitations of typical browser presentation restrict data access. Results: Here we describe extensions to the treemap design to visualize and query genome data. Treemaps are a space-filling visualization technique for hierarchical structures that show attributes of leaf nodes by size and color-coding. Treemaps enable users to rapidly compare sizes of nodes and sub-trees, and we use Gene Ontology categories, levels of RNA, and other quantitative attributes of DNA microarray experiments as examples. Our implementation of treemaps, Treemap 4.0, allows user-defined filtering to focus on the data of greatest interest, and these queried files can be exported for secondary analyses. Links to model system web pages from Treemap 4.0 give users access to details about specific genes without leaving the query platform. Conclusions: Treemaps allow users to view and query the data from an experiment on a single computer monitor screen. Treemap 4.0 can be used to visualize various genome data, and is particularly useful for revealing patterns and details within complex data sets.
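
    As a rough illustration of how a treemap turns a hierarchy into space-filling rectangles, the sketch below implements the classic slice-and-dice layout, which alternates the split direction at each level. The abstract does not say which layout Treemap 4.0 uses, so this is only an assumed example; the dictionary node format, the function names and the toy category tree are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[str, float, float, float, float]  # (name, x, y, width, height)


def total_size(node: Dict) -> float:
    """Size of a leaf, or the summed size of all leaves below an inner node."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node: Dict, x: float, y: float, w: float, h: float,
                   horizontal: bool = True) -> List[Rect]:
    """Lay out leaves inside the rectangle (x, y, w, h), giving each child
    a strip proportional to its size and flipping direction per level.
    Assumes every leaf has a positive size."""
    if "children" not in node:
        return [(node["name"], x, y, w, h)]

    rects: List[Rect] = []
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node["children"]:
        frac = total_size(child) / total
        if horizontal:
            rects += slice_and_dice(child, x + offset * w, y, frac * w, h, False)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, frac * h, True)
        offset += frac
    return rects


# Toy hierarchy: leaf sizes could stand for gene counts per category.
toy_tree = {
    "name": "root",
    "children": [
        {"name": "a", "size": 2},
        {"name": "group", "children": [
            {"name": "b", "size": 1},
            {"name": "c", "size": 1},
        ]},
    ],
}

for rect in slice_and_dice(toy_tree, 0.0, 0.0, 4.0, 4.0):
    print(rect)
```

    Each leaf's area is proportional to its size, which is what lets a viewer compare nodes and sub-trees at a glance; color-coding by a second attribute would be added at drawing time.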

  15. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    Science.gov (United States)

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas only very few enriched terms to none were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. 
Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression
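
    The idea of testing whether a term is overrepresented in a gene set can be sketched with a one-sided hypergeometric test, a common choice for this kind of enrichment analysis. The abstract does not state which statistic ZEOGS uses, so the test, the function name and the toy numbers below are assumptions for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes when drawing
    `sample` genes without replacement from `population` genes, of which
    `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 3 annotated to a term,
# and an input set of 3 genes that are all annotated to it.
print(hypergeom_enrichment_p(10, 3, 3, 3))  # 1/120, about 0.0083
```

    In practice one p-value is computed per term, so a multiple-testing correction would normally follow.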

  16. Ontology or formal ontology

    Science.gov (United States)

    Žáček, Martin

    2017-07-01

    Ontology or formal ontology? Which word is correct? The aim of this article is to introduce the correct terms and explain their basis. An ontology describes a particular area of interest (domain) in a formal way: it defines the classes of objects that are in that area and the relationships that may exist between them. The value of an ontology lies mainly in facilitating communication between people, improving collaboration between software systems and improving systems engineering. In all these areas, an ontology offers the possibility of unifying views while maintaining consistency and unambiguity.

  17. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

    Science.gov (United States)

    Köhler, Sebastian; Doelken, Sandra C; Ruef, Barbara J; Bauer, Sebastian; Washington, Nicole; Westerfield, Monte; Gkoutos, George; Schofield, Paul; Smedley, Damian; Lewis, Suzanna E; Robinson, Peter N; Mungall, Christopher J

    2013-01-01

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  18. SoyBase Soybean Ontologies: Pathways to Soybean Growth and Developmental Description

    Science.gov (United States)

    Plant anatomy has been a recognized discipline for many years. As a result, it has a very structured ontology for the anatomical parts of most plants. The same cannot be said for the complicated phenotypic traits of most plants. Listing analogous traits between different plant species is extremel...

  19. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research.

    Science.gov (United States)

    Primmer, C R; Papakostas, S; Leder, E H; Davis, M J; Ragan, M A

    2013-06-01

    Recent advances in molecular technologies have opened up unprecedented opportunities for molecular ecologists to better understand the molecular basis of traits of ecological and evolutionary importance in almost any organism. Nevertheless, reliable and systematic inference of functionally relevant information from these masses of data remains challenging. The aim of this review is to highlight how the Gene Ontology (GO) database can be of use in resolving this challenge. The GO provides a largely species-neutral source of information on the molecular function, biological role and cellular location of tens of thousands of gene products. As it is designed to be species-neutral, the GO is well suited for cross-species use, meaning that functional annotation derived from model organisms can be transferred to inferred orthologues in newly sequenced species. In other words, the GO can provide gene annotation information for species with nonannotated genomes. In this review, we describe the GO database, how functional information is linked with genes/gene products in model organisms, and how molecular ecologists can utilize this information to annotate their own data. Then, we outline various applications of GO for enhancing the understanding of the molecular basis of traits in ecologically relevant species. We also highlight potential pitfalls, provide step-by-step recommendations for conducting a sound study in nonmodel organisms, suggest avenues for future research and outline a strategy for maximizing the benefits of a more ecological and evolutionary genomics-oriented ontology by ensuring its compatibility with the GO. © 2013 John Wiley & Sons Ltd.

  20. Changes in winter depression phenotype correlate with white blood cell gene expression profiles: a combined metagene and gene ontology approach.

    Science.gov (United States)

    Bosker, Fokko J; Terpstra, Peter; Gladkevich, Anatoliy V; Janneke Dijck-Brouwer, D A; te Meerman, Gerard; Nolen, Willem A; Schoevers, Robert A; Meesters, Ybe

    2015-04-03

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior and following bright light therapy and in summer. RNA was isolated, converted into cRNA, amplified and hybridized on Illumina® gene expression arrays. The raw optical array data were quantile normalized and thereafter analyzed using a metagene approach, based on previously published Affymetrix gene array data. The raw data were also subjected to a secondary analysis focusing on circadian genes and genes involved in serotonergic neurotransmission. Differences between the conditions were analyzed, using analysis of variance on the principal components of the metagene score matrix. After correction for multiple testing no statistically significant differences were found. Another approach uses the correlation between metagene factor weights and the actual expression values, averaged over conditions. When comparing the correlations of winter vs. summer and bright light therapy vs. summer significant changes for several metagenes were found. Subsequent gene ontology analyses (DAVID and GeneTrail) of 5 major metagenes suggest an interaction between brain and white blood cells. The hypothesis driven analysis with a smaller group of genes failed to demonstrate any significant effects. The results from the combined metagene and gene ontology analyses support the idea of communication between brain and white blood cells. Future studies will need a much larger sample size to obtain information at the level of single genes. Copyright © 2014 Elsevier Inc. All rights reserved.
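
    The quantile normalization step mentioned in this abstract can be sketched as follows. This is the textbook procedure written for illustration, not the authors' pipeline: the function name and toy intensities are assumptions, and ties are broken by position rather than averaged, which production implementations usually handle more carefully.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Force every sample (one list of probe intensities per array) onto
    the same distribution: the mean of the sorted values at each rank."""
    n = len(samples[0])
    if any(len(sample) != n for sample in samples):
        raise ValueError("all samples must have the same number of probes")

    # Mean intensity at each rank across the sorted samples.
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [
        sum(col[i] for col in sorted_samples) / len(samples) for i in range(n)
    ]

    normalized = []
    for sample in samples:
        # Probe indices ordered by intensity; rank r gets rank_means[r].
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    After normalization every array contains exactly the same set of values, so remaining differences between arrays reflect only the ordering of probes within each array.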

  1. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

    Science.gov (United States)

    Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J

    2016-02-01

    Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). Contact: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Background: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  3. Orymold: ontology based gene expression data integration and analysis tool applied to rice

    Directory of Open Access Journals (Sweden)

    Segura Jordi

    2009-05-01

    Full Text Available Abstract Background Integration and exploration of data obtained from genome wide monitoring technologies has become a major challenge for many bioinformaticists and biologists due to its heterogeneity and high dimensionality. A widely accepted approach to solve these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human and machine readable interfaces. Results We designed and implemented a software tool that allows investigators to create their own semantic model of an organism and to use it to dynamically integrate expression data obtained from DNA microarrays and other probe based technologies. The software provides tools that use the semantic model to postulate and validate hypotheses on the spatial and temporal expression and function of genes. In order to illustrate the software's use and features, we used it to build a semantic model of rice (Oryza sativa) and integrated experimental data into it. Conclusion In this paper we describe the development and features of a flexible software application for dynamic gene expression data annotation, integration, and exploration called Orymold. Orymold is freely available for non-commercial users from http://www.oryzon.com/media/orymold.html

  4. NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

    Science.gov (United States)

    Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

    2017-03-20

    The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .

  5. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data.

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer's disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology.

  6. A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data

    Science.gov (United States)

    Fruzangohar, Mario; Ebrahimie, Esmaeil; Adelson, David L.

    2017-01-01

    Gene Ontology (GO) classification of statistically significantly differentially expressed genes is commonly used to interpret transcriptomics data as a part of functional genomic analysis. In this approach, all significantly expressed genes contribute equally to the final GO classification regardless of their actual expression levels. Gene expression levels can significantly affect protein production and hence should be reflected in GO term enrichment. Genes with low expression levels can also participate in GO term enrichment through cumulative effects. In this report, we have introduced a new GO enrichment method that is suitable for multiple samples and time series experiments that uses a statistical outlier test to detect GO categories with special patterns of variation that can potentially identify candidate biological mechanisms. To demonstrate the value of our approach, we have performed two case studies. Whole transcriptome expression profiles of Salmonella enteritidis and Alzheimer’s disease (AD) were analysed in order to determine GO term enrichment across the entire transcriptome instead of a subset of differentially expressed genes used in traditional GO analysis. Our result highlights the key role of inflammation related functional groups in AD pathology as granulocyte colony-stimulating factor receptor binding, neuromedin U binding, and interleukin were remarkably upregulated in AD brain when using all of the gene expression data in the transcriptome. Mitochondrial components and the molybdopterin synthase complex were identified as potential key cellular components involved in AD pathology. PMID:28199395

  7. Genetic resources for methane production from biomass described with gene ontology

    Directory of Open Access Journals (Sweden)

    Endang Purwantini

    2014-12-01

    Full Text Available Methane (CH4) is a valuable fuel, constituting 70-95% of natural gas, and a potent greenhouse gas. Release of CH4 into the atmosphere contributes to climate change. Biological CH4 production or methanogenesis is mostly performed by methanogens, a group of strictly anaerobic archaea. The direct substrates for methanogenesis are H2 plus CO2, acetate, formate, methylamines, methanol, methyl sulfides, and ethanol or a secondary alcohol plus CO2. In numerous anaerobic niches in nature, methanogenesis facilitates mineralization of complex biopolymers such as carbohydrates, lipids and proteins generated by primary producers. Thus, methanogens are critical players in the global carbon cycle. The same process is used in anaerobic treatment of municipal, industrial and agricultural wastes, reducing the biological pollutants in the wastes and generating methane. It also holds potential for commercial production of natural gas from renewable resources. This process operates in digestive systems of many animals, including cattle, and humans. In contrast, in deep-sea hydrothermal vents methanogenesis is a primary production process, allowing chemosynthesis of biomaterials from H2 plus CO2. In this report we present Gene Ontology (GO) terms that can be used to describe processes, functions and cellular components involved in methanogenic biodegradation and biosynthesis of specialized coenzymes that methanogens use. Some of these GO terms were previously available and the rest were generated in our Microbial Energy Gene Ontology (MENGO) project. A recently discovered non-canonical CH4 production process is also described. We have performed manual GO annotation of selected methanogenesis genes, based on experimental evidence, providing gold standards for machine annotation and automated discovery of methanogenesis genes or systems in diverse genomes. Most of the GO-related information presented in this report is available at the MENGO website (http://www.mengo.biochem.vt.edu/).

  8. Assessing identity, redundancy and confounds in Gene Ontology annotations over time.

    Science.gov (United States)

    Gillis, Jesse; Pavlidis, Paul

    2013-02-15

    The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. Data available at http://chibi.ubc.ca/assessGO.

  9. DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    Directory of Open Access Journals (Sweden)

    Wu Cathy H

    2005-08-01

    Full Text Available Abstract Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete

  10. Information content-based gene ontology semantic similarity approaches: toward a unified framework theory.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG.
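
    As a rough illustration of what an annotation-based information content (IC) score and an IC-based term similarity look like on a DAG, the following minimal sketch may help. It is not the unified framework of the paper: the toy DAG, the term names, the gene annotations and the function names are all hypothetical, and the similarity shown is the well-known Resnik measure (IC of the most informative common ancestor), chosen here only as a familiar example of an IC-based score.

```python
import math

# Toy DAG: term -> set of parent terms (hypothetical names, not real GO IDs).
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Toy direct annotations: gene -> set of terms (hypothetical).
ANNOTATIONS = {
    "g1": {"c"},
    "g2": {"d"},
    "g3": {"b"},
    "g4": {"c", "d"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or any descendant of t)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)  # propagate annotations up the DAG
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(term1, term2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic.get(term, 0.0) for term in common)


ic = information_content()
print(round(ic["c"], 4), round(resnik("c", "d", ic), 4))
```

    In this toy setting the root has an IC of zero because every gene is annotated to it after propagation, so two terms that share only the root get a similarity of zero.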

  11. The language of gene ontology: a Zipf’s law analysis

    Directory of Open Access Journals (Sweden)

    Kalankesh Leila

    2012-06-01

    Full Text Available Abstract Background Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf’s law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Results Annotations from the Gene Ontology Annotation project were found to follow Zipf’s law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Conclusions Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
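
    A power law exponent of the kind discussed above can be estimated in several ways; the abstract does not say which fitting procedure the authors used. The sketch below assumes the simplest one, an ordinary least-squares fit of log frequency against log rank, and the function name and the toy corpus of term labels are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log data."""
    # Frequencies sorted from most to least common give the rank ordering.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct terms to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying power law; return its magnitude.
    return -sxy / sxx


# Toy corpus of annotation terms whose counts follow 120 / rank exactly.
corpus = []
for rank in range(1, 6):
    corpus += [f"term{rank}"] * (120 // rank)

print(round(zipf_exponent(corpus), 6))
```

    Least squares on log-log data is easy to read but is known to be a biased estimator for real corpora, so it should be taken as a first look rather than a definitive fit.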

  12. Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

    Science.gov (United States)

    Mazandu, Gaston K.; Mulder, Nicola J.

    2013-01-01

    Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG. PMID:24078912

  13. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms were learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were

  14. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

    Directory of Open Access Journals (Sweden)

    Jain Shobhit

    2010-11-01

    Full Text Available Abstract Background Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus, the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity. Results We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if participating proteins belong to the same sub-graph as compared to if they belong to different sub-graphs. Conclusions The TCSS algorithm performs better than other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.

  15. Construction Methodology of Medical Ontology Clinical Pathway Knowledge Base

    Institute of Scientific and Technical Information of China (English)

    郑西川; 谭申生; 于广军

    2012-01-01

    Objective To abstract practice-oriented knowledge from a cohort of real clinical pathways and to represent this knowledge as a clinical pathway ontology. Methods An engineering methodology was adopted to construct the clinical pathway knowledge base, which included (1) knowledge source identification and classification of clinical pathways according to variations in setting, stage of care, patient type, outcome and specialty; (2) iterative knowledge abstraction using grounded theory; (3) ontology engineering as adapted from the Model-based Incremental Knowledge Engineering approach; and (4) ontology evaluation through encoding a sample of real clinical pathways. Results A clinical pathway ontology knowledge base covering 58 disease types was developed and applied in a real clinical setting. Conclusion The clinical pathway ontology knowledge base is important for in-depth electronic application of clinical pathways and can serve as a reference for developing a new generation of electronic medical records.

  16. Autism: Many Genes, Common Pathways?

    OpenAIRE

    Geschwind, Daniel H.

    2008-01-01

    Autism is a heterogeneous neurodevelopmental syndrome with a complex genetic etiology. It is still not clear whether autism comprises a vast collection of different disorders akin to intellectual disability or a few disorders sharing common aberrant pathways. Unifying principles among cases of autism are likely to be at the level of brain circuitry in addition to molecular pathways.

  17. Autism: many genes, common pathways?

    Science.gov (United States)

    Geschwind, Daniel H

    2008-10-31

    Autism is a heterogeneous neurodevelopmental syndrome with a complex genetic etiology. It is still not clear whether autism comprises a vast collection of different disorders akin to intellectual disability or a few disorders sharing common aberrant pathways. Unifying principles among cases of autism are likely to be at the level of brain circuitry in addition to molecular pathways.

  18. Grouping miRNAs of similar functions via weighted information content of gene ontology.

    Science.gov (United States)

    Lan, Chaowang; Chen, Qingfeng; Li, Jinyan

    2016-12-22

    Regulation mechanisms between miRNAs and genes are complicated. To accomplish a biological function, a miRNA may regulate multiple target genes, and similarly a target gene may be regulated by multiple miRNAs. Wet-lab knowledge of co-regulating miRNAs is limited. This work introduces a computational method to group miRNAs of similar functions to identify co-regulating miRNAs from a similarity matrix of miRNAs. We define a novel information content of gene ontology (GO) to measure similarity between two sets of GO graphs corresponding to the two sets of target genes of two miRNAs. This between-graph similarity is then transferred as a functional similarity between the two miRNAs. Our definition of the information content is based on the size of a GO term's descendants, but adjusted by a weight derived from its depth level and the GO relationships at its path to the root node or to the most informative common ancestor (MICA). Further, a self-tuning technique and the eigenvalues of the normalized Laplacian matrix are applied to determine the optimal parameters for the spectral clustering of the similarity matrix of the miRNAs. Experimental results demonstrate that our method has better clustering performance than the existing edge-based, node-based or hybrid methods. Our method has also demonstrated a novel usefulness for the function annotation of new miRNAs, as reported in the detailed case studies.

  19. GO-2D: identifying 2-dimensional cellular-localized functional modules in Gene Ontology

    Directory of Open Access Journals (Sweden)

    Yang Da

    2007-01-01

    Full Text Available Abstract Background Rapid progress in high-throughput biotechnologies (e.g. microarrays) and exponential accumulation of gene functional knowledge make it promising for systematic understanding of complex human diseases at functional modules level. Based on Gene Ontology, a large number of automatic tools have been developed for the functional analysis and biological interpretation of the high-throughput microarray data. Results Different from the existing tools such as Onto-Express and FatiGO, we develop a tool named GO-2D for identifying 2-dimensional functional modules based on combined GO categories. For example, it refines biological process categories by sorting their genes into different cellular component categories, and then extracts those combined categories enriched with the interesting genes (e.g., the differentially expressed genes) for identifying the cellular-localized functional modules. Applications of GO-2D to the analyses of two human cancer datasets show that very specific disease-relevant processes can be identified by using cellular location information. Conclusion For studying complex human diseases, GO-2D can extract functionally compact and detailed modules such as the cellular-localized ones, characterizing disease-relevant modules in terms of both biological processes and cellular locations. The application results clearly demonstrate that the 2-dimensional approach, complementary to the current 1-dimensional approach, is powerful for finding modules highly relevant to diseases.
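
    The idea of a combined category (genes that share both a biological process and a cellular component) can be sketched as a set intersection followed by an enrichment test. The abstract does not state which test GO-2D applies, so the one-sided hypergeometric test below is an assumption, and the function names and toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def combined_category_enrichment(process_genes, component_genes,
                                 interesting, background):
    """Test whether a process-by-component category is enriched.

    Returns (category size, number of interesting genes in it, p-value).
    """
    # A combined category holds the genes annotated to both GO categories.
    category = process_genes & component_genes & background
    selected = interesting & background
    hits = category & selected
    p_value = hypergeom_sf(len(hits), len(background),
                           len(category), len(selected))
    return len(category), len(hits), p_value


# Toy data with hypothetical gene identifiers.
background = {f"g{i}" for i in range(10)}
process = {"g0", "g1", "g2", "g3"}
component = {"g0", "g1", "g2", "g7"}
interesting = {"g0", "g1", "g2", "g8"}

print(combined_category_enrichment(process, component, interesting, background))
```

    In practice many combined categories would be tested at once, so a multiple-testing correction would be needed on top of the raw p-values.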

  20. SoFoCles: feature filtering for microarray classification based on gene ontology.

    Science.gov (United States)

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  1. Large-scale Gene Ontology analysis of plant transcriptome-derived sequences retrieved by AFLP technology

    Directory of Open Access Journals (Sweden)

    Ramina Angelo

    2008-07-01

    Full Text Available Abstract Background After 10 years of use of AFLP (Amplified Fragment Length Polymorphism) technology for DNA fingerprinting and mRNA profiling, large repertories of genome- and transcriptome-derived sequences are available in public databases for model, crop and tree species. AFLP marker systems have been and are being extensively exploited for genome scanning and gene mapping, as well as cDNA-AFLP for transcriptome profiling and differentially expressed gene cloning. The evaluation, annotation and classification of genomic markers and expressed transcripts would be of great utility for both functional genomics and systems biology research in plants. This may be achieved by means of the Gene Ontology (GO), consisting of three structured vocabularies (i.e. ontologies) describing genes, transcripts and proteins of any organism in terms of their associated cellular component, biological process and molecular function in a species-independent manner. In this paper, the functional annotation of about 8,000 AFLP-derived ESTs retrieved in the NCBI databases was carried out by using GO terminology. Results Descriptive statistics on the type, size and nature of gene sequences obtained by means of AFLP technology were calculated. The gene products associated with mRNA transcripts were then classified according to the three main GO vocabularies. A comparison of the functional content of cDNA-AFLP records was also performed by splitting the sequence dataset into monocots and dicots and by comparing them to all annotated ESTs of Arabidopsis and rice, respectively. On the whole, the statistical parameters adopted for the in silico AFLP-derived transcriptome-anchored sequence analysis proved to be critical for obtaining reliable GO results. Such an exhaustive annotation may offer a suitable platform for functional genomics, particularly useful in non-model species. Conclusion Reliable GO annotations of AFLP-derived sequences can be gathered through the optimization

  2. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.

    Science.gov (United States)

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin

    2016-08-31

    The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, the results are difficult to interpret. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .

  3. Age distribution patterns of human gene families: divergent for Gene Ontology categories and concordant between different subcellular localizations.

    Science.gov (United States)

    Liu, Gangbiao; Zou, Yangyun; Cheng, Qiqun; Zeng, Yanwu; Gu, Xun; Su, Zhixi

    2014-04-01

    The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. This study revealed the whole picture of the evolution patterns of gene functional

  4. Gene dosage, expression, and ontology analysis identifies driver genes in the carcinogenesis and chemoradioresistance of cervical cancer.

    Science.gov (United States)

    Lando, Malin; Holden, Marit; Bergersen, Linn C; Svendsrud, Debbie H; Stokke, Trond; Sundfør, Kolbein; Glad, Ingrid K; Kristensen, Gunnar B; Lyng, Heidi

    2009-11-01

    Integrative analysis of gene dosage, expression, and ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancers. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used for validation of gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses and 3 losses (on 3p, 13q, 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, showing that they had emerged prior to many other alterations and probably were early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression level and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities of both early and advanced stage cervical cancers.

  5. Genes and (common) pathways underlying drug addiction.

    Directory of Open Access Journals (Sweden)

    Chuan-Yun Li

    2008-01-01

    Drug addiction is a serious worldwide problem with strong genetic and environmental influences. Different technologies have revealed a variety of genes and pathways underlying addiction; however, each individual technology can be biased and incomplete. We integrated 2,343 items of evidence from peer-reviewed publications between 1976 and 2006 linking genes and chromosome regions to addiction by single-gene strategies, microarray, proteomics, or genetic studies. We identified 1,500 human addiction-related genes and developed KARG (http://karg.cbi.pku.edu.cn), the first molecular database for addiction-related genes with extensive annotations and a friendly Web interface. We then performed a meta-analysis of 396 genes that were supported by two or more independent items of evidence to identify 18 molecular pathways that were statistically significantly enriched, covering both upstream signaling events and downstream effects. Five molecular pathways significantly enriched for all four different types of addictive drugs were identified as common pathways which may underlie shared rewarding and addictive actions, including two new ones, GnRH signaling pathway and gap junction. We connected the common pathways into a hypothetical common molecular network for addiction. We observed that fast and slow positive feedback loops were interlinked through CAMKII, which may provide clues to explain some of the irreversible features of addiction.

  6. Handling multiple testing while interpreting microarrays with the Gene Ontology Database

    Directory of Open Access Journals (Sweden)

    Zhao Hongyu

    2004-09-01

    Background: The development of software tools that analyze microarray data in the context of genetic knowledgebases is being pursued by multiple research groups using different methods. A common problem for many of these tools is how to correct for multiple statistical testing since simple corrections are overly conservative and more sophisticated corrections are currently impractical. A careful study of the nature of the distribution one would expect by chance, such as by a simulation study, may be able to guide the development of an appropriate correction that is not overly time consuming computationally. Results: We present the results from a preliminary study of the distribution one would expect for analyzing sets of genes extracted from Drosophila, S. cerevisiae, Wormbase, and Gramene databases using the Gene Ontology Database. Conclusions: We found that the estimated distribution is not regular and is not predictable outside of a particular set of genes. Permutation-based simulations may be necessary to determine the confidence in results of such analyses.
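
    The permutation idea mentioned in the conclusions can be illustrated with a minimal Python sketch. It is not the authors' procedure: the hypergeometric test, the min-p statistic, and all function and variable names below are assumptions chosen for illustration. The sketch compares the best enrichment p-value of a gene set against the best p-values obtained from random gene sets of the same size.

```python
import random
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, K of which carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def min_p_value(gene_set, annotations, universe_size):
    """Smallest enrichment p-value over all terms for one gene set."""
    n = len(gene_set)
    best = 1.0
    for term_genes in annotations.values():
        k = len(gene_set & term_genes)
        best = min(best, hypergeom_tail(k, len(term_genes), n, universe_size))
    return best


def permutation_adjusted_p(gene_set, annotations, universe, n_perm=1000, seed=0):
    """Fraction of random gene sets whose best p-value is at least as extreme."""
    rng = random.Random(seed)
    observed = min_p_value(gene_set, annotations, len(universe))
    pool = sorted(universe)  # sorted list makes sampling reproducible
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(pool, len(gene_set)))
        if min_p_value(random_set, annotations, len(universe)) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

    Here `annotations` is assumed to map each term to the set of genes annotated with it, and `universe` is the set of all genes that could have been selected. Because the null distribution is rebuilt from the annotation data at hand, no regular shape needs to be assumed for it.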

  7. Characterization of differentially expressed genes involved in pathways associated with gastric cancer.

    Directory of Open Access Journals (Sweden)

    Hao Li

    To explore the patterns of gene expression in gastric cancer, a total of 26 paired gastric cancer and noncancerous tissues from patients were enrolled for gene expression microarray analyses. Limma methods were applied to analyze the data, and genes were considered to be significantly differentially expressed if the False Discovery Rate (FDR) value was 2. Subsequently, Gene Ontology (GO) categories were used to analyze the main functions of the differentially expressed genes. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, we found pathways significantly associated with the differential genes. Gene-Act network and co-expression network were built respectively based on the relationships among the genes, proteins and compounds in the database. 2371 mRNAs and 350 lncRNAs considered as significantly differentially expressed genes were selected for the further analysis. The GO categories, pathway analyses and the Gene-Act network showed a consistent result that up-regulated genes were responsible for tumorigenesis, migration, angiogenesis and microenvironment formation, while down-regulated genes were involved in metabolism. The results of this study provide some novel findings on coding RNAs, lncRNAs, pathways and the co-expression network in gastric cancer which will be useful to guide further investigation and targeted therapy for this disease.
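
    The abstract does not say which FDR procedure was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, a common choice for microarray data; its use here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    A gene would then be called differentially expressed when its adjusted value falls below a chosen FDR threshold, typically combined with a fold-change cutoff.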

  8. Genes and (Common) Pathways Underlying Drug Addiction

    OpenAIRE

    Chuan-Yun Li; Xizeng Mao; Liping Wei

    2008-01-01

    Drug addiction is a serious worldwide problem with strong genetic and environmental influences. Different technologies have revealed a variety of genes and pathways underlying addiction; however, each individual technology can be biased and incomplete. We integrated 2,343 items of evidence from peer-reviewed publications between 1976 and 2006 linking genes and chromosome regions to addiction by single-gene strategies, microarray, proteomics, or genetic studies. We identified 1,500 human addict...

  9. Identification of key pathways and genes influencing prognosis in bladder urothelial carcinoma

    Directory of Open Access Journals (Sweden)

    Ning X

    2017-03-01

    Xin Ning, Yaoliang Deng. Department of Urology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi Province, People’s Republic of China. Background: Genomic profiling can be used to identify the predictive effect of genomic subsets for determining prognosis in bladder urothelial carcinoma (BUC) after radical cystectomy. This study aimed to investigate potential gene and pathway markers associated with prognosis in BUC. Methods: A microarray dataset of BUC was obtained from The Cancer Genome Atlas database. Differentially expressed genes (DEGs) were identified by DESeq of the R platform. Kaplan–Meier analysis was applied for prognostic markers. Key pathways and genes were identified using bioinformatics tools, such as gene set enrichment analysis, gene ontology, the Kyoto Encyclopedia of Genes and Genomes, gene multiple association network integration algorithm (GeneMANIA), Search Tool for the Retrieval of Interacting Genes/Proteins, and Molecular Complex Detection. Results: A comparative gene set enrichment analysis of tumor and adjacent normal tissues suggested BUC tumorigenesis resulted mainly from enrichment of cell cycle and DNA damage and repair-related biological processes and pathways, including TP53 and mitotic recombination. Two hundred and fifty-six genes were identified as potential prognosis-related DEGs. Gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that the potential prognosis-related DEGs were enriched in angiogenesis, including the cyclic adenosine monophosphate biosynthetic process, cyclic guanosine monophosphate-protein kinase G, mitogen-activated protein kinase, Rap1, and phosphoinositide-3-kinase-AKT signaling pathway. Nine hub genes, TAGLN, ACTA2, MYH11, CALD1, MYLK, GEM, PRELP, TPM2, and OGN, were identified from the intersection of protein–protein interaction and GeneMANIA networks. Module analysis of protein–protein interaction and GeneMANIA networks mainly showed

  10. Non-lexical approaches to identifying associative relations in the gene ontology.

    Science.gov (United States)

    Bodenreider, Olivier; Aubry, Marc; Burgun, Anita

    2005-01-01

    The Gene Ontology (GO) is a controlled vocabulary widely used for the annotation of gene products. GO is organized in three hierarchies for molecular functions, cellular components, and biological processes but no relations are provided among terms across hierarchies. The objective of this study is to investigate three non-lexical approaches to identifying such associative relations in GO and compare them among themselves and to lexical approaches. The three approaches are: computing similarity in a vector space model, statistical analysis of co-occurrence of GO terms in annotation databases, and association rule mining. Five annotation databases (FlyBase, the Human subset of GOA, MGI, SGD, and WormBase) are used in this study. A total of 7,665 associations were identified by at least one of the three non-lexical approaches. Of these, 12% were identified by more than one approach. While there are almost 6,000 lexical relations among GO terms, only 203 associations were identified by both non-lexical and lexical approaches. The associations identified in this study could serve as the starting point for adding associative relations across hierarchies to GO, but would require manual curation. The application to quality assurance of annotation databases is also discussed.
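
    The co-occurrence and association-rule approaches named above can be sketched roughly as follows. This is a simplified illustration, not the study's implementation: the support and confidence thresholds, the function name, and the input layout are assumptions, and the sketch does not restrict pairs to terms from different hierarchies as the study does.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rules(annotations, min_support=2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) for term pairs.

    annotations maps a gene product to the set of GO term IDs annotated to it.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # Each co-occurring pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = support / term_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

    Support counts how many gene products carry both terms; confidence is the share of products with the antecedent term that also carry the consequent. Candidate associations found this way would still need manual curation, as the abstract notes.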

  11. How to learn about gene function: text-mining or ontologies?

    Science.gov (United States)

    Soldatos, Theodoros G; Perdigão, Nelson; Brown, Nigel P; Sabir, Kenneth S; O'Donoghue, Seán I

    2015-03-01

    As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies--e.g., next-generation sequencing--are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge and many tools exist trying to characterize the function of gene-lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene-lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene-lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or--perhaps more adventurously--on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic

  12. The Proteasix Ontology.

    Science.gov (United States)

    Arguello Casteleiro, Mercedes; Klein, Julie; Stevens, Robert

    2016-06-04

    The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool; an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides). The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleavage sites. 2. To enable the description of proteolytic cleavage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.

  13. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

    Science.gov (United States)

    Zhang, Shu-Bo; Lai, Jian-Huang

    2016-07-15

    Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of Gene Ontology (GO) provides us with a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to biological entities under consideration and the structure of the GO graph. However, previous works in this field mainly focused on the upper part of the graph, and seldom considered the lower part. In this study, we aim to explore information from the lower part of the GO graph for better semantic similarity. We proposed a framework to quantify the similarity measure beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measurements on the public platform CESSM, protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies the probability that two terms co-occur with their common descendant in an efficient way; and (3) our framework can effectively capture the similarity measure beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Gene Ontology based housekeeping gene selection for RNA-seq normalization.

    Science.gov (United States)

    Chen, Chien-Ming; Lu, Yu-Lun; Sio, Chi-Pong; Wu, Guan-Chung; Tzou, Wen-Shyong; Pai, Tun-Wen

    2014-06-01

    RNA-seq analysis provides a powerful tool for revealing relationships between gene expression level and biological function of proteins. In order to identify differentially expressed genes among various RNA-seq datasets obtained from different experimental designs, an appropriate normalization method for calibrating multiple experimental datasets is the first challenging problem. We propose a novel method to facilitate biologists in selecting a set of suitable housekeeping genes for inter-sample normalization. The approach is achieved by adopting user defined experimentally related keywords, GO annotations, GO term distance matrices, orthologous housekeeping gene candidates, and stability ranking of housekeeping genes. By identifying the most distanced GO terms from query keywords and selecting housekeeping gene candidates with low coefficients of variation among different spatio-temporal datasets, the proposed method can automatically enumerate a set of functionally irrelevant housekeeping genes for practical normalization. Novel and benchmark testing RNA-seq datasets were applied to demonstrate that different selections of housekeeping genes have a strong impact on differential gene expression analysis, and the compared results have shown that our proposed method outperformed other traditional approaches in terms of both sensitivity and specificity. The proposed mechanism of selecting appropriate housekeeping genes for inter-dataset normalization is robust and accurate for differential expression analyses. Copyright © 2014 Elsevier Inc. All rights reserved.
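
    The stability-ranking step, selecting candidates with low coefficients of variation across datasets, might look like the minimal sketch below. It covers only that one step; the GO term distance filtering is not modelled, and the function name and input layout are assumptions.

```python
from statistics import mean, pstdev


def rank_by_cv(expression):
    """Rank genes by coefficient of variation (std / mean), most stable first.

    expression maps a gene name to its expression values across datasets.
    """
    cvs = {}
    for gene, values in expression.items():
        mu = mean(values)
        if mu > 0:  # CV is undefined for a zero mean
            cvs[gene] = pstdev(values) / mu
    return sorted(cvs.items(), key=lambda item: item[1])
```

    Genes at the head of the returned list vary least relative to their mean expression, which is the property a housekeeping gene used for normalization should have.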

  15. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    OpenAIRE

    Zhen Li; Bi-Qing Li; Min Jiang; Lei Chen; Jian Zhang; Lin Liu; Tao Huang

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance...

  16. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    Science.gov (United States)

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  17. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  18. PPISEARCHENGINE: gene ontology-based search for protein-protein interactions.

    Science.gov (United States)

    Park, Byungkyu; Cui, Guangyu; Lee, Hyunjin; Huang, De-Shuang; Han, Kyungsook

    2013-01-01

    This paper presents a new search engine called PPISearchEngine which finds protein-protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between the terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms at the lower level than the GO term in the GO hierarchy, and finds all the interactions of the query protein which satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like 'for protein p with GO [Formula: see text], find p's interaction partners with GO [Formula: see text]'. PPISearchEngine is freely available to academics at http://search.hpid.org/.
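
    The prime-number representation described above can be illustrated with a small sketch. It shows only the general idea (one prime per term, a product per term set, membership by divisibility); how PPISearchEngine actually stores relations and handles lower-level terms is not given in the abstract, so every name below is hypothetical.

```python
from itertools import count


def primes():
    """Yield primes in increasing order by trial division."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def assign_primes(terms):
    """Give each term a distinct prime, in sorted term order."""
    return dict(zip(sorted(terms), primes()))


def encode(term_set, prime_of):
    """Represent a set of terms as the product of their primes."""
    code = 1
    for term in term_set:
        code *= prime_of[term]
    return code


def has_term(code, term, prime_of):
    """A term is present exactly when its prime divides the code."""
    return code % prime_of[term] == 0


def decode(code, prime_of):
    """Recover the term set; unique factorisation makes this unambiguous."""
    return {term for term, p in prime_of.items() if code % p == 0}
```

    For example, with terms assigned the primes 2, 3 and 5, a protein annotated with the first and third term is stored as 10, and a query for either term reduces to a single divisibility check.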

  19. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    Science.gov (United States)

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Martinez-Lapsicina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

    In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.

  20. GoMapMan: integration, consolidation and visualization of plant gene annotations within the MapMan ontology.

    Science.gov (United States)

    Ramsak, Živa; Baebler, Špela; Rotter, Ana; Korbar, Matej; Mozetic, Igor; Usadel, Björn; Gruden, Kristina

    2014-01-01

    GoMapMan (http://www.gomapman.org) is an open web-accessible resource for gene functional annotations in the plant sciences. It was developed to facilitate improvement, consolidation and visualization of gene annotations across several plant species. GoMapMan is based on the MapMan ontology, organized in the form of a hierarchical tree of biological concepts, which describe gene functions. Currently, genes of the model species Arabidopsis and three crop species (potato, tomato and rice) are included. The main features of GoMapMan are (i) dynamic and interactive gene product annotation through various curation options; (ii) consolidation of gene annotations for different plant species through the integration of orthologue group information; (iii) traceability of gene ontology changes and annotations; (iv) integration of external knowledge about genes from different public resources; and (v) providing gathered information to high-throughput analysis tools via dynamically generated export files. All of the GoMapMan functionalities are openly available, with the restriction on the curation functions, which require prior registration to ensure traceability of the implemented changes.

  1. Pathways: Strategies for Susceptibility Genes in SLE

    Science.gov (United States)

    Kelley, James M.; Edberg, Jeffrey C.; Kimberly, Robert P.

    2010-01-01

    Systemic lupus erythematosus (SLE) is a complex autoimmune disorder marked by an inappropriate immune response to nuclear antigens. Recent whole genome association and more focused studies have revealed numerous genes implicated in this disease process, including ITGAM, Fc gamma receptors, complement components, C-reactive protein, and others. One common feature of these molecules is their involvement in the immune opsonins pathway and phagocytic clearing of nuclear antigens and apoptotic debris which provide excessive exposure of lupus-related antigens to immune cells. Analysis of gene-gene interactions in the opsonin pathway and its relationship to SLE may provide a systems-based approach to identify additional candidate genes associated with disease able to account for a larger part of lupus susceptibility. PMID:20144911

  2. ToxPlorer™: A Comprehensive Knowledgebase of Toxicity Pathways Using Ontology-driven Information Extraction

    Science.gov (United States)

    Realizing the potential of pathway-based toxicity testing requires a fresh look at how we describe phenomena leading to adverse effects in vivo, how we assess them in vitro and how we extrapolate them in silico across chemicals, doses and species. We developed the ToxPlorer™ fram...

  3. Search of phenotype related candidate genes using gene ontology-based semantic similarity and protein interaction information: application to Brugada syndrome.

    Science.gov (United States)

    Massanet, Raimon; Gallardo-Chacon, Joan-Josep; Caminal, Pere; Perera, Alexandre

    2009-01-01

    This work presents a methodology for finding phenotype candidate genes starting from a set of known related genes. This is accomplished by automatically mining and organizing the available scientific literature using Gene Ontology-based semantic similarity. As a case study, Brugada syndrome related genes have been used as input in order to obtain a list of other possible candidate genes related with this disease. Brugada anomaly produces a typical alteration in the Electrocardiogram and carriers of the disease show an increased probability of sudden death. Results show a set of semantically coherent proteins that are shown to be related with synaptic transmission and muscle contraction physiological processes.

  4. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Systematic enrichment analysis of gene expression profiling studies identifies consensus pathways implicated in colorectal cancer development

    Directory of Open Access Journals (Sweden)

    Jesús Lascorz

    2011-01-01

    Background: A large number of gene expression profiling (GEP) studies on colorectal carcinogenesis have been performed but no reliable gene signature has been identified so far due to the lack of reproducibility in the reported genes. There is growing evidence that functionally related genes, rather than individual genes, contribute to the etiology of complex traits. We used, as a novel approach, pathway enrichment tools to define functionally related genes that are consistently up- or down-regulated in colorectal carcinogenesis. Materials and Methods: We started the analysis with 242 unique annotated genes that had been reported by any of three recent meta-analyses covering GEP studies on genes differentially expressed in carcinoma vs normal mucosa. Most of these genes (218, 91.9%) had been reported in at least three GEP studies. These 242 genes were submitted to bioinformatic analysis using a total of nine tools to detect enrichment of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. As a final consistency criterion the pathway categories had to be enriched by several tools to be taken into consideration. Results: Our pathway-based enrichment analysis identified the categories of ribosomal protein constituents, extracellular matrix receptor interaction, carbonic anhydrase isozymes, and a general category related to inflammation and cellular response as significantly and consistently overrepresented entities. Conclusions: We triaged the genes covered by the published GEP literature on colorectal carcinogenesis and subjected them to multiple enrichment tools in order to identify the consistently enriched gene categories. These turned out to have known functional relationships to cancer development and thus deserve further investigation.

  6. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data.

    Science.gov (United States)

    Koç, Ibrahim; Caetano-Anollés, Gustavo

    2017-01-01

    The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the molecular functions of the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.

  7. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development.

    Science.gov (United States)

    Khodiyar, Varsha K; Howe, Doug; Talmud, Philippa J; Breckenridge, Ross; Lovering, Ruth C

    2013-01-01

    For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region: the left-right organizer node in the mouse and human, and Kupffer's vesicle in the zebrafish. In the zebrafish, laterality cues from Kupffer's vesicle determine asymmetry in the developing heart, the direction of 'heart jogging' and the direction of 'heart looping'. 'Heart jogging' is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward 'jog'. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well defined orthologs in mouse and human, and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish 'heart jogging orthologs' are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.

  8. Performing ontology.

    Science.gov (United States)

    Aspers, Patrik

    2015-06-01

    Ontology, and in particular, the so-called ontological turn, is the topic of a recent themed issue of Social Studies of Science (Volume 43, Issue 3, 2013). Ontology, or metaphysics, is in philosophy concerned with what there is, how it is, and forms of being. But to what is the science and technology studies researcher turning when he or she talks of ontology? It is argued that it is unclear what is gained by arguing that ontology also refers to constructed elements. The 'ontological turn' comes with the risk of creating a pseudo-debate or pseudo-activity, in which energy is used for no end, at the expense of empirical studies. This text rebuts the idea of an ontological turn as foreshadowed in the texts of the themed issue. It argues that there is no fundamental qualitative difference between the ontological turn and what we know as constructivism.

  9. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
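    As an illustration of exact p-values for an integer additive statistic, the following minimal Python sketch counts, by dynamic programming, how many gene subsets of a fixed size reach each possible total weight. It is not the authors' algorithm or any of their optimizations; the function name `exact_enrichment_pvalue`, the restriction to non-negative integer weights, and the null model (a gene set of the given size drawn uniformly at random without replacement) are assumptions made for this example.

```python
from fractions import Fraction
from math import comb


def exact_enrichment_pvalue(weights, set_size, observed):
    """P(sum of weights of a random gene set of `set_size` >= observed).

    Assumptions (illustrative only): weights are non-negative integers,
    one per background gene, and the gene set is drawn uniformly at
    random without replacement from the background.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative integers")
    if not 0 <= set_size <= len(weights):
        raise ValueError("set_size must be between 0 and len(weights)")

    max_sum = sum(weights)
    # ways[k][s] = number of k-gene subsets whose weights sum to s
    ways = [[0] * (max_sum + 1) for _ in range(set_size + 1)]
    ways[0][0] = 1

    for w in weights:
        # Go from large k to small k so each gene is used at most once.
        for k in range(set_size, 0, -1):
            smaller, current = ways[k - 1], ways[k]
            for s in range(max_sum - w + 1):
                if smaller[s]:
                    current[s + w] += smaller[s]

    start = max(observed, 0)
    tail = sum(ways[set_size][start:])  # empty slice if observed > max_sum
    return Fraction(tail, comb(len(weights), set_size))


# Weighted membership: three background genes with weights 2, 1 and 0.
p_weighted = exact_enrichment_pvalue([2, 1, 0], set_size=2, observed=2)
```

    With 0/1 weights the statistic is a plain membership count, so the result coincides with the one-sided hypergeometric test; for example, `exact_enrichment_pvalue([1, 1, 0, 0, 0], 2, 2)` gives 1/10. The table has (set size + 1) × (total weight + 1) entries, so this direct version is only practical when the total weight is moderate.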

  10. Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task.

    Science.gov (United States)

    Kim, Jin-Dong; Kim, Jung-Jae; Han, Xu; Rebholz-Schuhmann, Dietrich

    2015-01-01

    The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base construction (KB)". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation. To further explore the grand theme, we extended the evaluation from a perspective of KB construction. Also, the Gene Regulation Ontology (GRO) task was newly introduced in the third edition. The final evaluation of the participating systems resulted in relatively low performance. The reason was attributed to the large size and complex semantic representation of the ontology. To investigate potential benefits of resource exchange between the presumably similar tasks, we measured the overlap between the datasets of the two tasks, and tested whether the dataset for one task can be used to enhance performance on the other. We report an extended evaluation on all the participating systems in the GE task, incorporating a KB perspective. For the evaluation, the final submission of each participant was converted to RDF statements, and evaluated using 8 queries that were formulated in SPARQL. The results suggest that the evaluation may be concluded differently between the two different perspectives, annotation vs. KB. We also provide a comparison of the GE and GRO tasks by converting their datasets into each other's format. More than 90% of the GE data could be converted into the GRO task format, while only half of the GRO data could be mapped to the GE task format. The imbalance in conversion indicates that the GRO is a comprehensive extension of the GE task ontology. We further used the converted GRO data as additional training data for the GE task, which helped improve GE task participant system performance. However, the converted GE data did not help GRO task participants, due to overfitting and the ontology gap.

  11. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    Science.gov (United States)

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
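    To make the notion of annotation-based term information content (IC) concrete, here is a small Python sketch of the widely used Resnik measure, in which IC(t) = -log p(t) and the similarity of two terms is the IC of their most informative common ancestor. It does not use DaGO-Fun or its interface; the five-term ontology, the annotation counts, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents (a DAG).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A", "B"],
}

# Hypothetical number of gene products annotated directly to each term.
# Simplification: every gene product is annotated to exactly one term.
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 2, "C": 3, "D": 2}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t or below it."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    return {term: -math.log(c / total) for term, c in cumulative.items()}


IC = information_content()


def resnik(term_a, term_b):
    """IC of the most informative common ancestor of the two terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(IC[t] for t in common)
```

    In this toy graph, terms C and D share the ancestor A, which covers 6 of the 8 annotations, so `resnik("C", "D")` equals -log(0.75); C and B share only the root, whose IC is zero. Precomputing `IC` once, as the abstract describes for DaGO-Fun, keeps each pairwise query cheap.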

  12. GOlorize: a Cytoscape plug-in for network visualization with Gene Ontology-based layout and coloring

    OpenAIRE

    Garcia, O.; Saveanu, C.; Cline, M.; Fromont-Racine, M; Jacquier, A; Schwikowski, B.; Aittokallio, T.

    2007-01-01

    We have implemented a graph layout algorithm that exposes Gene Ontology (GO) class structure on the network nodes. It can be used in conjunction with BiNGO plug-in to Cytoscape, which finds the GO categories over-represented in a given network. Our plug-in, named GOlorize, first highlights the class members with category-specific color-coding and then constructs an enhanced visualization of the network using a class-directed layout algorithm. AVAILABILITY: http://www.c...

  13. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

    Science.gov (United States)

    Caniza, Horacio; Romero, Alfonso E; Heron, Samuel; Yang, Haixuan; Devoto, Alessandra; Frasca, Marco; Mesiti, Marco; Valentini, Giorgio; Paccanaro, Alberto

    2014-08-01

    We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines. Contact: alberto@cs.rhul.ac.uk. © The Author 2014. Published by Oxford University Press.

  14. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology

  15. DMPD: Signalling pathways mediating type I interferon gene expression. [Dynamic Macrophage Pathway CSML Database]

    Lifescience Database Archive (English)

    Full Text Available 17904888 Signalling pathways mediating type I interferon gene expression. Edwards M...csml) Show Signalling pathways mediating type I interferon gene expression. PubmedID 17904888 Title Signalli...ng pathways mediating type I interferon gene expression. Authors Edwards MR, Slat

  16. Amelogenesis Imperfecta; Genes, Proteins, and Pathways

    Directory of Open Access Journals (Sweden)

    Claire E. L. Smith

    2017-06-01

    Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and

  17. Amelogenesis Imperfecta; Genes, Proteins, and Pathways

    Science.gov (United States)

    Smith, Claire E. L.; Poulter, James A.; Antanaviciute, Agne; Kirkham, Jennifer; Brookes, Steven J.; Inglehearn, Chris F.; Mighell, Alan J.

    2017-01-01

    Amelogenesis imperfecta (AI) is the name given to a heterogeneous group of conditions characterized by inherited developmental enamel defects. AI enamel is abnormally thin, soft, fragile, pitted and/or badly discolored, with poor function and aesthetics, causing patients problems such as early tooth loss, severe embarrassment, eating difficulties, and pain. It was first described separately from diseases of dentine nearly 80 years ago, but the underlying genetic and mechanistic basis of the condition is only now coming to light. Mutations in the gene AMELX, encoding an extracellular matrix protein secreted by ameloblasts during enamel formation, were first identified as a cause of AI in 1991. Since then, mutations in at least eighteen genes have been shown to cause AI presenting in isolation of other health problems, with many more implicated in syndromic AI. Some of the encoded proteins have well documented roles in amelogenesis, acting as enamel matrix proteins or the proteases that degrade them, cell adhesion molecules or regulators of calcium homeostasis. However, for others, function is less clear and further research is needed to understand the pathways and processes essential for the development of healthy enamel. Here, we review the genes and mutations underlying AI presenting in isolation of other health problems, the proteins they encode and knowledge of their roles in amelogenesis, combining evidence from human phenotypes, inheritance patterns, mouse models, and in vitro studies. An LOVD resource (http://dna2.leeds.ac.uk/LOVD/) containing all published gene mutations for AI presenting in isolation of other health problems is described. We use this resource to identify trends in the genes and mutations reported to cause AI in the 270 families for which molecular diagnoses have been reported by 23rd May 2017. Finally we discuss the potential value of the translation of AI genetics to clinical care with improved patient pathways and speculate on the

  18. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  19. Identification of protein features encoded by alternative exons using Exon Ontology.

    Science.gov (United States)

    Tranchevent, Léon-Charles; Aubé, Fabien; Dulaurier, Louis; Benoit-Pilven, Clara; Rey, Amandine; Poret, Arnaud; Chautard, Emilie; Mortada, Hussein; Desmet, François-Olivier; Chakrama, Fatima Zahra; Moreno-Garcia, Maira Alejandra; Goillot, Evelyne; Janczarski, Stéphane; Mortreux, Franck; Bourgeois, Cyril F; Auboeuf, Didier

    2017-06-01

    Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named "Exon Ontology," based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information. © 2017 Tranchevent et al.; Published by Cold Spring Harbor Laboratory Press.

  20. Identification of compound-protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2016-12-01

    Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.

  1. Expression profiling and gene ontology analysis in fathead minnow (Pimephales promelas) liver following exposure to pulp and paper mill effluents

    Energy Technology Data Exchange (ETDEWEB)

    Costigan, Shannon L.; Werner, Julieta; Ouellet, Jacob D.; Hill, Lauren G. [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]; Law, R. David, E-mail: dlaw@lakeheadu.ca [Department of Biology, Lakehead University, 955 Oliver Road, Ontario P7B 5E1 (Canada)]

    2012-10-15

    Many studies link pulp and paper mill effluent (PPME) exposure to adverse effects in fish populations present in the mill receiving environments. These impacts are often characteristic of endocrine disruption and may include impaired reproduction, development and survival. While these physiological endpoints are well-characterized, the molecular mechanisms causing them are not yet understood. To investigate changes in gene transcription induced by exposure to a PPME at several stages of treatment, male and female fathead minnows (FHMs) were exposed for 6 days to 25% (v/v) secondary (biologically) treated kraft effluent (TK) or 100% (v/v) combined mill outfall (CMO) from a mill producing both kraft pulp and newsprint. The gene expression changes in the livers of these fish were analyzed using a 22 K oligonucleotide microarray. Exposure to TK or CMO resulted in significant changes in the expression levels of 105 and 238 targets in male FHMs and 296 and 133 targets in females, respectively. Targets were then functionally analyzed using gene ontology tools to identify the biological processes in fish hepatocytes that were affected by exposure to PPME after its secondary treatment. Proteolysis was affected in female FHMs exposed to both TK and CMO. In male FHMs, no processes were affected by TK exposure, while sterol, isoprenoid, steroid and cholesterol biosynthesis and electron transport were up-regulated by CMO exposure. The results presented in this study indicate that short-term exposure to PPMEs affects the expression of reproduction-related genes in the livers of both male and female FHMs, and that secondary treatment of PPMEs may not neutralize all of their metabolic effects in fish. Gene ontology analysis of microarray data may enable identification of biological processes altered by toxicant exposure and thus provide an additional tool for monitoring the impact of PPMEs on fish populations.

  2. Engineering Ontologies

    OpenAIRE

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering systems modelling, simulation and design. A general and formal ontology, called PHYSSYS, for dynamic physical systems is presented and its structuring principles are discussed. We show how the PHYSSYS ...

  3. Using biologically interrelated experiments to identify pathway genes in Arabidopsis

    OpenAIRE

    Kim, Kyungpil; Jiang, Keni; Teng, Siew Leng; Feldman, Lewis J.; Huang, Haiyan

    2012-01-01

    Motivation: Pathway genes are considered as a group of genes that work cooperatively in the same pathway constituting a fundamental functional grouping in a biological process. Identifying pathway genes has been one of the major tasks in understanding biological processes. However, due to the difficulty in characterizing/inferring different types of biological gene relationships, as well as several computational issues arising from dealing with high-dimensional biological data, deducing ge...

  4. Separate enrichment analysis of pathways for up- and downregulated genes.

    Science.gov (United States)

    Hong, Guini; Zhang, Wenjing; Li, Hongdong; Shen, Xiaopei; Guo, Zheng

    2014-03-06

    Two strategies are often adopted for enrichment analysis of pathways: the analysis of all differentially expressed (DE) genes together or the analysis of up- and downregulated genes separately. However, few studies have examined the rationales of these enrichment analysis strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tended to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. We then show that the imbalance could greatly reduce the statistical power for finding disease-associated pathways through the analysis of all-DE genes. Further, using gene expression profiles from five types of tumours, we illustrate that the separate analysis of up- and downregulated genes could identify more pathways that are really pertinent to phenotypic difference. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
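
    The dilution effect described in this abstract can be illustrated with a one-sided hypergeometric over-representation test. The sketch below is not the authors' code: the gene identifiers, the pathway and the background size are hypothetical toy values, and the hypergeometric test is assumed here as a common choice for pathway enrichment.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment_p(de_genes, pathway, background):
    """One-sided over-representation p-value for a DE gene list in one pathway."""
    de = set(de_genes) & background
    members = pathway & background
    k = len(de & members)
    return hypergeom_sf(k, len(background), len(members), len(de))

# Hypothetical toy data: 100 background genes, a 10-gene pathway.
background = {f"g{i}" for i in range(100)}
pathway = {f"g{i}" for i in range(10)}           # g0..g9
up = {"g0", "g1", "g2", "g3", "g50"}             # 4 of 5 upregulated genes are in the pathway
down = {"g60", "g61", "g62", "g63", "g64"}       # no downregulated gene is in the pathway

p_up = enrichment_p(up, pathway, background)
p_down = enrichment_p(down, pathway, background)
p_all = enrichment_p(up | down, pathway, background)
print(f"up: {p_up:.2e}  down: {p_down:.2e}  all DE: {p_all:.2e}")
```

    In this toy case the pathway hits are all upregulated, so pooling the two lists adds only non-member genes and yields a larger p-value than testing the upregulated list alone.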

  5. Evolutionary rate patterns of the Gibberellin pathway genes

    Directory of Open Access Journals (Sweden)

    Zhang Fu-min

    2009-08-01

    Background: Analysis of molecular evolutionary patterns of different genes within metabolic pathways allows us to determine whether these genes are subject to equivalent evolutionary forces and how natural selection shapes the evolution of proteins in an interacting system. Although previous studies found that upstream genes in the pathway evolved more slowly than downstream genes, the correlation between evolutionary rate and position of the genes in metabolic pathways, as well as its implications for molecular evolution, is still poorly understood. Results: We sequenced and characterized 7 core structural genes of the gibberellin biosynthetic pathway from 8 representative species of the rice tribe (Oryzeae) to address alternative hypotheses regarding evolutionary rates and patterns of metabolic pathway genes. We detected significant rate heterogeneity among the 7 GA pathway genes for both synonymous and nonsynonymous sites. Such rate variation is most likely attributable to differences in selection intensity rather than differential mutation pressures on the genes. Contrary to the previous argument that downstream genes in metabolic pathways evolve more rapidly than upstream genes, the downstream genes in the GA pathway did not exhibit an elevated substitution rate; instead, the genes that encode either the enzyme at the branch point (GA20ox) or enzymes catalyzing multiple steps (KO, KAO and GA3ox) in the pathway had the lowest evolutionary rates due to strong purifying selection. Our branch and codon models failed to detect a signature of positive selection for any lineage or codon of the GA pathway genes. Conclusion: This study suggests that the significant heterogeneity in evolutionary rate among the GA pathway genes is mainly ascribed to differential constraint relaxation rather than positive selection, and supports the pathway flux theory, which predicts that natural selection primarily targets enzymes that have the greatest control on fluxes.

  6. Comprehensive gene expression atlas for the Arabidopsis MAP kinase signalling pathways.

    Science.gov (United States)

    Menges, Margit; Dóczi, Róbert; Okrész, László; Morandini, Piero; Mizzi, Luca; Soloviev, Mikhail; Murray, James A H; Bögre, László

    2008-01-01

    * Mitogen activated protein kinase (MAPK) pathways are signal transduction modules with layers of protein kinases having c. 120 genes in Arabidopsis, but only a few have been linked experimentally to functions. * We analysed microarray expression data for 114 MAPK signalling genes represented on the ATH1 Affymetrix arrays; determined their expression patterns during development, and in a wide range of time-course microarray experiments for their signal-dependent transcriptional regulation and their coregulation with other signalling components and transcription factors. * Global expression correlation of the MAPK genes with each of the represented 21 692 Arabidopsis genes was determined by calculating Pearson correlation coefficients. To group MAPK signalling genes based on similarities in global regulation, we performed hierarchical clustering on the pairwise correlation values. This should allow inferring functional information from well-studied MAPK components to functionally uncharacterized ones. Statistical overrepresentation of specific gene ontology (GO) categories in the gene lists showing high expression correlation values with each of the MAPK components predicted biological themes for the gene functions. * The combination of these methods provides functional information for many uncharacterized MAPK genes, and a framework for complementary future experimental dissection of the function of this complex family.
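
    The correlation-then-clustering step in this abstract can be sketched with plain Python. The gene names and expression values below are hypothetical, and average linkage on the distance 1 - r is an assumption; the abstract only states that hierarchical clustering was performed on pairwise correlation values.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r, merged until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical expression profiles across five conditions.
expression = {
    "MPK_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MPK_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "MKK_c": [5.0, 4.0, 3.2, 2.1, 1.0],
    "MKK_d": [9.8, 8.1, 6.0, 4.2, 2.0],
}
clusters = average_linkage(expression, 2)
print(clusters)
```

    Genes whose profiles rise together end up in one cluster and those that fall together in the other, which is the kind of grouping that lets annotations of a well-studied member suggest functions for an uncharacterized one.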

  7. Engineering Ontologies

    NARCIS (Netherlands)

    Borst, Pim; Akkermans, Hans; Top, Jan

    1997-01-01

    We analyse the construction as well as the role of ontologies in knowledge sharing and reuse for complex industrial applications. In this article, the practical use of ontologies in large-scale applications not restricted to knowledge-based systems is demonstrated, for the domain of engineering syst

  8. Ontology searching and browsing at the Rat Genome Database

    Science.gov (United States)

    Laulederkind, Stanley J. F.; Tutaj, Marek; Shimoyama, Mary; Hayman, G. Thomas; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R.; Jacob, Howard J.

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a ‘driller’ type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. Database URL: http://rgd.mcw.edu/rgdweb/ontology/search.html PMID:22434847

  9. Gene expression profiling identifies molecular pathways associated with collagen VI deficiency and provides novel therapeutic targets.

    Directory of Open Access Journals (Sweden)

    Sonia Paco

    Ullrich congenital muscular dystrophy (UCMD), caused by collagen VI deficiency, is a common congenital muscular dystrophy. At present, the role of collagen VI in muscle and the mechanism of disease are not fully understood. To address this we have applied microarrays to analyse the transcriptome of UCMD muscle and compare it to healthy muscle and other muscular dystrophies. We identified 389 genes which are differentially regulated in UCMD relative to controls. In addition, there were 718 genes differentially expressed between UCMD and dystrophin deficient muscle. In contrast, only 29 genes were altered relative to other congenital muscular dystrophies. Changes in gene expression were confirmed by real-time PCR. The set of regulated genes was analysed by Gene Ontology, KEGG pathways and Ingenuity Pathway analysis to reveal the molecular functions and gene networks associated with collagen VI defects. The most significantly regulated pathways were those involved in muscle regeneration, extracellular matrix remodelling and inflammation. We characterised the immune response in UCMD biopsies as being mainly mediated via M2 macrophages and the complement pathway, indicating that anti-inflammatory treatment may be beneficial to UCMD as for other dystrophies. We studied the immunolocalisation of ECM components and found that biglycan, a collagen VI interacting proteoglycan, was reduced in the basal lamina of UCMD patients. We propose that biglycan reduction is secondary to collagen VI loss and that it may be contributing towards UCMD pathophysiology. Consequently, strategies aimed at over-expressing biglycan and restoring the link between the muscle cell surface and the extracellular matrix should be considered.

  10. Gene Expression Profiling Identifies Molecular Pathways Associated with Collagen VI Deficiency and Provides Novel Therapeutic Targets

    Science.gov (United States)

    Paco, Sonia; Kalko, Susana G.; Jou, Cristina; Rodríguez, María A.; Corbera, Joan; Muntoni, Francesco; Feng, Lucy; Rivas, Eloy; Torner, Ferran; Gualandi, Francesca; Gomez-Foix, Anna M.; Ferrer, Anna; Ortez, Carlos; Nascimento, Andrés; Colomer, Jaume; Jimenez-Mallebrera, Cecilia

    2013-01-01

    Ullrich congenital muscular dystrophy (UCMD), caused by collagen VI deficiency, is a common congenital muscular dystrophy. At present, the role of collagen VI in muscle and the mechanism of disease are not fully understood. To address this we have applied microarrays to analyse the transcriptome of UCMD muscle and compare it to healthy muscle and other muscular dystrophies. We identified 389 genes which are differentially regulated in UCMD relative to controls. In addition, there were 718 genes differentially expressed between UCMD and dystrophin deficient muscle. In contrast, only 29 genes were altered relative to other congenital muscular dystrophies. Changes in gene expression were confirmed by real-time PCR. The set of regulated genes was analysed by Gene Ontology, KEGG pathways and Ingenuity Pathway analysis to reveal the molecular functions and gene networks associated with collagen VI defects. The most significantly regulated pathways were those involved in muscle regeneration, extracellular matrix remodelling and inflammation. We characterised the immune response in UCMD biopsies as being mainly mediated via M2 macrophages and the complement pathway, indicating that anti-inflammatory treatment may be beneficial to UCMD as for other dystrophies. We studied the immunolocalisation of ECM components and found that biglycan, a collagen VI interacting proteoglycan, was reduced in the basal lamina of UCMD patients. We propose that biglycan reduction is secondary to collagen VI loss and that it may be contributing towards UCMD pathophysiology. Consequently, strategies aimed at over-expressing biglycan and restoring the link between the muscle cell surface and the extracellular matrix should be considered. PMID:24223098

  11. Polymorphism of starch pathway genes in cassava.

    Science.gov (United States)

    Vasconcelos, L M; Brito, A C; Carmo, C D; Oliveira, E J

    2016-12-02

    The distribution and frequency of single nucleotide polymorphisms (SNPs) can help to understand changes associated with characteristics of interest. We aimed to evaluate nucleotide diversity in six genes involved in starch biosynthesis in cassava using a panel of 96 unrelated accessions. The genes were sequenced, aligned, and used to obtain values for nucleotide diversity (π), segregating sites (θ), Tajima's D test, and neighbor-joining (NJ) clustering. On average, one SNP per 147 and 171 bp was identified in exon and intron regions, respectively. Thirteen heterozygous loci were found. Of the seven SNPs in the exon region, three resulted in non-synonymous replacements and four in synonymous substitutions. However, no associations were noted between SNPs and root dry-matter content. The parameter π ranged from 0.0001 (granule bound starch synthase I) to 0.0033 (α-amylase), averaging 0.0011, while θ ranged from 0.00014 (starch branching enzyme) to 0.00584 (starch synthase I), averaging 0.002353. The θ diversity value was typically double that of π. Results of the D test did not suggest any evidence of deviation from neutrality in these genes. Among the evaluated accessions, 82/96 were clustered using the NJ method but without a clear separation by root dry-matter content, root pulp coloration, or classification of the cyanogenic compound content. High variation in genes of the starch biosynthetic pathway can be used to identify associations with the functional properties of starch and to apply polymorphisms for selection purposes.
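
    For readers unfamiliar with the two diversity statistics reported here, the sketch below computes per-site π (mean pairwise differences) and a per-site θ from segregating sites. The alignment is a hypothetical toy example, and reading θ as Watterson's estimator is an assumption, since the abstract only labels it "segregating sites (θ)".

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences divided by alignment length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def watterson_theta(seqs):
    """Per-site theta_W = S / (a_n * L), where a_n = sum of 1/i for i = 1..n-1."""
    length = len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating / (a_n * length)

# Hypothetical aligned sequences (4 samples, 10 sites, 2 singleton variants).
alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
pi = nucleotide_diversity(alignment)
theta = watterson_theta(alignment)
print(f"pi = {pi:.4f}, theta_W = {theta:.4f}")
```

    The numerator of Tajima's D is the difference between π and θ computed on the same scale, so a θ larger than π, as in this toy alignment with only rare variants, gives a negative D before normalization by its standard deviation.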

  12. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Science.gov (United States)

    Chen, Xiaoshu; Zhang, Jianzhi

    2012-01-01

    The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and

  13. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and

  14. The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaoshu Chen

    The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression.
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny

  15. Integrated analysis of differentially expressed genes and pathways in triple-negative breast cancer

    Science.gov (United States)

    Peng, Cancan; Ma, Wenli; Xia, Wei; Zheng, Wenling

    2017-01-01

    Triple-negative breast cancer (TNBC) is a heterogeneous disease characterized by an aggressive phenotype and reduced survival. The aim of the present study was to investigate the molecular mechanisms involved in the carcinogenesis of TNBC and to identify novel target molecules for therapy. The differentially expressed genes (DEGs) in TNBC and normal adjacent tissue were assessed by analyzing the GSE41970 microarray data using Qlucore Omics Explorer, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes. Pathway enrichment analyses for DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery online resource. A protein-protein interaction (PPI) network was constructed using Search Tool for the Retrieval of Interacting Genes, and subnetworks were analyzed by ClusterONE. The PPI network and subnetworks were visualized using Cytoscape software. A total of 121 DEGs were obtained, of which 101 were upregulated and 20 were downregulated. The upregulated DEGs were significantly enriched in 14 pathways and 83 GO biological processes, while the downregulated DEGs were significantly enriched in 18 GO biological processes. The PPI network with 118 nodes and 1,264 edges was constructed and three subnetworks were extracted from the entire network. The significant hub DEGs with high degrees were identified, including TP53, glyceraldehyde-3-phosphate dehydrogenase, cyclin D1, HRAS and proliferating cell nuclear antigen, which were predominantly enriched in the cell cycle pathway and pathways in cancer. A number of critical genes and pathways were revealed to be associated with TNBC. The present study may provide an improved understanding of the pathogenesis of TNBC and contribute to the development of therapeutic targets for TNBC. PMID:28075450

  16. DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

    Science.gov (United States)

    Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...

  17. Differentially Expressed Genes and Signature Pathways of Human Prostate Cancer.

    Directory of Open Access Journals (Sweden)

    Jennifer S Myers

    Genomic technologies including microarrays and next-generation sequencing have enabled the generation of molecular signatures of prostate cancer. Lists of differentially expressed genes between malignant and non-malignant states are thought to be fertile sources of putative prostate cancer biomarkers. However, such lists of differentially expressed genes can be highly variable for multiple reasons. As such, looking at differential expression in the context of gene sets and pathways has been more robust. Using next-generation genome sequencing data from The Cancer Genome Atlas, differential gene expression between age- and stage-matched human prostate tumors and non-malignant samples was assessed and used to craft a pathway signature of prostate cancer. Up- and down-regulated genes were assigned to pathways composed of curated groups of related genes from multiple databases. The significance of these pathways was then evaluated according to the number of differentially expressed genes found in the pathway and their position within the pathway using Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis. The "transforming growth factor-beta signaling" and "Ran regulation of mitotic spindle formation" pathways were strongly associated with prostate cancer. Several other significant pathways confirm reported findings from microarray data that suggest actin cytoskeleton regulation, cell cycle, mitogen-activated protein kinase signaling, and calcium signaling are also altered in prostate cancer. Thus we have demonstrated the feasibility of pathway analysis and identified an underexplored area (Ran) for investigation in prostate cancer pathogenesis.

  18. Efficient Management of Biomedical Ontology Versions

    Science.gov (United States)

    Kirsten, Toralf; Hartung, Michael; Groß, Anika; Rahm, Erhard

    Ontologies have become very popular in life sciences and other domains. They mostly undergo continuous changes and new ontology versions are frequently released. However, current analysis studies do not consider the ontology changes reflected in different versions but typically limit themselves to a specific ontology version which may quickly become obsolete. To allow applications easy access to different ontology versions we propose a central and uniform management of the versions of different biomedical ontologies. The proposed database approach takes concept and structural changes of succeeding ontology versions into account thereby supporting different kinds of change analysis. Furthermore, it is very space-efficient by avoiding redundant storage of ontology components which remain unchanged in different versions. We evaluate the storage requirements and query performance of the proposed approach for the Gene Ontology.
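
    One way to avoid storing unchanged concepts once per release is to keep a single row per concept state with a validity interval over versions. The sketch below illustrates that idea only; the class, field names and concept identifiers are hypothetical, and the abstract does not describe the authors' actual database schema.

```python
class VersionedOntology:
    """Stores each concept state once, with the range of versions in which it is valid."""

    def __init__(self):
        self.versions = []   # release labels in chronological order
        self.rows = []       # [concept_id, name, valid_from, valid_to]; valid_to None = current

    def add_version(self, label, concepts):
        """Register a release given as a dict mapping concept_id to name."""
        idx = len(self.versions)
        self.versions.append(label)
        open_rows = {row[0]: row for row in self.rows if row[3] is None}
        # Close rows whose concept was removed or renamed in the new release.
        for cid, row in open_rows.items():
            if concepts.get(cid) != row[1]:
                row[3] = idx - 1
        # Add rows only for concepts that are new or changed.
        for cid, name in concepts.items():
            row = open_rows.get(cid)
            if row is None or row[3] is not None:
                self.rows.append([cid, name, idx, None])

    def snapshot(self, label):
        """Reconstruct the full concept table of one release."""
        idx = self.versions.index(label)
        return {r[0]: r[1] for r in self.rows
                if r[2] <= idx and (r[3] is None or r[3] >= idx)}

store = VersionedOntology()
store.add_version("v1", {"C1": "metabolic process", "C2": "binding"})
store.add_version("v2", {"C1": "metabolic process", "C2": "molecular binding", "C3": "transport"})
print(len(store.rows), store.snapshot("v1"), store.snapshot("v2"))
```

    Here the two releases contain five concept entries in total but need only four stored rows, because the unchanged concept is shared; the closed and newly opened rows also record which concepts changed between releases.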

  19. FYPO: the fission yeast phenotype ontology.

    Science.gov (United States)

    Harris, Midori A; Lock, Antonia; Bähler, Jürg; Oliver, Stephen G; Wood, Valerie

    2013-07-01

    To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast. The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species. FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).

  20. CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.

    Science.gov (United States)

    Park, Julie; Costanzo, Maria C; Balakrishnan, Rama; Cherry, J Michael; Hong, Eurie L

    2012-01-01

    The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.
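
    The pairing of manual and computational annotations can be sketched on a toy ontology. Everything below is illustrative: the term names, gene names and parent map are hypothetical, and the flagging rule (review a gene when a prediction is not already covered by an equal or more specific manual term) is an assumption, not the published CvManGO criterion.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def relationship(manual, predicted, parents):
    """Classify how a predicted term relates to a manual term in the hierarchy."""
    if manual == predicted:
        return "same"
    if manual in ancestors(predicted, parents):
        return "prediction_more_specific"
    if predicted in ancestors(manual, parents):
        return "prediction_more_general"
    return "unrelated"

def genes_to_review(manual_ann, predicted_ann, parents):
    """Flag genes with a prediction that no manual annotation already covers."""
    flagged = set()
    for gene, predictions in predicted_ann.items():
        for pred in predictions:
            rels = {relationship(m, pred, parents) for m in manual_ann.get(gene, ())}
            if not rels & {"same", "prediction_more_general"}:
                flagged.add(gene)
    return flagged

# Hypothetical toy hierarchy: child term -> list of parent terms.
parents = {
    "kinase activity": ["catalytic activity"],
    "protein kinase activity": ["kinase activity"],
    "DNA binding": ["binding"],
}
manual = {"geneA": {"kinase activity"}, "geneB": {"protein kinase activity"}}
predicted = {
    "geneA": {"protein kinase activity"},   # more specific than the manual term
    "geneB": {"kinase activity"},           # already implied by the manual term
    "geneC": {"DNA binding"},               # no manual annotation at all
}
flagged = genes_to_review(manual, predicted, parents)
print(sorted(flagged))
```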

  1. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Directory of Open Access Journals (Sweden)

    Shibiao Wan

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.

  2. HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/. PMID:24647341
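
    The idea of combining occurrence counts with inter-term similarity can be sketched as follows. This is not the HybridGO-Loc implementation: the term identifiers are hypothetical, a Jaccard overlap of ancestor sets stands in for the semantic similarity measure, and plain concatenation is assumed as the way the two vectors are fused.

```python
from collections import Counter

def frequency_vector(hits, vocab):
    """Count how often each vocabulary term occurs among the retrieved GO terms."""
    counts = Counter(hits)
    return [counts[term] for term in vocab]

def jaccard_similarity(a, b, ancestors):
    """Stand-in semantic similarity: overlap of the two terms' ancestor sets (self included)."""
    sa, sb = ancestors[a] | {a}, ancestors[b] | {b}
    return len(sa & sb) / len(sa | sb)

def similarity_vector(hits, vocab, ancestors):
    """For each vocabulary term, its best similarity to any retrieved term."""
    retrieved = set(hits)
    return [max((jaccard_similarity(term, h, ancestors) for h in retrieved), default=0.0)
            for term in vocab]

def fusion_vector(hits, vocab, ancestors):
    """Concatenate the frequency and similarity vectors into one feature vector."""
    return frequency_vector(hits, vocab) + similarity_vector(hits, vocab, ancestors)

# Hypothetical toy ontology: T2 is a child of T1, T3 is unrelated.
vocab = ["T1", "T2", "T3"]
ancestors = {"T1": set(), "T2": {"T1"}, "T3": set()}
hits = ["T2", "T2"]   # GO terms retrieved for one protein via its homologs

features = fusion_vector(hits, vocab, ancestors)
print(features)
```

    The frequency half of the vector is zero for T1, while the similarity half still gives T1 a non-zero score because it is related to the retrieved term T2; this is the information that occurrence-only features discard.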

  3. Comparative metabolic pathway analysis with special reference to nucleotide metabolism-related genes in chicken primordial germ cells.

    Science.gov (United States)

    Rengaraj, Deivendran; Lee, Bo Ram; Jang, Hyun-Jun; Kim, Young Min; Han, Jae Yong

    2013-01-01

    Metabolism provides energy and nutrients required for the cellular growth, maintenance, and reproduction. When compared with genomics and proteomics, metabolism studies provide novel findings in terms of cellular functions. In this study, we examined significant and differentially expressed genes in primordial germ cells (PGCs), gonadal stromal cells, and chicken embryonic fibroblasts compared with blastoderms using microarray. All upregulated genes (1001, 1118, and 974, respectively) and downregulated genes (504, 627, and 1317, respectively) in three test samples were categorized into functional groups according to gene ontology. Then all selected genes were tested to examine their involvement in metabolic pathways through Kyoto Encyclopedia of Genes and Genomes pathway database using overrepresentation analysis. In our results, most of the upregulated and downregulated genes were involved in at least one subcategory of seven major metabolic pathways. The main objective of this study is to compare the PGC expressed genes and their metabolic pathways with blastoderms, gonadal stromal cells, and chicken embryonic fibroblasts. Among the genes involved in metabolic pathways, a higher number of PGC upregulated genes were identified in retinol metabolism, and a higher number of PGC downregulated genes were identified in sphingolipid metabolism. In terms of the fold change, acyl-CoA synthetase medium-chain family member 3 (ACSM3), which is involved in butanoate metabolism, and N-acetyltransferase, pineal gland isozyme NAT-10 (PNAT10), which is involved in energy metabolism, showed higher expression in PGCs. To validate these gene changes, the expression of 12 nucleotide metabolism-related genes in chicken PGCs was examined by real-time polymerase chain reaction. The results of this study provide new information on the expression of genes associated with metabolism function of PGCs and will facilitate more basic research on animal PGC differentiation and function
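
    Overrepresentation analysis of a gene list against a pathway is commonly carried out with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that this standard test is meant; the abstract does not state which statistic the authors used, and the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, pathway_in_population,
                           selected, pathway_in_selected):
    """Return P(X >= pathway_in_selected) under the hypergeometric model.

    population            : number of genes in the background set
    pathway_in_population : background genes annotated to the pathway
    selected              : number of genes in the list being tested
    pathway_in_selected   : listed genes annotated to the pathway
    """
    upper = min(pathway_in_population, selected)
    total = comb(population, selected)
    # Sum the probabilities of observing at least the given overlap.
    tail = sum(
        comb(pathway_in_population, i)
        * comb(population - pathway_in_population, selected - i)
        for i in range(pathway_in_selected, upper + 1)
    )
    return tail / total

# Toy numbers chosen for illustration only, not taken from the study.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

    In practice the p-values for all tested pathways would also be corrected for multiple testing.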

  4. Ontology Research

    OpenAIRE

    Welty, Christopher

    2003-01-01

    In this issue, I have collected a fairly broad, although by no means exhaustive, sampling of work in the field of ontology research. To define a field is often quite difficult; it is more a collection of people and ideas than it is a specific technology. To represent our field, I present six articles that cover several of the major thrusts of ontology research from the past decade.

  5. Gene expression profiling provides insights into pathways of oxaliplatin-related sinusoidal obstruction syndrome in humans.

    Science.gov (United States)

    Rubbia-Brandt, Laura; Tauzin, Sébastien; Brezault, Catherine; Delucinge-Vivier, Céline; Descombes, Patrick; Dousset, Bertand; Majno, Pietro E; Mentha, Gilles; Terris, Benoit

    2011-04-01

    Sinusoidal obstruction syndrome (SOS; formerly veno-occlusive disease) is a well-established complication of hematopoietic stem cell transplantation, pyrrolizidine alkaloid intoxication, and widely used chemotherapeutic agents such as oxaliplatin. It is associated with substantial morbidity and mortality. Pathogenesis of SOS in humans is poorly understood. To explore its molecular mechanisms, we used Affymetrix U133 Plus 2.0 microarrays to investigate the gene expression profile of 11 human livers with oxaliplatin-related SOS and compared it to 12 matched controls. Hierarchical clustering analysis showed that profiles from SOS and controls formed distinct clusters. To identify functional networks and gene ontologies, data were analyzed by the Ingenuity Pathway Analysis Tool. A total of 913 genes were differentially expressed in SOS: 613 being upregulated and 300 downregulated. Reverse transcriptase-PCR results showed excellent concordance with microarray data. Pathway analysis showed major gene upregulation in six pathways in SOS compared with controls: acute phase response (notably interleukin 6), coagulation system (Serpine1, THBD, and VWF), hepatic fibrosis/hepatic stellate cell activation (COL3a1, COL3a2, PDGF-A, TIMP1, and MMP2), and oxidative stress. Angiogenic factors (VEGF-C) and hypoxic factors (HIF1A) were upregulated. The most significant increase was seen in CCL20 mRNA. In conclusion, oxaliplatin-related SOS can be readily distinguished according to morphologic characteristics but also by a molecular signature. Global gene analysis provides new insights into mechanisms underlying chemotherapy-related hepatotoxicity in humans and potential targets relating to its diagnosis, prevention, and treatment. Activation of VEGF and coagulation (vWF) pathways could partially explain at a molecular level the clinical observations that bevacizumab and aspirin have a preventive effect in SOS.

  6. Gene Expression Profile Reveals Abnormalities of Multiple Signaling Pathways in Mesenchymal Stem Cell Derived from Patients with Systemic Lupus Erythematosus

    Directory of Open Access Journals (Sweden)

    Yu Tang

    2012-01-01

    We aimed to compare bone-marrow-derived mesenchymal stem cells (BMMSCs) between systemic lupus erythematosus (SLE) patients and normal controls by means of cDNA microarray, immunohistochemistry, immunofluorescence, and immunoblotting. Our results showed that a total of 1,905 genes were differentially expressed by BMMSCs derived from SLE patients, of which 652 were upregulated and 1,253 were downregulated. Gene ontology (GO) analysis showed that the majority of these genes were related to cell cycle and protein binding. Pathway analysis showed that differentially regulated signaling pathways involved the actin cytoskeleton, focal adhesion, tight junction, and TGF-β pathways. The high protein level of BMP-5 and low expression of Id-1 indicated that there might be dysregulation in the BMP/TGF-β signaling pathway. The expression of Id-1 in SLE BMMSCs was inversely correlated with serum TNF-α levels. The protein level of cyclin E decreased in the cell cycle regulation pathway. Moreover, the MAPK signaling pathway was activated in BMMSCs from SLE patients via phosphorylation of ERK1/2 and SAPK/JNK. The actin distribution pattern of BMMSCs from SLE patients was also found to be disordered. Our results suggested that there were distinct differences in BMMSCs between SLE patients and normal controls.

  7. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify

  8. Generating Application Ontologies from Reference Ontologies

    OpenAIRE

    Shaw, Marianne; Detwiler, Landon T.; Brinkley, James F.; Suciu, Dan

    2008-01-01

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies.

  9. Transcriptome and Gene Ontology (GO) Enrichment Analysis Reveals Genes Involved in Biotin Metabolism That Affect L-Lysine Production in Corynebacterium glutamicum.

    Science.gov (United States)

    Kim, Hong-Il; Kim, Jong-Hyeon; Park, Young-Jin

    2016-03-09

    Corynebacterium glutamicum is widely used for amino acid production. In the present study, 543 genes showed a significant change in their mRNA expression levels in L-lysine-producing C. glutamicum ATCC21300 compared with the wild-type C. glutamicum ATCC13032. Among these 543 differentially expressed genes (DEGs), 28 genes were up- or downregulated. In addition, 454 DEGs were functionally enriched and categorized based on BLAST sequence homologies and gene ontology (GO) annotations using the Blast2GO software. Interestingly, NCgl0071 (bioB, encoding biotin synthase) was expressed at levels ~20-fold higher in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain. Five other genes involved in biotin metabolism or transport--NCgl2515 (bioA, encoding adenosylmethionine-8-amino-7-oxononanoate aminotransferase), NCgl2516 (bioD, encoding dithiobiotin synthetase), NCgl1883, NCgl1884, and NCgl1885--were also expressed at significantly higher levels in the L-lysine-producing ATCC21300 strain than in the wild-type ATCC13032 strain, which we determined using both next-generation RNA sequencing and quantitative real-time PCR analysis. When we disrupted the bioB gene in C. glutamicum ATCC21300, L-lysine production decreased by approximately 76%, and the three genes involved in biotin transport (NCgl1883, NCgl1884, and NCgl1885) were significantly downregulated. These results will be helpful to improve our understanding of C. glutamicum for industrial amino acid production.

  10. Exploring developmental gene toolkit and associated pathways in a potential new model crustacean using transcriptomic analysis.

    Science.gov (United States)

    Jaramillo, Michael L; Guzman, Frank; Paese, Christian L B; Margis, Rogerio; Nazari, Evelise M; Ammar, Dib; Müller, Yara Maria Rauh

    2016-09-01

    The crustaceans are one of the largest, most diverse, and most successful groups of invertebrates. The diversity among the crustaceans is also reflected in embryonic development models. However, the molecular genetics that regulates embryonic development is not known in those crustaceans that have a short germ-band development with superficial cleavage, such as Macrobrachium olfersi. This species is a freshwater decapod and has great potential to become a model for developmental biology, as well as for evolutionary and environmental studies. To obtain sequence data of M. olfersi from an embryonic developmental perspective, we performed de novo assembly and annotation of the embryonic transcriptome. Using a pooling strategy of total RNA, paired-end Illumina sequencing, and assembly with multiple k-mers, a total of 25,636,097 pair reads were generated. In total, 99,751 unigenes were identified, and 20,893 of these returned a Blastx hit. KEGG pathway analysis mapped a total of 6866 unigenes related to 129 metabolic pathways. In general, 21,845 unigenes were assigned to gene ontology (GO) categories: molecular function (19,604), cellular components (10,254), and biological processes (13,841). Of these, 2142 unigenes were assigned to the developmental process category. More specifically, a total of 35 homologs of embryonic development toolkit genes were identified, which included maternal effect (one gene), gap (six), pair-rule (six), segment polarity (seven), Hox (four), Wnt (eight), and dorsoventral patterning genes (three). In addition, genes of developmental pathways were found, including TGF-β, Wnt, Notch, MAPK, Hedgehog, Jak-STAT, VEGF, and ecdysteroid-inducible nuclear receptors. RT-PCR analysis of eight genes related to embryonic development from gastrulation to late morphogenesis/organogenesis confirmed the applicability of the transcriptome analysis.

  11. OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data.

    Science.gov (United States)

    Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan

    2016-01-01

    As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version of OMIT are: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.

  12. Building Ontologies in DAML + OIL

    Science.gov (United States)

    Wroe, Chris; Bechhofer, Sean; Lord, Phillip; Rector, Alan; Goble, Carole

    2003-01-01

    In this article we describe an approach to representing and building ontologies advocated by the Bioinformatics and Medical Informatics groups at the University of Manchester. The hand-crafting of ontologies offers an easy and rapid avenue to delivering ontologies. Experience has shown that such approaches are unsustainable. Description logic approaches have been shown to offer computational support for building sound, complete and logically consistent ontologies. A new knowledge representation language, DAML + OIL, offers a new standard that is able to support many styles of ontology, from hand-crafted to full logic-based descriptions with reasoning support. We describe this language, the OilEd editing tool, reasoning support and a strategy for the language’s use. We finish with a current example, in the Gene Ontology Next Generation (GONG) project, that uses DAML + OIL as the basis for moving the Gene Ontology from its current hand-crafted form to one that uses logical descriptions of a concept’s properties to deliver a more complete version of the ontology. PMID:18629114

  13. Genome-wide gene pathway analysis of psychotic illness symptom dimensions based on a new schizophrenia-specific model of the OPCRIT.

    Science.gov (United States)

    Docherty, Anna R; Bigdeli, T Bernard; Edwards, Alexis C; Bacanu, Silviu; Lee, Donghyung; Neale, Michael C; Wormley, Brandon K; Walsh, Dermot; O'Neill, F Anthony; Riley, Brien P; Kendler, Kenneth S; Fanous, Ayman H

    2015-05-01

    Empirically derived phenotypic measurements have the potential to enhance gene-finding efforts in schizophrenia. Previous research based on factor analyses of symptoms has typically included schizoaffective cases. Deriving factor loadings from analysis of only narrowly defined schizophrenia cases could yield more sensitive factor scores for gene pathway and gene ontology analyses. Using an Irish family sample, this study 1) factor analyzed clinician-rated Operational Criteria Checklist items in cases with schizophrenia only, 2) scored the full sample based on these factor loadings, and 3) implemented genome-wide association, gene-based, and gene-pathway analysis of these SCZ-based symptom factors (final N=507). Three factors emerged from the analysis of the schizophrenia cases: a manic, a depressive, and a positive symptom factor. In gene-based analyses of these factors, multiple genes had qschizophrenia.

  14. Using phylogenomic patterns and gene ontology to identify proteins of importance in plant evolution.

    Science.gov (United States)

    Cibrián-Jaramillo, Angélica; De la Torre-Bárcena, Jose E; Lee, Ernest K; Katari, Manpreet S; Little, Damon P; Stevenson, Dennis W; Martienssen, Rob; Coruzzi, Gloria M; DeSalle, Rob

    2010-07-12

    We use measures of congruence on a combined expressed sequenced tag genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch and hidden support on the hypothesis obtained on a 16-species tree, constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that are in agreement with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes for key branches in the seed plant phylogeny. In addition to basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented in key nodes of the phylogeny of seed plants. Two genes in our matrix are of critical importance as they are involved in RNA-dependent regulation, essential during embryo and leaf development. These are Argonaute and the RNA-dependent RNA polymerase 6 found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomics approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, and in particular, in the evolution of plants.

  15. SUGOI: automated ontology interchangeability

    CSIR Research Space (South Africa)

    Khan, ZC

    2015-04-01

    A foundational ontology can solve interoperability issues among the domain ontologies aligned to it. However, several foundational ontologies have been developed, hence such interoperability issues exist among domain ontologies. The novel SUGOI tool...

  16. Analysis of Differentially Expressed Genes and Signaling Pathways Related to Intramuscular Fat Deposition in Skeletal Muscle of Sex-Linked Dwarf Chickens

    Directory of Open Access Journals (Sweden)

    Yaqiong Ye

    2014-01-01

    Intramuscular fat (IMF) plays an important role in meat quality. However, the molecular mechanisms underlying IMF deposition in skeletal muscle have not been addressed for the sex-linked dwarf (SLD) chicken. In this study, potential candidate genes and signaling pathways related to IMF deposition in chicken leg muscle tissue were characterized using gene expression profiling of both 7-week-old SLD and normal chickens. A total of 173 differentially expressed genes (DEGs) were identified between the two breeds. Subsequently, 6 DEGs related to lipid metabolism or muscle development were verified in each breed based on gene ontology (GO) analysis. In addition, KEGG pathway analysis of DEGs indicated that some of them (GHR, SOCS3, and IGF2BP3) participate in adipocytokine and insulin signaling pathways. To investigate the role of the above signaling pathways in IMF deposition, the gene expression of pathway factors and other downstream genes were measured by using qRT-PCR and Western blot analyses. Collectively, the results identified potential candidate genes related to IMF deposition and suggested that IMF deposition in skeletal muscle of SLD chicken is regulated partially by pathways of adipocytokine and insulin and other downstream signaling pathways (TGF-β/SMAD3 and Wnt/catenin-β pathway).

  17. Inference of gene pathways using mixture Bayesian networks

    Directory of Open Access Journals (Sweden)

    Ko Younhee

    2009-05-01

    Background: Inference of gene networks typically relies on measurements across a wide range of conditions or treatments. Although one network structure is predicted, the relationship between genes could vary across conditions. A comprehensive approach to infer general and condition-dependent gene networks was evaluated. This approach integrated Bayesian network and Gaussian mixture models to describe continuous microarray gene expression measurements, and three gene networks were predicted. Results: The first reconstructions of a circadian rhythm pathway in honey bees and an adherens junction pathway in mouse embryos were obtained. In addition, general and condition-specific gene relationships, some unexpected, were detected in these two pathways and in a yeast cell-cycle pathway. The mixture Bayesian network approach identified all (honey bee circadian rhythm and mouse adherens junction pathways) or the vast majority (yeast cell-cycle pathway) of the gene relationships reported in empirical studies. Findings across the three pathways and data sets indicate that the mixture Bayesian network approach is well-suited to infer gene pathways based on microarray data. Furthermore, the interpretation of model estimates provided a broader understanding of the relationships between genes. The mixture models offered a comprehensive description of the relationships among genes in complex biological processes or across a wide range of conditions. The mixture parameter estimates and corresponding odds that the gene network inferred for a sample pertained to each mixture component allowed the uncovering of both general and condition-dependent gene relationships and patterns of expression. Conclusion: This study demonstrated the two main benefits of learning gene pathways using mixture Bayesian networks. First, the identification of the optimal number of mixture components supported by the data offered a robust approach to infer gene relationships and

  18. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

    Directory of Open Access Journals (Sweden)

    Xiaomei Wu

    BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply consider terms at the same level in the ontology to be equally specific nodes, revealing the weaknesses that could be complemented using information content (IC). RESULTS AND CONCLUSIONS: Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. The algorithm HRSS was implemented in the C programming language, which is freely available from http://cmb.bnu.edu.cn/hrss.
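
    The two pairwise strategies named in the abstract, the maximum and the best-match average, both reduce a matrix of term-to-term similarities to a single score for a pair of gene products. The sketch below uses one common definition of the best-match average (the mean of the row-wise and column-wise averages of best matches); whether the HRSS authors used exactly this variant is an assumption, and the similarity values are made up for illustration.

```python
def max_similarity(sim):
    """Maximum strategy: the single best term-to-term similarity."""
    return max(max(row) for row in sim)

def best_match_average(sim):
    """Best-match average: average each term's best match in both directions.

    sim[i][j] is the similarity between term i of the first gene product
    and term j of the second gene product.
    """
    row_best = [max(row) for row in sim]        # best partner for each row term
    col_best = [max(col) for col in zip(*sim)]  # best partner for each column term
    row_mean = sum(row_best) / len(row_best)
    col_mean = sum(col_best) / len(col_best)
    return (row_mean + col_mean) / 2

# Hypothetical similarities: 2 terms for product A, 3 terms for product B.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
]
score_max = max_similarity(sim)
score_bma = best_match_average(sim)
```

    The maximum rewards a single shared function, whereas the best-match average is lowered by terms that have no good counterpart, which is consistent with the different use cases reported above.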

  19. Development of an Ontology for Periodontitis.

    Science.gov (United States)

    Suzuki, Asami; Takai-Igarashi, Takako; Nakaya, Jun; Tanaka, Hiroshi

    2015-01-01

    In the community of clinical dentists and periodontal researchers, there is an obvious demand for a systems model capable of linking the clinical presentation of periodontitis to underlying molecular knowledge. A computer-readable representation of processes on disease development will give periodontal researchers opportunities to elucidate pathways and mechanisms of periodontitis. An ontology for periodontitis can be a model for integration of a large variety of factors relating to a complex disease such as chronic inflammation in different organs accompanied by bone remodeling and immune system disorders, which has recently been referred to as osteoimmunology. Terms characteristic of descriptions related to the onset and progression of periodontitis were manually extracted from 194 review articles and PubMed abstracts by experts in periodontology. We specified all the relations between the extracted terms and constructed them into an ontology for periodontitis. We also investigated matching between classes of our ontology and that of Gene Ontology Biological Process. We developed an ontology for periodontitis called Periodontitis-Ontology (PeriO). The pathological progression of periodontitis is caused by complex, multi-factor interrelationships. PeriO consists of all the required concepts to represent the pathological progression and clinical treatment of periodontitis. The pathological processes were formalized with reference to Basic Formal Ontology and Relation Ontology, which accounts for participants in the processes realized by biological objects such as molecules and cells. We investigated the peculiarity of biological processes observed in pathological progression and medical treatments for the disease in comparison with Gene Ontology Biological Process (GO-BP) annotations. The results indicated that peculiarities of PeriO existed in 1) granularity and context dependency of both the conceptualizations, and 2) causality intrinsic to the pathological processes

  20. Ontology Localization

    OpenAIRE

    2009-01-01

    Our main goal in this thesis is to propose a solution for building a multilingual ontology through the automatic localization of an ontology. The notion of localization comes from the field of Software Development, where it refers to the adaptation of a software product to a non-native environment. In Ontology Engineering, ontology localization could be considered a subtype of software localization in which the product is a shared model of a...

  1. Text mining in cancer gene and pathway prioritization.

    Science.gov (United States)

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

  2. Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    The current increase in Gene Ontology (GO) annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC) have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.
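
    For readers unfamiliar with annotation-based information content, a minimal sketch follows. It assumes the usual definition IC(t) = -log p(t), where p(t) is the fraction of annotations that fall on term t or its descendants, and uses the Resnik measure (IC of the most informative common ancestor) as one example of an IC-based term similarity. The toy ontology, the annotation counts, and the function names are hypothetical and are not taken from the study.

```python
import math

def information_content(term_counts, root):
    """IC(t) = -log(count(t) / count(root)) for every annotated term.

    term_counts[t] must already include annotations of t's descendants.
    """
    total = term_counts[root]
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}

def resnik_similarity(ic, ancestors, a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = (ancestors[a] | {a}) & (ancestors[b] | {b})
    return max(ic[t] for t in common)

# Toy ontology: root -> A -> {B, C}; counts are made up for illustration.
term_counts = {"root": 100, "A": 50, "B": 10, "C": 5}
ancestors = {
    "root": set(),
    "A": {"root"},
    "B": {"A", "root"},
    "C": {"A", "root"},
}

ic = information_content(term_counts, "root")
sim_b_c = resnik_similarity(ic, ancestors, "B", "C")
```

    Rarely used terms receive a high IC and the root receives an IC of zero, so two terms are considered similar only when they share a specific ancestor.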

  3. Differences in gene expression profiles and carcinogenesis pathways involved in cisplatin resistance of four types of cancer.

    Science.gov (United States)

    Yang, Yong; Li, Hui; Hou, Shengcai; Hu, Bin; Liu, Jie; Wang, Jun

    2013-08-01

    Cisplatin-based chemotherapy is the standard therapy used for the treatment of several types of cancer. However, its efficacy is largely limited by the acquired drug resistance. To date, little is known about the RNA expression changes in cisplatin-resistant cancers. Identification of the RNAs related to cisplatin resistance may provide specific insight into cancer therapy. In the present study, expression profiling of 7 cancer cell lines was performed using oligonucleotide microarray analysis data obtained from the GEO database. Bioinformatic analyses such as the Gene Ontology (GO) and KEGG pathway were used to identify genes and pathways specifically associated with cisplatin resistance. A signal transduction network was established to identify the core genes in regulating cancer cell cisplatin resistance. A number of genes were differentially expressed in 7 groups of cancer cell lines. They mainly participated in 85 GO terms and 11 pathways in common. All differential gene interactions in the Signal-Net were analyzed. CTNNB1, PLCG2 and SRC were the most significantly altered. With the use of bioinformatics, large amounts of data in microarrays were retrieved and analyzed by means of thorough experimental planning, scientific statistical analysis and collection of complete data on cancer cell cisplatin resistance. In the present study, a novel differential gene expression pattern was constructed and further study will provide new targets for the diagnosis and mechanisms of cancer cisplatin resistance.

  4. The updated RGD Pathway Portal utilizes increased curation efficiency and provides expanded pathway information.

    Science.gov (United States)

    Hayman, G Thomas; Jayaraman, Pushkala; Petri, Victoria; Tutaj, Marek; Liu, Weisong; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary

    2013-02-05

    The RGD Pathway Portal provides pathway annotations for rat, human and mouse genes and pathway diagrams and suites, all interconnected via the pathway ontology. Diagram pages present the diagram and description, with diagram objects linked to additional resources. A newly-developed dual-functionality web application composes the diagram page. Curators input the description, diagram, references and additional pathway objects. The application combines these with tables of rat, human and mouse pathway genes, including genetic information, analysis tool and reference links, and disease, phenotype and other pathway annotations to pathway genes. The application increases the information content of diagram pages while expediting publication.

  5. Ontology Requirements Specification

    OpenAIRE

    Suárez-Figueroa, Mari Carmen; Gómez-Pérez, A.

    2012-01-01

    The goal of the ontology requirements specification activity is to state why the ontology is being built, what its intended uses are, who the end users are, and which requirements the ontology should fulfill. This chapter presents detailed methodological guidelines for specifying ontology requirements efficiently. These guidelines will help ontology engineers to capture ontology requirements and produce the ontology requirements specification document (ORSD). The ORSD will play a key role dur...

  6. Lentiviral gene ontology (LeGO) vectors equipped with novel drug-selectable fluorescent proteins: new building blocks for cell marking and multi-gene analysis.

    Science.gov (United States)

    Weber, K; Mock, U; Petrowitz, B; Bartsch, U; Fehse, B

    2010-04-01

    Vector-encoded fluorescent proteins (FPs) facilitate unambiguous identification or sorting of gene-modified cells by fluorescence-activated cell sorting (FACS). Exploiting this feature, we have recently developed lentiviral gene ontology (LeGO) vectors (www.LentiGO-Vectors.de) for multi-gene analysis in different target cells. In this study, we extend the LeGO principle by introducing 10 different drug-selectable FPs created by fusing one of five selection markers (protecting against blasticidin, hygromycin, neomycin, puromycin and zeocin) and one of five FP genes (Cerulean, eGFP, Venus, dTomato and mCherry). All tested fusion proteins allowed both fluorescence-mediated detection and drug-mediated selection of LeGO-transduced cells. Newly generated codon-optimized hygromycin- and neomycin-resistance genes showed improved expression as compared with their ancestors. New LeGO constructs were produced at titers >10^6 per ml (for non-concentrated supernatants). We show efficient combinatorial marking and selection of various cells, including mesenchymal stem cells, simultaneously transduced with different LeGO constructs. Inclusion of the cytomegalovirus early enhancer/chicken beta-actin promoter into LeGO vectors facilitated robust transgene expression in and selection of neural stem cells and their differentiated progeny. We suppose that the new drug-selectable markers combining advantages of FACS and drug selection are well suited for numerous applications and vector systems. Their inclusion into LeGO vectors opens new possibilities for (stem) cell tracking and functional multi-gene analysis.

  7. Gene expression profiling of lymphoblasts from autistic and nonaffected sib pairs: altered pathways in neuronal development and steroid biosynthesis.

    Science.gov (United States)

    Hu, Valerie W; Nguyen, AnhThu; Kim, Kyung Soon; Steinberg, Mara E; Sarachana, Tewarit; Scully, Michele A; Soldin, Steven J; Luu, Truong; Lee, Norman H

    2009-06-03

    Despite the identification of numerous autism susceptibility genes, the pathobiology of autism remains unknown. The present "case-control" study takes a global approach to understanding the molecular basis of autism spectrum disorders based upon large-scale gene expression profiling. DNA microarray analyses were conducted on lymphoblastoid cell lines from over 20 sib pairs in which one sibling had a diagnosis of autism and the other was not affected in order to identify biochemical and signaling pathways which are differentially regulated in cells from autistic and nonautistic siblings. Bioinformatics and gene ontological analyses of the data implicate genes which are involved in nervous system development, inflammation, and cytoskeletal organization, in addition to genes which may be relevant to gastrointestinal or other physiological symptoms often associated with autism. Moreover, the data further suggests that these processes may be modulated by cholesterol/steroid metabolism, especially at the level of androgenic hormones. Elevation of male hormones, in turn, has been suggested as a possible factor influencing susceptibility to autism, which affects approximately 4 times as many males as females. Preliminary metabolic profiling of steroid hormones in lymphoblastoid cell lines from several pairs of siblings reveals higher levels of testosterone in the autistic sibling, which is consistent with the increased expression of two genes involved in the steroidogenesis pathway. Global gene expression profiling of cultured cells from ASD probands thus serves as a window to underlying metabolic and signaling deficits that may be relevant to the pathobiology of autism.

  8. Gene expression profiling of lymphoblasts from autistic and nonaffected sib pairs: altered pathways in neuronal development and steroid biosynthesis.

    Directory of Open Access Journals (Sweden)

    Valerie W Hu

    Despite the identification of numerous autism susceptibility genes, the pathobiology of autism remains unknown. The present "case-control" study takes a global approach to understanding the molecular basis of autism spectrum disorders based upon large-scale gene expression profiling. DNA microarray analyses were conducted on lymphoblastoid cell lines from over 20 sib pairs in which one sibling had a diagnosis of autism and the other was not affected in order to identify biochemical and signaling pathways which are differentially regulated in cells from autistic and nonautistic siblings. Bioinformatics and gene ontological analyses of the data implicate genes which are involved in nervous system development, inflammation, and cytoskeletal organization, in addition to genes which may be relevant to gastrointestinal or other physiological symptoms often associated with autism. Moreover, the data further suggests that these processes may be modulated by cholesterol/steroid metabolism, especially at the level of androgenic hormones. Elevation of male hormones, in turn, has been suggested as a possible factor influencing susceptibility to autism, which affects approximately 4 times as many males as females. Preliminary metabolic profiling of steroid hormones in lymphoblastoid cell lines from several pairs of siblings reveals higher levels of testosterone in the autistic sibling, which is consistent with the increased expression of two genes involved in the steroidogenesis pathway. Global gene expression profiling of cultured cells from ASD probands thus serves as a window to underlying metabolic and signaling deficits that may be relevant to the pathobiology of autism.

  9. Impact of ontology evolution on functional analyses.

    Science.gov (United States)

    Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard

    2012-10-15

    Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
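
    To make the two ingredients of such a study concrete, the following is a minimal sketch under stated assumptions. It uses the standard one-sided hypergeometric test for term enrichment and a plain Jaccard overlap between the significant term sets obtained with two ontology versions. The Jaccard overlap is only a stand-in: the stability measures introduced in the study itself are not reproduced here, and both function names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric test: probability of seeing at least `hits`
    annotated genes when drawing `sample` genes from a `population` in which
    `annotated` genes carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def jaccard(terms_old, terms_new):
    """Overlap of two sets of significant terms (1.0 means identical sets)."""
    old, new = set(terms_old), set(terms_new)
    union = old | new
    return len(old & new) / len(union) if union else 1.0
```

    A plain set overlap treats a term replaced by a closely related one as a complete change, which is why the abstract argues that taking the semantic relatedness of significant categories into account gives a more realistic picture of stability.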

  10. Ontological backdrop

    DEFF Research Database (Denmark)

    Galle, Per

    2000-01-01

    In this report I keep track of ontological assumptions or implications of other OARs, introducing a system of categories and concepts that is compatible with them. The purpose was originally to keep terminology consistent throughout all OARs. However, the report also gives a condensed picture...... of the world view which underlies my current work on product modelling. It contains a justification of my view of concept exemplification, with lines traced back to Kant's work on epistemology....

  11. Phylogenetic Origin and Diversification of RNAi Pathway Genes in Insects.

    Science.gov (United States)

    Dowling, Daniel; Pauli, Thomas; Donath, Alexander; Meusemann, Karen; Podsiadlowski, Lars; Petersen, Malte; Peters, Ralph S; Mayer, Christoph; Liu, Shanlin; Zhou, Xin; Misof, Bernhard; Niehuis, Oliver

    2017-01-06

    RNA interference (RNAi) refers to the set of molecular processes found in eukaryotic organisms in which small RNA molecules mediate the silencing or down-regulation of target genes. In insects, RNAi serves a number of functions, including regulation of endogenous genes, anti-viral defense, and defense against transposable elements. Despite being well studied in model organisms, such as Drosophila, the distribution of core RNAi pathway genes and their evolution in insects is not well understood. Here we present the most comprehensive overview of the distribution and diversity of core RNAi pathway genes across 100 insect species, encompassing all currently recognized insect orders. We inferred the phylogenetic origin of insect-specific RNAi pathway genes and also identified several hitherto unrecorded gene expansions using whole-body transcriptome data from the international 1KITE (1000 Insect Transcriptome Evolution) project as well as other resources such as i5K (5000 Insect Genome Project). Specifically, we traced the origin of the double stranded RNA binding protein R2D2 to the last common ancestor of winged insects (Pterygota), the loss of Sid-1/Tag-130 orthologs in Antliophora (fleas, flies and relatives, and scorpionflies in a broad sense), and confirm previous evidence for the splitting of the Argonaute proteins Aubergine and Piwi in Brachyceran flies (Diptera, Brachycera). Our study offers new reference points for future experimental research on RNAi-related pathway genes in insects.

  12. Building ontologies with basic formal ontology

    CERN Document Server

    Arp, Robert; Spear, Andrew D.

    2015-01-01

    In the era of "big data," science is increasingly information driven, and the potential for computers to store, manage, and integrate massive amounts of data has given rise to such new disciplinary fields as biomedical informatics. Applied ontology offers a strategy for the organization of scientific information in computer-tractable form, drawing on concepts not only from computer and information science but also from linguistics, logic, and philosophy. This book provides an introduction to the field of applied ontology that is of particular relevance to biomedicine, covering theoretical components of ontologies, best practices for ontology design, and examples of biomedical ontologies in use. After defining an ontology as a representation of the types of entities in a given domain, the book distinguishes between different kinds of ontologies and taxonomies, and shows how applied ontology draws on more traditional ideas from metaphysics. It presents the core features of the Basic Formal Ontology (BFO), now u...

  13. The Orthology Ontology: development and applications.

    Science.gov (United States)

    Fernández-Breis, Jesualdo Tomás; Chiba, Hirokazu; Legaz-García, María Del Carmen; Uchiyama, Ikuo

    2016-06-04

    Computational comparative analysis of multiple genomes provides valuable opportunities to biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate the research toward the shareability of the content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and major entities of the ontology that is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology enables the possibility to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization for the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth . The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.

  14. Changes in winter depression phenotype correlate with white blood cell gene expression profiles : A combined metagene and gene ontology approach

    NARCIS (Netherlands)

    Bosker, Fokko J.; Terpstra, Peter; Gladkevich, Anatoliy V.; Dijck-Brouwer, D. A. Janneke; te Meerman, Gerard; Nolen, Willem A.; Schoevers, Robert A.; Meesters, Ybe

    2015-01-01

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior

  15. Changes in winter depression phenotype correlate with white blood cell gene expression profiles : A combined metagene and gene ontology approach

    NARCIS (Netherlands)

    Bosker, Fokko J.; Terpstra, Peter; Gladkevich, Anatoliy V.; Dijck-Brouwer, D. A. Janneke; te Meerman, Gerard; Nolen, Willem A.; Schoevers, Robert A.; Meesters, Ybe

    2015-01-01

    In the present study we evaluate the feasibility of gene expression in white blood cells as a peripheral marker for winter depression. Sixteen patients with winter type seasonal affective disorder were included in the study. Blood was taken by venous puncture at three time points; in winter prior an

  16. Global analysis of gene expression in NGF-deprived sympathetic neurons identifies molecular pathways associated with cell death

    Directory of Open Access Journals (Sweden)

    Kristiansen Mark

    2011-11-01

    Background: Developing sympathetic neurons depend on nerve growth factor (NGF) for survival and die by apoptosis after NGF withdrawal. This process requires de novo gene expression but only a small number of genes induced by NGF deprivation have been identified so far, either by a candidate gene approach or in mRNA differential display experiments. This is partly because it is difficult to obtain large numbers of sympathetic neurons for in vitro studies. Here we describe, for the first time, how advances in gene microarray technology have allowed us to investigate the expression of all known genes in sympathetic neurons cultured in the presence and absence of NGF. Results: We have used Affymetrix Exon arrays to study the pattern of expression of all known genes in NGF-deprived sympathetic neurons. We identified 415 up- and 813 down-regulated genes, including most of the genes previously known to be regulated in this system. NGF withdrawal activates the mixed lineage kinase (MLK)-c-Jun N-terminal kinase (JNK)-c-Jun pathway which is required for NGF deprivation-induced death. By including a mixed lineage kinase (MLK) inhibitor, CEP-11004, in our experimental design we identified which of the genes induced after NGF withdrawal are potential targets of the MLK-JNK-c-Jun pathway. A detailed Gene Ontology and functional enrichment analysis also identified genetic pathways that are highly enriched and overrepresented amongst the genes expressed after NGF withdrawal. Five genes not previously studied in sympathetic neurons - trib3, ddit3, txnip, ndrg1 and mxi1 - were validated by real-time PCR. The proteins encoded by these genes also increased in level after NGF withdrawal and this increase was prevented by CEP-11004, suggesting that these genes are potential targets of the MLK-JNK-c-Jun pathway. Conclusions: The sympathetic neuron model is one of the best studied models of neuronal apoptosis. Overall, our microarray data gives a comprehensive

  17. Ontological Surprises

    DEFF Research Database (Denmark)

    Leahu, Lucian

    2016-01-01

    This paper investigates how we might rethink design as the technological crafting of human-machine relations in the context of a machine learning technique called neural networks. It analyzes Google’s Inceptionism project, which uses neural networks for image recognition. The surprising output of...... a hybrid approach where machine learning algorithms are used to identify objects as well as connections between them; finally, it argues for remaining open to ontological surprises in machine learning as they may enable the crafting of different relations with and through technologies....

  18. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    Science.gov (United States)

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  19. Design and synthesis of pathway genes for polyketide biosynthesis.

    Science.gov (United States)

    Peirú, Salvador; Gramajo, Hugo; Menzella, Hugo G

    2009-01-01

    In this chapter we describe novel methods for the design and assembly of synthetic pathways for the synthesis of polyketides and tailoring sugars. First, a generic design for type I polyketide synthase genes is presented that allows their facile assembly for the expression of chimeric enzymes in an engineered Escherichia coli host. The sequences of the synthetic genes are based on naturally occurring polyketide synthase genes but they are redesigned by custom-made software to optimize codon usage to maximize expression in E. coli and to provide a standard set of restriction sites to allow combinatorial assembly into unnatural enzymes. The methodology has been validated by building a large number of bimodular mini-PKSs that make easily assayed triketide products. Learning from the successful bimodules, a conceptual advance was made by assembling genes encoding functional trimodular enzymes, capable of making tetraketide products. Second, methods for the rapid assembly and exchange of sugar pathway genes into functional operons are described. The approach was validated by the assembly of the 15 genes for the synthesis of mycarose and desosamine in two operons, which yielded erythromycin C when coexpressed with the corresponding PKS genes. These methods are important enabling steps toward the goals of making designer drugs by polyketide synthase and sugar pathway engineering and, in the shorter term, producing by fermentation advanced intermediates for the synthesis of compounds that otherwise require large numbers of chemical steps.

  20. Transcriptome analysis and discovery of genes involved in immune pathways from hepatopancreas of microbial challenged mitten crab Eriocheir sinensis.

    Directory of Open Access Journals (Sweden)

    Xihong Li

    BACKGROUND: The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that has been seriously affected by various diseases, which calls for more genome-scale information on immune-relevant genes. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. METHODS/PRINCIPAL FINDINGS: A cDNA library from the hepatopancreas of E. sinensis challenged with a mixture of three pathogen strains (the Gram-positive bacterium Micrococcus luteus, the Gram-negative bacterium Vibrio alginolyticus and the fungus Pichia pastoris; 10^8 cfu·mL^-1) was constructed and randomly sequenced using Illumina technology. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching, and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized into three Gene Ontology (GO) categories, 12,243 (23.51%) were classified into 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized as being involved in the Toll, IMD, JAK-STAT and MAPK pathways, respectively. CONCLUSIONS/SIGNIFICANCE: This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. The functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in the crab.

  1. Generating application ontologies from reference ontologies.

    Science.gov (United States)

    Shaw, Marianne; Detwiler, Landon T; Brinkley, James F; Suciu, Dan

    2008-11-06

    The semantic web provides the possibility of linking together large numbers of biomedical ontologies. Unfortunately, many of the biomedical ontologies that have been developed are domain-specific and do not share a common structure that will allow them to be easily combined. Reference ontologies provide the necessary ontological framework for linking together these smaller, specialized ontologies. We present extensions to the semantic web query language SPARQL that will allow researchers to develop application ontologies that are derived from reference ontologies. We have modified the ARQ query processor to support subqueries, recursive subqueries, and Skolem functions for node creation. We demonstrate the utility of these extensions by deriving an application ontology from the Foundational Model of Anatomy.

  2. Partial Sleep Restriction Activates Immune Response-Related Gene Expression Pathways: Experimental and Epidemiological Studies in Humans

    Science.gov (United States)

    Rantanen, Ville; Kronholm, Erkki; Surakka, Ida; van Leeuwen, Wessel M. A.; Lehto, Maili; Matikainen, Sampsa; Ripatti, Samuli; Härmä, Mikko; Sallinen, Mikael; Salomaa, Veikko; Jauhiainen, Matti; Alenius, Harri; Paunio, Tiina; Porkka-Heiskanen, Tarja

    2013-01-01

    Epidemiological studies have shown that short or insufficient sleep is associated with increased risk for metabolic diseases and mortality. To elucidate mechanisms behind this connection, we aimed to identify genes and pathways affected by experimentally induced, partial sleep restriction and to verify their connection to insufficient sleep at population level. The experimental design simulated sleep restriction during a working week: sleep of healthy men (N = 9) was restricted to 4 h/night for five nights. The control subjects (N = 4) spent 8 h/night in bed. Leukocyte RNA expression was analyzed at baseline, after sleep restriction, and after recovery using whole genome microarrays complemented with pathway and transcription factor analysis. Expression levels of the ten most up-regulated and ten most down-regulated transcripts were correlated with subjective assessment of insufficient sleep in a population cohort (N = 472). Experimental sleep restriction altered the expression of 117 genes. Eight of the 25 most up-regulated transcripts were related to immune function. Accordingly, fifteen of the 25 most up-regulated Gene Ontology pathways were also related to immune function, including those for B cell activation, interleukin 8 production, and NF-κB signaling (P<0.005). Of the ten most up-regulated genes, expression of STX16 correlated negatively with self-reported insufficient sleep in a population sample, while three other genes showed tendency for positive correlation. Of the ten most down-regulated genes, TBX21 and LGR6 correlated negatively and TGFBR3 positively with insufficient sleep. Partial sleep restriction affects the regulation of signaling pathways related to the immune system. Some of these changes appear to be long-lasting and may at least partly explain how prolonged sleep restriction can contribute to inflammation-associated pathological states, such as cardiometabolic diseases. PMID:24194869

  3. Partial sleep restriction activates immune response-related gene expression pathways: experimental and epidemiological studies in humans.

    Directory of Open Access Journals (Sweden)

    Vilma Aho

    Epidemiological studies have shown that short or insufficient sleep is associated with increased risk for metabolic diseases and mortality. To elucidate mechanisms behind this connection, we aimed to identify genes and pathways affected by experimentally induced, partial sleep restriction and to verify their connection to insufficient sleep at population level. The experimental design simulated sleep restriction during a working week: sleep of healthy men (N = 9) was restricted to 4 h/night for five nights. The control subjects (N = 4) spent 8 h/night in bed. Leukocyte RNA expression was analyzed at baseline, after sleep restriction, and after recovery using whole genome microarrays complemented with pathway and transcription factor analysis. Expression levels of the ten most up-regulated and ten most down-regulated transcripts were correlated with subjective assessment of insufficient sleep in a population cohort (N = 472). Experimental sleep restriction altered the expression of 117 genes. Eight of the 25 most up-regulated transcripts were related to immune function. Accordingly, fifteen of the 25 most up-regulated Gene Ontology pathways were also related to immune function, including those for B cell activation, interleukin 8 production, and NF-κB signaling (P<0.005). Of the ten most up-regulated genes, expression of STX16 correlated negatively with self-reported insufficient sleep in a population sample, while three other genes showed tendency for positive correlation. Of the ten most down-regulated genes, TBX21 and LGR6 correlated negatively and TGFBR3 positively with insufficient sleep. Partial sleep restriction affects the regulation of signaling pathways related to the immune system. Some of these changes appear to be long-lasting and may at least partly explain how prolonged sleep restriction can contribute to inflammation-associated pathological states, such as cardiometabolic diseases.

  4. Signal Transduction Pathways that Regulate CAB Gene Expression

    Energy Technology Data Exchange (ETDEWEB)

    Chory, Joanne

    2004-12-31

    The process of chloroplast differentiation involves the coordinate regulation of many nuclear and chloroplast genes. The cues for the initiation of this developmental program are both extrinsic (e.g., light) and intrinsic (cell-type and plastid signals). During this project period, we utilized a molecular genetic approach to select for Arabidopsis mutants that did not respond properly to environmental light conditions, as well as mutants that were unable to perceive plastid damage. These latter mutants, called gun mutants, define two retrograde signaling pathways that regulate nuclear gene expression in response to chloroplasts. A major finding was to identify a signal from chloroplasts that regulates nuclear gene transcription. This signal is the build-up of Mg-Protoporphyrin IX, a key intermediate of the chlorophyll biosynthetic pathway. The signaling pathways downstream of this signal are currently being studied. Completion of this project has provided an increased understanding of the input signals and retrograde signaling pathways that control nuclear gene expression in response to the functional state of chloroplasts. These studies should ultimately influence our abilities to manipulate plant growth and development, and will aid in the understanding of the developmental control of photosynthesis.

  5. Signal Transduction Pathways that Regulate CAB Gene Expression

    Energy Technology Data Exchange (ETDEWEB)

    Chory, Joanne

    2006-01-16

    The process of chloroplast differentiation involves the coordinate regulation of many nuclear and chloroplast genes. The cues for the initiation of this developmental program are both extrinsic (e.g., light) and intrinsic (cell-type and plastid signals). During this project period, we utilized a molecular genetic approach to select for Arabidopsis mutants that did not respond properly to environmental light conditions, as well as mutants that were unable to perceive plastid damage. These latter mutants, called gun mutants, define two retrograde signaling pathways that regulate nuclear gene expression in response to chloroplasts. A major finding was to identify a signal from chloroplasts that regulates nuclear gene transcription. This signal is the build-up of Mg-Protoporphyrin IX, a key intermediate of the chlorophyll biosynthetic pathway. The signaling pathways downstream of this signal are currently being studied. Completion of this project has provided an increased understanding of the input signals and retrograde signaling pathways that control nuclear gene expression in response to the functional state of chloroplasts. These studies should ultimately influence our abilities to manipulate plant growth and development, and will aid in the understanding of the developmental control of photosynthesis.

  6. Anatomy Ontology Matching Using Markov Logic Networks

    Directory of Open Access Journals (Sweden)

    Chunhua Li

    2016-01-01

    Full Text Available The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relationships between ontologies describing different species. Ontology matching is one way to find semantic correspondences between entities of different ontologies. Markov logic networks, which unify probabilistic graphical models and first-order logic, provide an excellent framework for ontology matching. We combine several different matching strategies through first-order logic formulas according to the structure of anatomy ontologies. Experiments on the adult mouse anatomy and the human anatomy have demonstrated the effectiveness of the proposed approach in terms of the quality of the resulting alignment.

  7. Exploring two plant hosts for expression of diterpenoid pathway genes

    DEFF Research Database (Denmark)

    Bach, Søren Spanner

    by humanity in biopharmaceuticals or as industrial bioproducts. Yields and purity of diterpenoids purified from natural sources or made by chemical synthesis are generally insufficient for large-volume or high-end applications, thus alternative sources are needed. Synthetic biology, where heterologous pathways...... have been reconstructed in host production organisms is an attractive alternative, which holds the promise to enable a scalable, cost-effective and stable supply of natural products. Knowledge about the genes and mechanisms involved in the original pathway is a prerequisite for such heterologous production...... is compatible with native codon usage, and through the conserved mechanisms of protein targeting and posttranslational modifications, has the capacity to produce functional enzymes. To further explore plant based expression and characterization of diterpenoid pathway genes, two different plant expression hosts...

  8. Integrative analysis of RUNX1 downstream pathways and target genes

    Science.gov (United States)

    Michaud, Joëlle; Simpson, Ken M; Escher, Robert; Buchet-Poyau, Karine; Beissbarth, Tim; Carmichael, Catherine; Ritchie, Matthew E; Schütz, Frédéric; Cannon, Ping; Liu, Marjorie; Shen, Xiaofeng; Ito, Yoshiaki; Raskind, Wendy H; Horwitz, Marshall S; Osato, Motomi; Turner, David R; Speed, Terence P; Kavallaris, Maria; Smyth, Gordon K; Scott, Hamish S

    2008-01-01

    Background The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. Results Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFβ, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFβ. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. Conclusion This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease progression in both

  9. Integrative analysis of RUNX1 downstream pathways and target genes

    Directory of Open Access Journals (Sweden)

    Liu Marjorie

    2008-07-01

    Full Text Available Abstract Background The RUNX1 transcription factor gene is frequently mutated in sporadic myeloid and lymphoid leukemia through translocation, point mutation or amplification. It is also responsible for a familial platelet disorder with predisposition to acute myeloid leukemia (FPD-AML). The disruption of the largely unknown biological pathways controlled by RUNX1 is likely to be responsible for the development of leukemia. We have used multiple microarray platforms and bioinformatic techniques to help identify these biological pathways to aid in the understanding of why RUNX1 mutations lead to leukemia. Results Here we report genes regulated either directly or indirectly by RUNX1 based on the study of gene expression profiles generated from 3 different human and mouse platforms. The platforms used were global gene expression profiling of: 1) cell lines with RUNX1 mutations from FPD-AML patients, 2) over-expression of RUNX1 and CBFβ, and 3) Runx1 knockout mouse embryos using either cDNA or Affymetrix microarrays. We observe that our datasets (lists of differentially expressed genes) significantly correlate with published microarray data from sporadic AML patients with mutations in either RUNX1 or its cofactor, CBFβ. A number of biological processes were identified among the differentially expressed genes and functional assays suggest that heterozygous RUNX1 point mutations in patients with FPD-AML impair cell proliferation, microtubule dynamics and possibly genetic stability. In addition, analysis of the regulatory regions of the differentially expressed genes has for the first time systematically identified numerous potential novel RUNX1 target genes. Conclusion This work is the first large-scale study attempting to identify the genetic networks regulated by RUNX1, a master regulator in the development of the hematopoietic system and leukemia. The biological pathways and target genes controlled by RUNX1 will have considerable importance in disease

  10. Analysis of JAK-STAT signaling pathway genes and their microRNAs in the intestinal mucosa of genetically disparate chicken lines induced with necrotic enteritis.

    Science.gov (United States)

    Truong, Anh Duc; Rengaraj, Deivendran; Hong, Yeojin; Hoang, Cong Thanh; Hong, Yeong Ho; Lillehoj, Hyun S

    2017-05-01

    The JAK-STAT signaling pathway plays a key role in cytokine and growth factor activation and is involved in several cellular functions and diseases. The main objective of this study was to investigate the expression of candidate JAK-STAT pathway genes and their regulators and interactors in the intestinal mucosal layer of two genetically disparate chicken lines [Marek's disease (MD)-resistant line 6.3 and MD-susceptible line 7.2] induced with necrotic enteritis (NE). Through RNA-sequencing, we investigated 116 JAK-STAT signaling pathway-related genes that were significant and differentially expressed between the intestinal mucosa of the two lines compared with respective uninfected controls. About 15 JAK-STAT pathway genes were further verified by qRT-PCR, and the results were in agreement with our sequencing data. All the identified 116 genes were annotated through Gene Ontology and mapped to the KEGG chicken JAK-STAT signaling pathway. To the best of our knowledge, this is the first study to represent the transcriptional analysis of a large number of candidate genes, regulators, and potential interactors in the JAK-STAT pathway of the two chicken lines induced with NE. Several key genes of the interactome, namely, STAT1/3/4, STAT5B, JAK1-3, TYK2, AKT1/3, SOCS1-5, PIAS1/2/4, PTPN6/11, and PIK3, were determined to be differentially expressed in the two lines. Moreover, we detected 68 known miRNAs variably targeting JAK-STAT pathway genes and differentially expressed in the two lines induced with NE. The RNA-sequencing and bioinformatics analyses in this study provided an abundance of data that will be useful for future studies on JAK-STAT pathways associated with the functions of two genetically disparate chicken lines induced with NE. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. De novo characterization of the spleen transcriptome of the large yellow croaker (Pseudosciaena crocea) and analysis of the immune relevant genes and pathways involved in the antiviral response

    KAUST Repository

    Mu, Yinnan

    2014-05-12

    The large yellow croaker (Pseudosciaena crocea) is an economically important marine fish in China. To understand the molecular basis for antiviral defense in this species, we used Illumina paired-end sequencing to characterize the spleen transcriptome of polyriboinosinic:polyribocytidylic acid [poly(I:C)]-induced large yellow croakers. The library produced 56,355,728 reads and assembled into 108,237 contigs. As a result, 15,192 unigenes were found from this transcriptome. Gene ontology analysis showed that 4,759 genes were involved in three major functional categories: biological process, cellular component, and molecular function. We further ascertained that numerous consensus sequences were homologous to known immune-relevant genes. Kyoto Encyclopedia of Genes and Genomes orthology mapping annotated 5,389 unigenes and identified numerous immune-relevant pathways. These immune-relevant genes and pathways revealed major antiviral immunity effectors, including but not limited to: pattern recognition receptors, adaptors and signal transducers, the interferons and interferon-stimulated genes, inflammatory cytokines and receptors, complement components, and B-cell and T-cell antigen activation molecules. Moreover, the partial genes of Toll-like receptor signaling pathway, RIG-I-like receptors signaling pathway, Janus kinase-Signal Transducer and Activator of Transcription (JAK-STAT) signaling pathway, and T-cell receptor (TCR) signaling pathway were found to be changed after poly(I:C) induction by real-time polymerase chain reaction (PCR) analysis, suggesting that these signaling pathways may be regulated by poly(I:C), a viral mimic. Overall, the antivirus-related genes and signaling pathways that were identified in response to poly(I:C) challenge provide valuable leads for further investigation of the antiviral defense mechanism in the large yellow croaker. © 2014 Mu et al.

  12. De novo transcriptomes of olfactory epithelium reveal the genes and pathways for spawning migration in Japanese grenadier anchovy (Coilia nasus).

    Directory of Open Access Journals (Sweden)

    Guoli Zhu

    Full Text Available BACKGROUND: Coilia nasus (Japanese grenadier anchovy) undergoes spawning migration from the ocean to fresh water inland. Previous studies have suggested that anadromous fish use olfactory cues to perform successful migration to spawn. However, limited genomic information is available for C. nasus. To understand the molecular mechanisms of spawning migration, it is essential to identify the genes and pathways involved in the migratory behavior of C. nasus. RESULTS: Using de novo transcriptome sequencing and assembly, we constructed two transcriptomes of the olfactory epithelium from wild anadromous and non-anadromous C. nasus. Over 178 million high-quality clean reads were generated using Illumina sequencing technology and assembled into 176,510 unigenes (mean length: 843 bp). About 51% (89,456) of the unigenes were functionally annotated using protein databases. Gene ontology analysis of the transcriptomes indicated gene enrichment not only in signal detection and transduction, but also in regulation and enzymatic activity. The potential genes and pathways involved in the migratory behavior were identified. In addition, simple sequence repeats and single nucleotide polymorphisms were analyzed to identify potential molecular markers. CONCLUSION: We, for the first time, obtained high-quality de novo transcriptomes of C. nasus using a high-throughput sequencing approach. Our study lays the foundation for further investigation of C. nasus spawning migration and genome evolution.

  13. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.

    Science.gov (United States)

    Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G

    2016-05-01

    Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by > 2.5% in F1 score for molecular function hierarchy. Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF CONTACT: ahmad.pgh@dal.ca or beiko@cs.dal.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
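
    The abstract above describes similarity as the cosine of the angle between definition vectors. The following is a minimal sketch of that one step only, assuming plain bag-of-words token counts as the vectors; it does not reproduce simDEF's Gloss Vector construction or its optimized definition vectors, and the function names `definition_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def definition_vector(definition):
    """Toy stand-in for a definition vector: token -> count (an assumption,
    not simDEF's Gloss Vector representation)."""
    tokens = [t.strip(".,;:()").lower() for t in definition.split()]
    return Counter(t for t in tokens if t)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty definition is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Illustrative (made-up) definition texts, not quoted from GO.
a = definition_vector("binding of a protein to a lipid")
b = definition_vector("binding of a protein to a sugar")
score = cosine_similarity(a, b)
```

    Two definitions with no tokens in common score 0, and a definition compared with itself scores 1; everything in between depends on how the vectors are built, which is where the published method differs from this sketch.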

  14. Redox Homeostasis via Gene Families of Ascorbate-Glutathione Pathway

    Directory of Open Access Journals (Sweden)

    Prachi ePandey

    2015-03-01

    Full Text Available The imposition of environmental stresses on plants brings about disturbance in their metabolism thereby negatively affecting their growth and development and leading to reduction in the productivity. One of the manifestations of abiotic and biotic stress conditions is the enhanced production of reactive oxygen species (ROS) which can be hazardous to cells. Therefore, in order to protect themselves against toxic ROS, plant cells employ the anti-oxidant defense system. The ascorbate-glutathione pathway (Halliwell-Asada cycle) is an indispensable component of the ROS homeostasis mechanism of plants. This pathway entails the antioxidant metabolites: ascorbate, glutathione and NADPH along with the enzymes linking them. The ascorbate-glutathione pathway is functional in different subcellular compartments and all the enzymes of this pathway exist as multiple isoforms. The expression of different isoforms of the enzymes of ascorbate-glutathione pathway is developmentally as well as spatially regulated. Moreover, various abiotic and biotic stress conditions modulate the expression of the enzyme isoforms differently. It is the intricate regulation of expression of different isoforms of the ascorbate-glutathione pathway enzymes that helps in the maintenance of redox balance in plants under various abiotic and biotic stress conditions. The present review provides an insight into the gene families of the ascorbate-glutathione pathway, shedding light on their role in different abiotic and biotic stress conditions as well as in the growth and development of plants.

  15. An ontology approach to comparative phenomics in plants

    KAUST Repository

    Oellrich, Anika

    2015-02-25

    Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics

  16. Evolutionary Origins of the Eukaryotic Shikimate Pathway: Gene Fusions, Horizontal Gene Transfer, and Endosymbiotic Replacements†

    OpenAIRE

    2006-01-01

    Currently the shikimate pathway is reported as a metabolic feature of prokaryotes, ascomycete fungi, apicomplexans, and plants. The plant shikimate pathway enzymes have similarities to prokaryote homologues and are largely active in chloroplasts, suggesting ancestry from the plastid progenitor genome. Toxoplasma gondii, which also possesses an alga-derived plastid organelle, encodes a shikimate pathway with similarities to ascomycete genes, including a five-enzyme pentafunctional arom. These ...

  17. Markov Chain Ontology Analysis (MCOA)

    Science.gov (United States)

    2012-01-01

    Background Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data. Results In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods. 
Conclusion A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches
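
    The abstract above ranks ontology classes by eigenvector values of a transition matrix for a finite ergodic Markov chain. As a minimal sketch of that idea, the code below computes the stationary distribution (the principal left eigenvector) by power iteration. The function name, the two-state matrix, and the tolerance are illustrative assumptions; MCOA's adjusted transition probability matrix and its ontology-specific construction are not reproduced here.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi such that pi = pi * P.

    P is a row-stochastic matrix given as a list of rows. Convergence is
    only expected for an ergodic chain.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain, used only to exercise the function.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = stationary_distribution(P)
```

    For this toy chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives pi = (5/6, 1/6), so the state with the larger stationary mass would be ranked as more important under this scheme.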

  18. An ontology for microbial phenotypes.

    Science.gov (United States)

    Chibucos, Marcus C; Zweifel, Adrienne E; Herrera, Jonathan C; Meza, William; Eslamfam, Shabnam; Uetz, Peter; Siegele, Deborah A; Hu, James C; Giglio, Michelle G

    2014-11-30

    Phenotypic data are routinely used to elucidate gene function in organisms amenable to genetic manipulation. However, previous to this work, there was no generalizable system in place for the structured storage and retrieval of phenotypic information for bacteria. The Ontology of Microbial Phenotypes (OMP) has been created to standardize the capture of such phenotypic information from microbes. OMP has been built on the foundations of the Basic Formal Ontology and the Phenotype and Trait Ontology. Terms have logical definitions that can facilitate computational searching of phenotypes and their associated genes. OMP can be accessed via a wiki page as well as downloaded from SourceForge. Initial annotations with OMP are being made for Escherichia coli using a wiki-based annotation capture system. New OMP terms are being concurrently developed as annotation proceeds. We anticipate that diverse groups studying microbial genetics and associated phenotypes will employ OMP for standardizing microbial phenotype annotation, much as the Gene Ontology has standardized gene product annotation. The resulting OMP resource and associated annotations will facilitate prediction of phenotypes for unknown genes and result in new experimental characterization of phenotypes and functions.

  19. Simple Ontology Format (SOFT)

    Energy Technology Data Exchange (ETDEWEB)

    2011-10-01

    The Simple Ontology Format (SOFT) library and file format specification provides a set of simple tools for developing and maintaining ontologies. The library, implemented as a Perl module, supports parsing and verification of files in SOFT format, operations on ontologies (adding, removing, or filtering entities), and conversion of ontologies into other formats. SOFT allows users to quickly create ontologies using only a basic text editor, verify them, and portray them in a graph layout system using customized styles.

  20. Pathways-driven sparse regression identifies pathways and genes associated with high-density lipoprotein cholesterol in two Asian cohorts.

    Directory of Open Access Journals (Sweden)

    Matt Silver

    2013-11-01

    Full Text Available Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK

  1. Applying the functional abnormality ontology pattern to anatomical functions

    Directory of Open Access Journals (Sweden)

    Hoehndorf Robert

    2010-03-01

    Full Text Available Abstract Background Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions. Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies that are available for various species. Such an explicit relation between anatomical structures and their functions would be useful both for defining the classes of the anatomy and the phenotype ontologies accurately. Results We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions. Conclusions Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can extract a skeleton for an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies automatically. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities. Availability The source code and results of our analysis are available at http://bioonto.de.

  2. Temporal network based analysis of cell specific vein graft transcriptome defines key pathways and hub genes in implantation injury.

    Directory of Open Access Journals (Sweden)

    Manoj Bhasin

    Full Text Available Vein graft failure occurs between 1 and 6 months after implantation due to obstructive intimal hyperplasia, related in part to implantation injury. The cell-specific and temporal response of the transcriptome to vein graft implantation injury was determined by transcriptional profiling of laser capture microdissected endothelial cells (EC) and medial smooth muscle cells (SMC) from canine vein grafts, 2 hours (H) to 30 days (D) following surgery. Our results demonstrate a robust genomic response beginning at 2 H, peaking at 12-24 H, declining by 7 D, and resolving by 30 D. Gene ontology and pathway analyses of differentially expressed genes indicated that implantation injury affects inflammatory and immune responses, apoptosis, mitosis, and extracellular matrix reorganization in both cell types. Through backpropagation an integrated network was built, starting with genes differentially expressed at 30 D, followed by adding upstream interactive genes from each prior time-point. This identified significant enrichment of IL-6, IL-8, NF-κB, dendritic cell maturation, glucocorticoid receptor, and Triggering Receptor Expressed on Myeloid Cells (TREM-1) signaling, as well as PPARα activation pathways in graft EC and SMC. Interactive network-based analyses identified IL-6, IL-8, IL-1α, and Insulin Receptor (INSR) as focus hub genes within these pathways. Real-time PCR was used for the validation of two of these genes: IL-6 and IL-8, in addition to Collagen 11A1 (COL11A1), a cornerstone of the backpropagation. In conclusion, these results establish causality relationships clarifying the pathogenesis of vein graft implantation injury, and identifying novel targets for its prevention.

  3. Standardized Markerless Gene Integration for Pathway Engineering in Yarrowia lipolytica.

    Science.gov (United States)

    Schwartz, Cory; Shabbir-Hussain, Murtaza; Frogue, Keith; Blenner, Mark; Wheeldon, Ian

    2016-12-22

    The yeast Yarrowia lipolytica is a promising microbial host due to its native capacity to produce lipid-based chemicals. Engineering stable production strains requires genomic integration of modified genes, avoiding episomal expression that requires specialized media to maintain selective pressures. Here, we develop a CRISPR-Cas9-based tool for targeted, markerless gene integration into the Y. lipolytica genome. A set of genomic loci was screened to identify sites that were accepting of gene integrations without impacting cell growth. Five sites were found to meet these criteria. Expression levels from a GFP expression cassette were consistent when inserted into AXP, XPR2, A08, and D17, with reduced expression from MFE1. The standardized tool is comprised of five pairs of plasmids (one homologous donor plasmid and a CRISPR-Cas9 expression plasmid), with each pair targeting gene integration into one of the characterized sites. To demonstrate the utility of the tool we rapidly engineered a semisynthetic lycopene biosynthesis pathway by integrating four different genes at different loci. The capability to integrate multiple genes without the need for marker recovery and into sites with known expression levels will enable more rapid and reliable pathway engineering in Y. lipolytica.

  4. An integrated analysis of genes and pathways exhibiting metabolic differences between estrogen receptor positive breast cancer cells

    Directory of Open Access Journals (Sweden)

    Davie James R

    2007-09-01

    Full Text Available Abstract Background: The sex hormone estrogen (E2) is pivotal to normal mammary gland growth and differentiation and in breast carcinogenesis. In this in silico study, we examined metabolic differences between ER(+ve) breast cancer cells during E2 deprivation. Methods: Public repositories of SAGE and MA gene expression data generated from E2 deprived ER(+ve) breast cancer cell lines, MCF-7 and ZR75-1, were compared with normal breast tissue. We analyzed gene ontology (GO), enrichment, clustering, chromosome localization, and pathway profiles and performed multiple comparisons with cell lines and tumors with different ER status. Results: In all GO terms, biological process (BP), molecular function (MF), and cellular component (CC), MCF-7 had higher gene utilization than ZR75-1. Various analyses showed a down-regulated immune function, an up-regulated protein (ZR75-1) and glucose metabolism (MCF-7). A greater percentage of 77 common genes localized to the q arm of all chromosomes, but in ZR75-1 chromosomes 11, 16, and 19 harbored more overexpressed genes. Despite differences in gene utilization (electron transport, proteasome, glycolysis/gluconeogenesis) and expression (ribosome) in both cells, there was an overall similarity of ZR75-1 with ER(-ve) cell lines and ER(+ve)/ER(-ve) breast tumors. Conclusion: This study demonstrates integral metabolic differences may exist within the same cell subtype (luminal A) in representative ER(+ve) cell line models. Selectivity of gene and pathway usage for strategies such as energy requirement minimization, sugar utilization by ZR75-1 contrasted with MCF-7 cells, expressing genes whose protein products require ATP utilization. Such characteristics may impart aggressiveness to ZR75-1 and may be prognostic determinants of ER(+ve) breast tumors.

  5. An Ontology for Insider Threat Indicators Development and Applications

    Science.gov (United States)

    2014-11-01

    J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, et al., " Gene Ontology : tool for the unification of biology," Nature genetics, vol. 25, pp. 25-29...An Ontology for Insider Threat Indicators Development and Applications Daniel L. Costa, Matthew L. Collins, Samuel J. Perl, Michael J. Albrethsen...cert.org Abstract—We describe our ongoing development of an insider threat indicator ontology . Our ontology is intended to serve as a standardized

  6. Contributions to an animal trait ontology.

    Science.gov (United States)

    Hulsegge, B; Smits, M A; te Pas, M F W; Woelders, H

    2012-06-01

    Improved understanding of the biology of traits of livestock species necessitates the use and combination of information that is stored in a variety of different sources such as databases and literature. The ability to effectively combine information from different sources, however, depends on a high level of standardization within and between various resources, at least with respect to the used terminology. Ontologies represent a set of concepts that facilitate standardization of terminology within specific domains of interest. The biological mechanisms underlying quantitative traits of farm animal species related to reproduction and host pathogen interactions are complex and not well understood. This knowledge could be improved through the availability of domain-specific ontologies that provide enhanced possibilities for data annotation, data retrieval, data integration, data exchange, data analysis, and ontology-based searches. Here we describe a framework for domain-specific ontologies and the development of 2 first-generation ontologies: Reproductive Trait and Phenotype Ontology (REPO) and Host Pathogen Interactions Ontology. In these first-generation ontologies, we focused on "female fertility in cattle" and "interactions between pigs and Salmonella". Through this, we contribute to the global initiative toward the development of an Animal Trait Ontology for livestock species. To demonstrate its usefulness, we show how REPO can be used to select candidate genes for fertility.

  7. Margin based ontology sparse vector learning algorithm and applied in biology science.

    Science.gov (United States)

    Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad

    2017-01-01

    In the field of biology, ontology applications involve a large amount of genetic information and chemical information on molecular structure, so ontology concepts carry a great deal of information. In mathematical terms, the vector corresponding to an ontology concept is therefore often very high-dimensional, which places greater demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using knowledge of marginal likelihood and marginal distribution, we present an optimized strategy for a margin-based ontology sparse vector learning algorithm. Finally, the new algorithm is applied to the gene ontology and the plant ontology to verify its efficiency.

  8. Vitamin D metabolic pathway genes and pancreatic cancer risk.

    Directory of Open Access Journals (Sweden)

    Hannah Arem

    Full Text Available Evidence on the association between vitamin D status and pancreatic cancer risk is inconsistent. This inconsistency may be partially attributable to variation in vitamin D regulating genes. We selected 11 vitamin D-related genes (GC, DHCR7, CYP2R1, VDR, CYP27B1, CYP24A1, CYP27A1, RXRA, CRP2, CASR and CUBN) totaling 213 single nucleotide polymorphisms (SNPs), and examined associations with pancreatic adenocarcinoma. Our study included 3,583 pancreatic cancer cases and 7,053 controls from the genome-wide association studies of pancreatic cancer PanScans-I-III. We used the Adaptive Joint Test and the Adaptive Rank Truncated Product statistic for pathway and gene analyses, and unconditional logistic regression for SNP analyses, adjusting for age, sex, study and population stratification. We examined effect modification by circulating vitamin D concentration (≤50, >50 nmol/L) for the most significant SNPs using a subset of cohort cases (n = 713) and controls (n = 878). The vitamin D metabolic pathway was not associated with pancreatic cancer risk (p = 0.830). Of the individual genes, none were associated with pancreatic cancer risk at a significance level of p<0.05. SNPs near the VDR (rs2239186), LRP2 (rs4668123), CYP24A1 (rs2762932), GC (rs2282679), and CUBN (rs1810205) genes were the top SNPs associated with pancreatic cancer (p-values 0.008-0.037), but none were statistically significant after adjusting for multiple comparisons. Associations between these SNPs and pancreatic cancer were not modified by circulating concentrations of vitamin D. These findings do not support an association between vitamin D-related genes and pancreatic cancer risk. Future research should explore other pathways through which vitamin D status might be associated with pancreatic cancer risk.
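    The abstract reports that nominal p-values of 0.008-0.037 did not survive adjustment for multiple comparisons. The sketch below illustrates why with a plain Bonferroni correction over the 213 SNPs. The abstract does not say which correction the authors used, so Bonferroni and the function names here are assumptions chosen purely for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def survives_correction(p_value, alpha, n_tests):
    """True if a nominal p-value stays significant after Bonferroni."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Figures taken from the abstract: 213 SNPs, smallest reported p-value 0.008.
N_SNPS = 213
threshold = bonferroni_threshold(0.05, N_SNPS)
print(f"per-SNP threshold: {threshold:.6f}")
print("p = 0.008 survives:", survives_correction(0.008, 0.05, N_SNPS))
```

    Bonferroni is conservative when SNPs are correlated through linkage disequilibrium, so a real analysis might use a permutation-based or false-discovery-rate procedure instead.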

  9. Toll-like receptor signaling in vertebrates: testing the integration of protein, complex, and pathway data in the protein ontology framework.

    Directory of Open Access Journals (Sweden)

    Cecilia Arighi

    Full Text Available The Protein Ontology (PRO) provides terms for and supports annotation of species-specific protein complexes in an ontology framework that relates them both to their components and to species-independent families of complexes. Comprehensive curation of experimentally known forms and annotations thereof is expected to expose discrepancies, differences, and gaps in our knowledge. We have annotated the early events of innate immune signaling mediated by Toll-Like Receptor 3 and 4 complexes in human, mouse, and chicken. The resulting ontology and annotation data set has allowed us to identify species-specific gaps in experimental data and possible functional differences between species, and to employ inferred structural and functional relationships to suggest plausible resolutions of these discrepancies and gaps.

  10. Transcription of meiotic-like-pathway genes in Giardia intestinalis

    OpenAIRE

    2008-01-01

    The reproductive mechanism of Giardia intestinalis, considered one of the earliest divergent eukaryotes, has not been fully defined yet. Some evidence supports the hypothesis that Giardia is an exclusively asexual organism with a clonal population structure. However, the high genetic variability, the variation in ploidy during its life cycle, the low heterozygosity and the existence of genes involved in the meiotic-like recombination pathway in the parasite's genome cast doubt on exclusively ...

  11. Gene Expression Profiling of Biological Pathway Alterations by Radiation Exposure

    Directory of Open Access Journals (Sweden)

    Kuei-Fang Lee

    2014-01-01

    Full Text Available Though damage caused by radiation has been the focus of rigorous research, the mechanisms through which radiation exerts harmful effects on cells are complex and not well understood. In particular, the influence of low dose radiation exposure on the regulation of genes and pathways remains unclear. In an attempt to investigate the molecular alterations induced by varying doses of radiation, a genome-wide expression analysis was conducted. Peripheral blood mononuclear cells were collected from five participants and each sample was subjected to 0.5 Gy, 1 Gy, 2.5 Gy, and 5 Gy of cobalt 60 radiation, followed by array-based expression profiling. Gene set enrichment analysis indicated that the immune system and cancer development pathways appeared to be the major targets affected by radiation exposure. Therefore, 1 Gy radioactive exposure seemed to be a critical threshold dosage. In fact, after 1 Gy radiation exposure, expression levels of several genes including FADD, TNFRSF10B, TNFRSF8, TNFRSF10A, TNFSF10, TNFSF8, CASP1, and CASP4 that are associated with carcinogenesis and metabolic disorders showed significant alterations. Our results suggest that exposure to low-dose radiation may elicit changes in metabolic and immune pathways, potentially increasing the risk of immune dysfunctions and metabolic disorders.

  12. Datamining with Ontologies.

    Science.gov (United States)

    Hoehndorf, Robert; Gkoutos, Georgios V; Schofield, Paul N

    2016-01-01

    The use of ontologies has increased rapidly over the past decade and they now provide a key component of most major databases in biology and biomedicine. Consequently, datamining over these databases benefits from considering the specific structure and content of ontologies, and several methods have been developed to use ontologies in datamining applications. Here, we discuss the principles of ontology structure, and datamining methods that rely on ontologies. The impact of these methods in the biological and biomedical sciences has been profound and is likely to increase as more datasets are becoming available using common, shared ontologies.

  13. Tutorial on Protein Ontology Resources.

    Science.gov (United States)

    Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R; Ross, Karen E; Natale, Darren A

    2017-01-01

    The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.

  14. Comparing Relational and Ontological Triple Stores in Healthcare Domain

    Directory of Open Access Journals (Sweden)

    Ozgu Can

    2017-01-01

    Full Text Available Today’s technological improvements have made ubiquitous healthcare systems that converge into smart healthcare applications in order to solve patients’ problems, to communicate effectively with patients, and to improve healthcare service quality. The first step of building a smart healthcare information system is representing the healthcare data as connected, reachable, and sharable. In order to achieve this representation, ontologies are used to describe the healthcare data. Combining ontological healthcare data with the used and obtained data can be maintained by storing the entire health domain data inside big data stores that support both relational and graph-based ontological data. There are several big data stores and different types of big data sets in the healthcare domain. The goal of this paper is to determine the most applicable ontology data store for storing the big healthcare data. For this purpose, AllegroGraph and Oracle 12c data stores are compared based on their infrastructural capacity, loading time, and query response times. Hence, healthcare ontologies (GENE Ontology, Gene Expression Ontology (GEXO), Regulation of Transcription Ontology (RETO), Regulation of Gene Expression Ontology (REXO)) are used to measure the ontology loading time. Thereafter, various queries are constructed and executed for GENE ontology in order to measure the capacity and query response times for the performance comparison between AllegroGraph and Oracle 12c triple stores.

  15. The foundational ontology library ROMULUS

    CSIR Research Space (South Africa)

    Khan, ZC

    2013-09-01

    Full Text Available A purpose of a foundational ontology is to solve interoperability issues among domain ontologies and they are used for ontology-driven conceptual data modelling. Multiple foundational ontologies have been developed in recent years, and most of them...

  16. Transcription of meiotic-like-pathway genes in Giardia intestinalis

    Directory of Open Access Journals (Sweden)

    Sandra P Melo

    2008-06-01

    Full Text Available The reproductive mechanism of Giardia intestinalis, considered one of the earliest divergent eukaryotes, has not been fully defined yet. Some evidence supports the hypothesis that Giardia is an exclusively asexual organism with a clonal population structure. However, the high genetic variability, the variation in ploidy during its life cycle, the low heterozygosity and the existence of genes involved in the meiotic-like recombination pathway in the parasite's genome cast doubt on the exclusively asexual nature of Giardia. In this work, semiquantitative RT-PCR analysis was used to assess the transcription pattern of three meiosis-like-specific genes involved in homologous recombination: dmc1, hop1 and spo11. The mRNAs were amplified during the parasite's differentiation processes, encystation and excystation, and expression was found at each stage of its life cycle. A semiquantitative assessment also suggests that expression of some of the genes is regulated during the encystation process.

  17. Transcription of meiotic-like-pathway genes in Giardia intestinalis.

    Science.gov (United States)

    Melo, Sandra P; Gómez, Vanessa; Castellanos, Isabel C; Alvarado, Magda E; Hernández, Paula C; Gallego, Amanda; Wasserman, Moisés

    2008-06-01

    The reproductive mechanism of Giardia intestinalis, considered one of the earliest divergent eukaryotes, has not been fully defined yet. Some evidence supports the hypothesis that Giardia is an exclusively asexual organism with a clonal population structure. However, the high genetic variability, the variation in ploidy during its life cycle, the low heterozygosity and the existence of genes involved in the meiotic-like recombination pathway in the parasite's genome cast doubt on the exclusively asexual nature of Giardia. In this work, semiquantitative RT-PCR analysis was used to assess the transcription pattern of three meiosis-like-specific genes involved in homologous recombination: dmc1, hop1 and spo11. The mRNAs were amplified during the parasite's differentiation processes, encystation and excystation, and expression was found at each stage of its life cycle. A semiquantitative assessment also suggests that expression of some of the genes is regulated during the encystation process.

  18. Construction of ontology augmented networks for protein complex prediction.

    Science.gov (United States)

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.

  19. De Novo Transcriptomic Analysis of an Oleaginous Microalga: Pathway Description and Gene Discovery for Production of Next-Generation Biofuels

    Science.gov (United States)

    Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu

    2012-01-01

    Background: Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results: We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions: Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
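    The assembly statistics quoted above (mean unigene size and N50) can be reproduced from a list of sequence lengths. The sketch below uses the common definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function names and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Toy data, purely illustrative (not the unigene lengths from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(toy_lengths))
print("mean:", mean_length(toy_lengths))
```

    N50 exceeds the mean whenever the length distribution is skewed toward many short sequences, which matches the pattern reported here (N50 of 663 bp against a mean of 503 bp).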

  20. De novo transcriptomic analysis of an oleaginous microalga: pathway description and gene discovery for production of next-generation biofuels.

    Directory of Open Access Journals (Sweden)

    LingLin Wan

    Full Text Available BACKGROUND: Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. RESULTS: We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. CONCLUSIONS: Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  1. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computationally expensive and time-consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents the first effort of examining the capability of this idea via studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, among which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as shown by (a) consistently finding that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
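    The four measures named in this abstract are all built on the information content (IC) of terms and of their most informative common ancestor (MICA). The sketch below shows the standard textbook definitions on a tiny made-up hierarchy. The term names, the probabilities, and the helper functions are hypothetical, and the code is not the authors' implementation.

```python
import math

# Hypothetical toy hierarchy: each term maps to its direct parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Hypothetical annotation probability of each term (a term's probability
# includes the annotations of its descendants, so the root is 1.0).
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: negative log of the term's probability."""
    return -math.log(PROB[term])


def mica(a, b):
    """Most informative common ancestor of two terms."""
    return max(ancestors(a) & ancestors(b), key=ic)


def resnik(a, b):
    return ic(mica(a, b))


def lin(a, b):
    denom = ic(a) + ic(b)
    return 2 * resnik(a, b) / denom if denom else 1.0


def jiang_conrath_distance(a, b):
    # Jiang-Conrath is defined as a distance; smaller means more similar.
    return ic(a) + ic(b) - 2 * resnik(a, b)


def sim_rel(a, b):
    # Lin similarity weighted by how specific the common ancestor is.
    return lin(a, b) * (1 - PROB[mica(a, b)])


for pair in [("A1", "A2"), ("A1", "B")]:
    print(pair, resnik(*pair), lin(*pair),
          jiang_conrath_distance(*pair), sim_rel(*pair))
```

    Comparing whole ontology slims, as in the study, additionally requires aggregating such term-level scores over sets of terms; that step is outside the scope of this sketch.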

  2. Altered metabolic pathways in clear cell renal cell carcinoma: A meta-analysis and validation study focused on the deregulated genes and their associated networks.

    Science.gov (United States)

    Zaravinos, Apostolos; Pieri, Myrtani; Mourmouras, Nikos; Anastasiadou, Natassa; Zouvani, Ioanna; Delakas, Dimitris; Deltas, Constantinos

    2014-01-01

    Clear cell renal cell carcinoma (ccRCC) is the predominant subtype of renal cell carcinoma (RCC). It is one of the most therapy-resistant carcinomas, responding very poorly or not at all to radiotherapy, hormonal therapy and chemotherapy. A more comprehensive understanding of the deregulated pathways in ccRCC can lead to the development of new therapies and prognostic markers. We performed a meta-analysis of 5 publicly available gene expression datasets and identified a list of co-deregulated genes, for which we performed extensive bioinformatic analysis coupled with experimental validation on the mRNA level. Gene ontology enrichment showed that many proteins are involved in response to hypoxia/oxygen levels and positive regulation of the VEGFR signaling pathway. KEGG analysis revealed that metabolic pathways are mostly altered in ccRCC. Similarly, Ingenuity Pathway Analysis showed that the antigen presentation, inositol metabolism, pentose phosphate, glycolysis/gluconeogenesis and fructose/mannose metabolism pathways are altered in the disease. Cellular growth, proliferation and carbohydrate metabolism were among the top molecular and cellular functions of the co-deregulated genes. qRT-PCR validated the deregulated expression of several genes in Caki-2 and ACHN cell lines and in a cohort of ccRCC tissues. Increased expression of NNMT and NR3C1 was evident by immunohistochemistry in ccRCC biopsies from patients. ROC curves evaluated the diagnostic performance of the top deregulated genes in each dataset. We show that metabolic pathways are mostly deregulated in ccRCC and we highlight those most responsible for its formation. We suggest that these genes are candidate predictive markers of the disease.

  3. Interaction between leptin and leptin receptor in gastric carcinoma: Gene ontology analysis

    Directory of Open Access Journals (Sweden)

    V. Wiwanitkit

    2007-04-01

    Full Text Available Gastric carcinoma is a rare but important malignancy. A link between leptin, a cytokine that is elevated in obese individuals, and cancer development has been proposed. It is noted that leptin and its receptor may play a positive role in the progression of gastric cancer. However, the exact mechanism resulting from the interaction between leptin and leptin receptor has never been clarified. Here, the author used a new gene ontology technology to predict the molecular function and biological process due to the interaction between leptin and leptin receptor. Compared with leptin and leptin receptor alone, the leptin-leptin receptor complex has the same function and biological process as leptin receptor. This can confirm that leptin receptor has a significant suppressive effect on the expression of leptin. Loss of hormone activity and disturbance of the normal cell signaling pathway of leptin can be seen. Blocking of the receptor might be a rational therapeutic strategy.

  4. Molecular pathways: targeting ETS gene fusions in cancer.

    Science.gov (United States)

    Feng, Felix Y; Brenner, J Chad; Hussain, Maha; Chinnaiyan, Arul M

    2014-09-01

    Rearrangements, or gene fusions, involving the ETS family of transcription factors are common driving events in both prostate cancer and Ewing sarcoma. These rearrangements result in pathogenic expression of the ETS genes and trigger activation of transcriptional programs enriched for invasion and other oncogenic features. Although ETS gene fusions represent intriguing therapeutic targets, transcription factors, such as those comprising the ETS family, have been notoriously difficult to target. Recently, preclinical studies have demonstrated an association between ETS gene fusions and components of the DNA damage response pathway, such as PARP1, the catalytic subunit of DNA protein kinase (DNAPK), and histone deacetylase 1 (HDAC1), and have suggested that ETS fusions may confer sensitivity to inhibitors of these DNA repair proteins. In this review, we discuss the role of ETS fusions in cancer, the preclinical rationale for targeting ETS fusions with inhibitors of PARP1, DNAPK, and HDAC1, as well as ongoing clinical trials targeting ETS gene fusions. ©2014 American Association for Cancer Research.

  5. Gene pathways that delay Caenorhabditis elegans reproductive senescence.

    Directory of Open Access Journals (Sweden)

    Meng C Wang

    2014-12-01

    Full Text Available Reproductive senescence is a hallmark of aging. The molecular mechanisms regulating reproductive senescence and its association with the aging of somatic cells remain poorly understood. From a full genome RNA interference (RNAi) screen, we identified 32 Caenorhabditis elegans gene inactivations that delay reproductive senescence and extend reproductive lifespan. We found that many of these gene inactivations interact with insulin/IGF-1 and/or TGF-β endocrine signaling pathways to regulate reproductive senescence, except nhx-2 and sgk-1 that modulate sodium reabsorption. Of these 32 gene inactivations, we also found that 19 increase reproductive lifespan through their effects on oocyte activities, 8 of them coordinate oocyte and sperm functions to extend reproductive lifespan, and 5 of them can induce sperm humoral response to promote reproductive longevity. Furthermore, we examined the effects of these reproductive aging regulators on somatic aging. We found that 5 of these gene inactivations prolong organismal lifespan, and 20 of them increase healthy life expectancy of an organism without altering total life span. These studies provide a systemic view on the genetic regulation of reproductive senescence and its intersection with organism longevity. The majority of these newly identified genes are conserved, and may provide new insights into age-associated reproductive senescence during human aging.

  6. Automation of gene assignments to metabolic pathways using high-throughput expression data

    Directory of Open Access Journals (Sweden)

    Yona Golan

    2005-08-01

    Full Text Available Abstract Background Accurate assignment of genes to pathways is essential in order to understand the functional role of genes and to map the existing pathways in a given genome. Existing algorithms predict pathways by extrapolating experimental data in one organism to other organisms for which this data is not available. However, current systems classify all genes that belong to a specific EC family to all the pathways that contain the corresponding enzymatic reaction, and thus introduce ambiguity. Results Here we describe an algorithm for assignment of genes to cellular pathways that addresses this problem by selectively assigning specific genes to pathways. Our algorithm uses the set of experimentally elucidated metabolic pathways from MetaCyc, together with statistical models of enzyme families and expression data to assign genes to enzyme families and pathways by optimizing correlated co-expression, while minimizing conflicts due to shared assignments among pathways. Our algorithm also identifies alternative ("backup") genes and addresses the multi-domain nature of proteins. We apply our model to assign genes to pathways in the Yeast genome and compare the results for genes that were assigned experimentally. Our assignments are consistent with the experimentally verified assignments and reflect characteristic properties of cellular pathways. Conclusion We present an algorithm for automatic assignment of genes to metabolic pathways. The algorithm utilizes expression data and reduces the ambiguity that characterizes assignments that are based only on EC numbers.

  7. Ontology-based representation and analysis of host-Brucella interactions.

    Science.gov (United States)

    Lin, Yu; Xiang, Zuoshuang; He, Yongqun

    2015-01-01

    Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this study, IDOBRU is used as a platform to model and analyze how the hosts, especially host macrophages, interact with virulent Brucella strains or live attenuated Brucella vaccine strains. Such a study allows us to better integrate and understand intricate Brucella pathogenesis and host immunity mechanisms. Different levels of host-Brucella interactions based on different host cell types and Brucella strains were first defined ontologically. Three important processes of virulent Brucella interacting with host macrophages were represented: Brucella entry into macrophage, intracellular trafficking, and intracellular replication. Two Brucella pathogenesis mechanisms were ontologically represented: Brucella Type IV secretion system that supports intracellular trafficking and replication, and Brucella erythritol metabolism that participates in Brucella intracellular survival and pathogenesis. The host cell death pathway is critical to the outcome of host-Brucella interactions. For better survival and replication, virulent Brucella prevents macrophage cell death. However, live attenuated B. abortus vaccine strain RB51 induces caspase-2-mediated proinflammatory cell death. Brucella-associated cell death processes are represented in IDOBRU. The gene and protein information of 432 manually annotated Brucella virulence factors were represented using the Ontology of Genes and Genomes (OGG) and Protein Ontology (PRO), respectively. Seven inference rules were defined to capture the knowledge of host

  8. AmiGO: online access to ontology and annotation data

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  10. BiNChE: A web tool and library for chemical enrichment analysis based on the ChEBI ontology

    OpenAIRE

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-01-01

    Background Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over- or under-representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology,...
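
The over- or under-representation test mentioned in this abstract is commonly computed with a hypergeometric distribution. The following is a minimal sketch of that general idea, not BiNChE's own implementation; the function names and the toy counts are assumptions introduced here for illustration.

```python
from math import comb


def _check(k, K, n, N):
    # k: annotated entities in the study set, K: annotated entities in the background,
    # n: study set size, N: background size.
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")


def over_representation_p(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated entities in the study set."""
    _check(k, K, n, N)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def under_representation_p(k, K, n, N):
    """P(X <= k): chance of seeing at most k annotated entities in the study set."""
    _check(k, K, n, N)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical numbers: 4 of 5 study entities carry a class that
    # annotates 6 of 20 background entities.
    print(over_representation_p(4, 6, 5, 20))
    print(under_representation_p(4, 6, 5, 20))
```

In practice one p-value is computed per ontology class and the results are corrected for multiple testing; that step is omitted from this sketch.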

  11. Toxicology ontology perspectives.

    Science.gov (United States)

    Hardy, Barry; Apic, Gordana; Carthew, Philip; Clark, Dominic; Cook, David; Dix, Ian; Escher, Sylvia; Hastings, Janna; Heard, David J; Jeliazkova, Nina; Judson, Philip; Matis-Mitchell, Sherri; Mitic, Dragana; Myatt, Glenn; Shah, Imran; Spjuth, Ola; Tcheremenskaia, Olga; Toldo, Luca; Watson, David; White, Andrew; Yang, Chihae

    2012-01-01

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  12. Differentially expressed genes and signalling pathways are involved in mouse osteoblast-like MC3T3-E1 cells exposed to 17-β estradiol

    Institute of Scientific and Technical Information of China (English)

    Zhen-Zhen Shang; Xin Li; Hui-Qiang Sun; Guo-Ning Xiao; Cun-Wei Wang; Qi Gong

    2014-01-01

    Oestrogen is essential for maintaining bone mass, and it has been demonstrated to induce osteoblast proliferation and bone formation. In this study, complementary DNA (cDNA) microarrays were used to identify and study the expression of novel genes that may be involved in MC3T3-E1 cells’ response to 17-β estradiol. MC3T3-E1 cells were inoculated in minimum essential media alpha (α-MEM) cell culture supplemented with 17-β estradiol at different concentrations and for different time periods. MC3T3-E1 cells treated with 10⁻⁸ mol·L⁻¹ 17-β estradiol for 5 days exhibited the highest proliferation and alkaline phosphatase (ALP) activity; thus, this group was chosen for microarray analysis. The harvested RNA was used for microarray hybridisation and subsequent real-time reverse transcription polymerase chain reaction (RT-PCR) to validate the expression levels for selected genes. The microarray results were analysed using both functional and pathway analysis. In this study, microarray analysis detected 5 403 differentially expressed genes, of which 1 996 genes were upregulated and 3 407 genes were downregulated; 1 553 different functional classifications were identified by gene ontology (GO) analysis, and 53 different pathways were involved based on pathway analysis. Among the differentially expressed genes, a portion not previously reported to be associated with the osteoblast response to oestrogen was identified. These findings clearly demonstrate that the expression of genes related to osteoblast proliferation, cell differentiation, collagens and transforming growth factor beta (TGF-β)-related cytokines increases, while the expression of genes related to apoptosis and osteoclast differentiation decreases, following the exposure of MC3T3-E1 cells to α-MEM supplemented with 17-β estradiol. Microarray analysis with functional gene classification is critical for a complete understanding of complementary intracellular processes. This microarray analysis provides large

  13. Marker2sequence, mine your QTL regions for candidate genes

    NARCIS (Netherlands)

    Chibon, P.Y.F.R.P.; Schoof, H.; Visser, R.G.F.; Finkers, H.J.

    2012-01-01

    Marker2sequence (M2S) aims at mining quantitative trait loci (QTLs) for candidate genes. For each gene, within the QTL region, M2S uses data integration technology to integrate putative gene function with associated gene ontology terms, proteins, pathways and literature. As a typical QTL region

  14. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing meta data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and meta data taxonomies, should be based on ontologies.

  15. MicroRNA-gene signaling pathways in pancreatic cancer

    Directory of Open Access Journals (Sweden)

    Alexandra Drakaki

    2013-10-01

    Full Text Available Pancreatic cancer is the fourth most frequent cause of cancer-related deaths and is characterized by early metastasis and pronounced resistance to chemotherapy and radiation therapy. Despite extensive research efforts, there has been no substantial progress in the identification of novel drugs against pancreatic cancer. Although the introduction of the chemotherapeutic agent gemcitabine improved clinical response, the prognosis of these patients remained extremely poor with a 5-year survival rate of 3-5%. Thus, the identification of the novel molecular pathways involved in pancreatic oncogenesis and the development of new and potent therapeutic options are highly desirable. Here, we describe how microRNAs control signaling pathways that are frequently deregulated during pancreatic oncogenesis. In addition, we provide evidence that microRNAs could be potentially used as novel pancreatic cancer therapeutics through reversal of chemotherapy and radiotherapy resistance or regulation of essential molecular pathways. Further studies should integrate the deregulated genes and microRNAs into molecular networks in order to identify the central regulators of pancreatic oncogenesis. Targeting these central regulators could lead to the development of novel targeted therapeutic approaches for pancreatic cancer patients.

  16. Characterization of purine catabolic pathway genes in coelacanths.

    Science.gov (United States)

    Forconi, Mariko; Biscotti, Maria Assunta; Barucca, Marco; Buonocore, Francesco; De Moro, Gianluca; Fausto, Anna Maria; Gerdol, Marco; Pallavicini, Alberto; Scapigliati, Giuseppe; Schartl, Manfred; Olmo, Ettore; Canapa, Adriana

    2014-09-01

    Coelacanths are a critically valuable species to explore the gene changes that took place in the transition from aquatic to terrestrial life. One interesting and biologically relevant feature of the genus Latimeria is ureotelism. However, not all urea is excreted from the body; in fact, high concentrations are retained in plasma and seem to be involved in osmoregulation. The purine catabolic pathway, which leads to urea production in Latimeria, has progressively lost some steps, reflecting an enzyme loss during diversification of terrestrial species. We report the results of analyses of the liver and testis transcriptomes of the Indonesian coelacanth Latimeria menadoensis and of the genome of Latimeria chalumnae, which has recently been fully sequenced in the framework of the coelacanth genome project. We describe five genes, uricase, 5-hydroxyisourate hydrolase, parahox neighbor B, allantoinase, and allantoicase, each coding for one of the five enzymes involved in urate degradation to urea, and report the identification of a putative second form of 5-hydroxyisourate hydrolase that is characteristic of the genus Latimeria. The present data also highlight the activity of the complete purine pathway in the coelacanth liver and suggest its involvement in the maintenance of high plasma urea concentrations.

  17. Altered Pathway Analyzer: A gene expression dataset analysis tool for identification and prioritization of differentially regulated and network rewired pathways

    Science.gov (United States)

    Kaushik, Abhinav; Ali, Shakir; Gupta, Dinesh

    2017-01-01

    Gene connection rewiring is an essential feature of gene network dynamics. Apart from its normal functional role, it may also lead to dysregulated functional states by disturbing pathway homeostasis. Very few computational tools measure rewiring within gene co-expression and its corresponding regulatory networks in order to identify and prioritize altered pathways which may or may not be differentially regulated. We have developed Altered Pathway Analyzer (APA), a microarray dataset analysis tool for identification and prioritization of altered pathways, including those which are differentially regulated by TFs, by quantifying rewired sub-network topology. Moreover, APA also helps in re-prioritization of APA shortlisted altered pathways enriched with context-specific genes. We performed APA analysis of simulated datasets and p53 status NCI-60 cell line microarray data to demonstrate potential of APA for identification of several case-specific altered pathways. APA analysis reveals several altered pathways not detected by other tools evaluated by us. APA analysis of unrelated prostate cancer datasets identifies sample-specific as well as conserved altered biological processes, mainly associated with lipid metabolism, cellular differentiation and proliferation. APA is designed as a cross platform tool which may be transparently customized to perform pathway analysis in different gene expression datasets. APA is freely available at http://bioinfo.icgeb.res.in/APA. PMID:28084397
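
To make the notion of "rewiring" within a pathway's co-expression network concrete, here is a simplified sketch. It is not APA's published algorithm: the thresholded Pearson co-expression network, the 0.8 cut-off, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpression_edges(expr, genes, threshold=0.8):
    """Gene pairs whose absolute correlation reaches the (assumed) threshold."""
    return {
        frozenset((g, h))
        for g, h in combinations(genes, 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    }


def rewiring_fraction(expr_a, expr_b, genes, threshold=0.8):
    """Share of edges present in exactly one of the two conditions."""
    edges_a = coexpression_edges(expr_a, genes, threshold)
    edges_b = coexpression_edges(expr_b, genes, threshold)
    union = edges_a | edges_b
    return len(edges_a ^ edges_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical expression profiles for one three-gene pathway.
    control = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
    disease = {"g1": [1, 2, 3, 4], "g2": [4, 1, 3, 2], "g3": [2, 4, 6, 8]}
    print(rewiring_fraction(control, disease, ["g1", "g2", "g3"]))
```

Pathways could then be ranked by this fraction, so that a pathway with unchanged mean expression but heavily changed connectivity still stands out.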

  18. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research [v1; ref status: indexed, http://f1000r.es/p5]

    Directory of Open Access Journals (Sweden)

    Sebastian Köhler

    2013-02-01

    Full Text Available Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline in our continuous integration system, ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.

  19. pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

    Science.gov (United States)

    Rahmati, Sara; Abovsky, Mark; Pastrello, Chiara; Jurisica, Igor

    2017-01-01

    Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain, including low proteome coverage (57%), low overlap across different databases, unavailability of direct information about underlying physical connectivity of pathway members, and high fraction of protein-coding genes without any pathway annotations, i.e. ‘pathway orphans’. In order to address all these challenges, we developed pathDIP, which integrates data from 20 source pathway databases, ‘core pathways’, with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. Cross-validation determined 71% recovery rate of our predictions. Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86%, and provide novel annotations for 5732 pathway orphans. PathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways, and provides multiple query, analysis and output options. PMID:27899558

  20. Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene.

    Science.gov (United States)

    Narise, Takafumi; Sakurai, Nozomu; Obayashi, Takeshi; Ohta, Hiroyuki; Shibata, Daisuke

    2017-06-05

    Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., provide biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally analyzed via over-representation analysis (ORA). A limitation of this approach may be that users can predict gene functions only based on the strongly co-expressed genes. In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while considering the genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds to select co-expressed genes, and performed gene set enrichment analysis (GSEA) applied to a ranked list of genes ordered by the co-expression degree. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between the gene and pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which indicated that the p-score could improve the performance of the ORA. We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato. The database allows users to predict pathways that are relevant to a
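
The GSEA step described in this abstract operates on a gene list ranked by co-expression with the query gene. The sketch below shows the classic unweighted running-sum (Kolmogorov-Smirnov-like) enrichment score as an illustration of that idea only; it is not the database's p-score, which the abstract does not define in detail, and the function name and toy gene identifiers are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a ranked gene list.

    Walk down the ranking, stepping up at pathway members and down at
    non-members; return the running-sum value with the largest magnitude.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one member and one non-member in the ranking")

    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


if __name__ == "__main__":
    # Hypothetical ranking: genes ordered from most to least co-expressed
    # with a query gene, and a hypothetical two-gene pathway.
    ranking = ["geneA", "geneB", "geneC", "geneD"]
    print(enrichment_score(ranking, {"geneA", "geneB"}))
```

A positive score means pathway members concentrate near the top of the ranking; significance would normally be assessed by permutation, which is left out here.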

  1. Genes of the mitochondrial apoptotic pathway in Mytilus galloprovincialis.

    Directory of Open Access Journals (Sweden)

    Noelia Estévez-Calvar

    Full Text Available Bivalves play vital roles in marine, brackish, freshwater and terrestrial habitats. In recent years, these ecosystems have become affected through anthropogenic activities. The ecological success of marine bivalves is based on the ability to modify their physiological functions in response to environmental changes. One of the most important mechanisms involved in adaptive responses to environmental and biological stresses is apoptosis, which has been scarcely studied in mollusks, although the final consequence of this process, DNA fragmentation, has been frequently used for pollution monitoring. Environmental stressors induce apoptosis in molluscan cells via an intrinsic pathway. Many of the proteins involved in vertebrate apoptosis have been recognized in model invertebrates; however, this process might not be universally conserved. Mytilus galloprovincialis is presented here as a new model to study the linkage between molecular mechanisms that mediate apoptosis and marine bivalve ecological adaptations. Therefore, it is strictly necessary to identify the key elements involved in bivalve apoptosis. In the present study, six mitochondrial apoptotic-related genes were characterized, and their gene expression profiles following UV irradiation were evaluated. This is the first step for the development of potential biomarkers to assess the biological responses of marine organisms to stress. The results confirmed that apoptosis and, more specifically, the expression of the genes involved in this process can be used to assess the biological responses of marine organisms to stress.

  2. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Directory of Open Access Journals (Sweden)

    Bibby Kyle

    2011-03-01

    Full Text Available Abstract Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~ 45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  3. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Science.gov (United States)

    2011-01-01

    Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~ 45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:21401935

  4. Candidate pathways and genes for prostate cancer: a meta-analysis of gene expression data

    Directory of Open Access Journals (Sweden)

    Efstathiou Eleni

    2009-08-01

    Full Text Available Abstract Background The genetic mechanisms of prostate tumorigenesis remain poorly understood, but with the advent of gene expression array capabilities, we can now produce a large amount of data that can be used to explore the molecular and genetic mechanisms of prostate tumorigenesis. Methods We conducted a meta-analysis of gene expression data from 18 gene array datasets targeting transition from normal to localized prostate cancer and from localized to metastatic prostate cancer. We functionally annotated the top 500 differentially expressed genes and identified several candidate pathways associated with prostate tumorigenesis. Results We found the top differentially expressed genes to be clustered in pathways involving integrin-based cell adhesion: integrin signaling, the actin cytoskeleton, cell death, and cell motility pathways. We also found integrins themselves to be downregulated in the transition from normal prostate tissue to primary localized prostate cancer. Based on the results of this study, we developed a collagen hypothesis of prostate tumorigenesis. According to this hypothesis, the initiating event in prostate tumorigenesis is the age-related decrease in the expression of collagen genes and other genes encoding integrin ligands. This concomitant depletion of integrin ligands leads to the accumulation of ligandless integrin and activation of integrin-associated cell death. To escape integrin-associated death, cells suppress the expression of integrins, which in turn alters the actin cytoskeleton, elevates cell motility and proliferation, and disorganizes prostate histology, contributing to the histologic progression of prostate cancer and its increased metastasizing potential. Conclusion The results of this study suggest that prostate tumor progression is associated with the suppression of integrin-based cell adhesion. Suppression of integrin expression driven by integrin-mediated cell death leads to increased cell

  5. Exploring two plant hosts for expression of diterpenoid pathway genes

    DEFF Research Database (Denmark)

    Bach, Søren Spanner

    Plants produce more than 10,000 diterpenoid compounds of which the large majority is involved in specialized metabolism, while a few are involved in general metabolism. Specialized metabolism diterpenoids have functions in interactions of plants with other organisms and selected ones are utilized...... and characterization of diTPSs deriving from the plant kingdom, a plant expression host offers several advantages such as the presence of all relevant compartments (plastids and endoplasmic reticulum) and the universal C5 building blocks for isoprenoid biosynthesis. In addition, a plant based expression host...... is compatible with native codon usage, and through the conserved mechanisms of protein targeting and posttranslational modifications, has the capacity to produce functional enzymes. To further explore plant based expression and characterization of diterpenoid pathway genes, two different plant expression hosts...

  6. PTSD and gene variants: new pathways and new thinking.

    Science.gov (United States)

    Skelton, Kelly; Ressler, Kerry J; Norrholm, Seth D; Jovanovic, Tanja; Bradley-Davino, Bekh

    2012-02-01

    Posttraumatic Stress Disorder (PTSD) is an anxiety disorder which can develop as a result of exposure to a traumatic event and is associated with significant functional impairment. Family and twin studies have found that risk for PTSD is associated with an underlying genetic vulnerability and that more than 30% of the variance associated with PTSD is related to a heritable component. Using a fear conditioning model to conceptualize the neurobiology of PTSD, three primary neuronal systems have been investigated - the hypothalamic-pituitary-adrenal axis, the locus coeruleus-noradrenergic system, and neurocircuitry interconnecting the limbic system and frontal cortex. The majority of the initial investigations into main effects of candidate genes hypothesized to be associated with PTSD risk have been negative, but studies examining the interaction of genetic polymorphisms with specific environments in predicting PTSD have produced several positive results which have increased our understanding of the determinants of risk and resilience in the aftermath of trauma. Promising avenues of inquiry into the role of epigenetic modification have also been proposed to explain the enduring impact of environmental exposures which occur during key, often early, developmental periods on gene expression. Studies of PTSD endophenotypes, which are heritable biomarkers associated with a circumscribed trait within the more complex psychiatric disorder, may be more directly amenable to analysis of the underlying genetics and neural pathways and have provided promising targets for elucidating the neurobiology of PTSD. Knowledge of the genetic underpinnings and neuronal pathways involved in the etiology and maintenance of PTSD will allow for improved targeting of primary prevention amongst vulnerable individuals or populations, as well as timely, targeted treatment interventions. This article is part of a Special Issue entitled 'Post-Traumatic Stress Disorder'. Copyright © 2011 Elsevier

  7. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    Science.gov (United States)

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses at various tissues, respectively. We tested differential gene expression by empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also presented that 12.8% and 59.0% pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% pairwise studies had independent expression association for genes, but no studies were observed significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to the gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes at different tissues.
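
    The abstract above combines per-study evidence by meta-analysis without naming the methods used. As one illustration only, the sketch below pools independent p-values for a single gene with Fisher's combined probability test; the choice of Fisher's method and the function name `fisher_combined_p` are assumptions, not the authors' procedure.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    Returns the test statistic X = -2 * sum(ln p_i) and its p-value under a
    chi-square distribution with 2k degrees of freedom, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k) the chi-square survival
    # function has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series
```

    Calling `fisher_combined_p([0.04, 0.20, 0.01])` on hypothetical per-study p-values returns the pooled statistic and combined p-value for that gene. The method assumes the studies are independent, which may not hold when datasets share samples.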

  8. Differential hexosamine biosynthetic pathway gene expression with type 2 diabetes

    Directory of Open Access Journals (Sweden)

    Megan Coomer

    2014-01-01

    Full Text Available The hexosamine biosynthetic pathway (HBP) culminates in the attachment of O-linked β-N-acetylglucosamine (O-GlcNAc) onto serine/threonine residues of target proteins. The HBP is regulated by several modulators, i.e. O-linked β-N-acetylglucosaminyl transferase (OGT) and β-N-acetylglucosaminidase (OGA) catalyze the addition and removal of O-GlcNAc moieties, respectively; while flux is controlled by the rate-limiting enzyme glutamine:fructose-6-phosphate amidotransferase (GFPT), transcribed by two genes, GFPT1 and GFPT2. Since increased HBP flux is glucose-responsive and linked to insulin resistance/type 2 diabetes onset, we hypothesized that diabetic individuals exhibit differential expression of HBP regulatory genes. Volunteers (n = 60; n = 20 Mixed Ancestry, n = 40 Caucasian) were recruited from Stellenbosch and Paarl (Western Cape, South Africa) and classified as control, pre- or diabetic according to fasting plasma glucose and HbA1c levels, respectively. RNA was purified from leukocytes isolated from collected blood samples and OGT, OGA, GFPT1 and GFPT2 expressions determined by quantitative real-time PCR. The data reveal lower OGA expression in diabetic individuals (P < 0.01), while pre- and diabetic subjects displayed attenuated OGT expression vs. controls (P < 0.01 and P < 0.001, respectively). Moreover, GFPT2 expression decreased in pre- and diabetic Caucasians vs. controls (P < 0.05 and P < 0.01, respectively). We also found ethnic differences, i.e. Mixed Ancestry individuals exhibited a 2.4-fold increase in GFPT2 expression vs. Caucasians, despite diagnosis (P < 0.01). Gene expression of HBP regulators differs between diabetic and non-diabetic individuals, together with distinct ethnic-specific gene profiles. Thus differential HBP gene regulation may offer diagnostic utility and provide candidate susceptibility genes for different ethnic groupings.

  9. Primer on Ontologies.

    Science.gov (United States)

    Hastings, Janna

    2017-01-01

    As molecular biology has increasingly become a data-intensive discipline, ontologies have emerged as an essential computational tool to assist in the organisation, description and analysis of data. Ontologies describe and classify the entities of interest in a scientific domain in a computationally accessible fashion such that algorithms and tools can be developed around them. The technology that underlies ontologies has its roots in logic-based artificial intelligence, allowing for sophisticated automated inference and error detection. This chapter presents a general introduction to modern computational ontologies as they are used in biology.

  10. Kuhn's Ontological Relativism.

    Science.gov (United States)

    Sankey, Howard

    2000-01-01

    Discusses Kuhn's model of scientific theory change. Documents Kuhn's move away from conceptual relativism and rational relativism. Provides an analysis of his present ontological form of relativism. (CCM)

  11. Identifying differentially expressed genes and pathways in two types of non-small cell lung cancer: adenocarcinoma and squamous cell carcinoma.

    Science.gov (United States)

    Liu, J; Yang, X Y; Shi, W J

    2014-01-08

    Non-small cell lung carcinoma, NSCLC, accounts for 80-85% of lung cancers. NSCLC can be mainly divided into two types: adenocarcinoma (ADC) and squamous cell carcinoma (SCC). The purpose of our study was to identify and differentiate the pathogenesis of ADC and SCC at the molecular level. The gene expression profiles of ADC and SCC were downloaded from Gene Expression Omnibus under accession No. GSE10245. Accordingly, differentially expressed genes (DEGs) were identified by the limma package in R language. In addition, DEGs were functionally analyzed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment. A total of 4124 DEGs were identified, including CDK1, CDK2, CDK4, and SKP2. The DEGs were mainly involved in 16 pathways related to cell proliferation, cell signal transduction and metabolism. We conclude that the molecular mechanisms of ADC and SCC are considerably different, and that they are involved in immune response, cell signal transduction, metabolism, cell division, and cell proliferation. Therefore, the two diseases should be treated differently. This study offers new insight into the diagnosis and therapy of these two types of lung cancer.
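
    Gene Ontology and KEGG enrichment of a DEG list, as described above, is commonly scored with a one-sided hypergeometric test. The abstract does not state which test was used, so the sketch below is an assumption for illustration; the function name and all argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for term enrichment (illustrative sketch).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term or pathway
    selected:   number of differentially expressed genes
    overlap:    selected genes that carry the annotation

    Returns P(X >= overlap) when `selected` genes are drawn without replacement.
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie in [0, population]")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    With the 4124 DEGs reported above as `selected`, the remaining three arguments would come from the annotation database and the array's gene universe, neither of which is given in the abstract. In practice the resulting p-values are adjusted for multiple testing across all terms.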

  12. Gene-Gene Interactions in the Folate Metabolic Pathway and the Risk of Conotruncal Heart Defects

    Directory of Open Access Journals (Sweden)

    Philip J. Lupo

    2010-01-01

    Full Text Available Conotruncal and related heart defects (CTRD) are common, complex malformations. Although there are few established risk factors, there is evidence that genetic variation in the folate metabolic pathway influences CTRD risk. This study was undertaken to assess the association between inherited (i.e., case) and maternal gene-gene interactions in this pathway and the risk of CTRD. Case-parent triads (n=727), ascertained from the Children's Hospital of Philadelphia, were genotyped for ten functional variants of nine folate metabolic genes. Analyses of inherited genotypes were consistent with the previously reported association between MTHFR A1298C and CTRD (adjusted P=.02), but provided no evidence that CTRD was associated with inherited gene-gene interactions. Analyses of the maternal genotypes provided evidence of a MTHFR C677T/CBS 844ins68 interaction and CTRD risk (unadjusted P=.02). This association is consistent with the effects of this genotype combination on folate-homocysteine biochemistry but remains to be confirmed in independent study populations.

  13. Artificial intelligence techniques for colorectal cancer drug metabolism: ontology and complex network.

    Science.gov (United States)

    Martínez-Romero, Marcos; Vázquez-Naya, José M; Rabuñal, Juan R; Pita-Fernández, Salvador; Macenlle, Ramiro; Castro-Alvariño, Javier; López-Roses, Leopoldo; Ulla, José L; Martínez-Calvo, Antonio V; Vázquez, Santiago; Pereira, Javier; Porto-Pazos, Ana B; Dorado, Julián; Pazos, Alejandro; Munteanu, Cristian R

    2010-05-01

    Colorectal cancer is one of the most frequent types of cancer in the world and generates important social impact. The understanding of the specific metabolism of this disease and the transformations of the specific drugs will allow finding effective prevention, diagnosis and treatment of the colorectal cancer. All the terms that describe the drug metabolism contribute to the construction of ontology in order to help scientists to link the correlated information and to find the most useful data about this topic. The molecular components involved in this metabolism are included in complex network such as metabolic pathways in order to describe all the molecular interactions in the colorectal cancer. The graphical method of processing biological information such as graphs and complex networks leads to the numerical characterization of the colorectal cancer drug metabolic network by using invariant values named topological indices. Thus, this method can help scientists to study the most important elements in the metabolic pathways and the dynamics of the networks during mutations, denaturation or evolution for any type of disease. This review presents the last studies regarding ontology and complex networks of the colorectal cancer drug metabolism and a basic topology characterization of the drug metabolic process sub-ontology from the Gene Ontology.

  14. Transcriptomic analysis in the developing zebrafish embryo after compound exposure: Individual gene expression and pathway regulation

    Energy Technology Data Exchange (ETDEWEB)

    Hermsen, Sanne A.B., E-mail: Sanne.Hermsen@rivm.nl [Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), P.O. Box 1, 3720 BA Bilthoven (Netherlands); Department of Toxicogenomics, Maastricht University, P.O. Box 616, 6200 MD, Maastricht (Netherlands); Institute for Risk Assessment Sciences (IRAS), Utrecht University, P.O. Box 80.178, 3508 TD, Utrecht (Netherlands); Pronk, Tessa E. [Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), P.O. Box 1, 3720 BA Bilthoven (Netherlands); Department of Toxicogenomics, Maastricht University, P.O. Box 616, 6200 MD, Maastricht (Netherlands); Brandhof, Evert-Jan van den [Centre for Environmental Quality, National Institute for Public Health and the Environment (RIVM), P.O. Box 1, 3720 BA Bilthoven (Netherlands); Ven, Leo T.M. van der [Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), P.O. Box 1, 3720 BA Bilthoven (Netherlands); Piersma, Aldert H. [Centre for Health Protection, National Institute for Public Health and the Environment (RIVM), P.O. Box 1, 3720 BA Bilthoven (Netherlands); Institute for Risk Assessment Sciences (IRAS), Utrecht University, P.O. Box 80.178, 3508 TD, Utrecht (Netherlands)

    2013-10-01

    The zebrafish embryotoxicity test is a promising alternative assay for developmental toxicity. Classically, morphological assessment of the embryos is applied to evaluate the effects of compound exposure. However, by applying differential gene expression analysis the sensitivity and predictability of the test may be increased. For defining gene expression signatures of developmental toxicity, we explored the possibility of using gene expression signatures of compound exposures based on commonly expressed individual genes as well as based on regulated gene pathways. Four developmental toxic compounds were tested in concentration-response design, caffeine, carbamazepine, retinoic acid and valproic acid, and two non-embryotoxic compounds, D-mannitol and saccharin, were included. With transcriptomic analyses we were able to identify commonly expressed genes, which were mostly development related, after exposure to the embryotoxicants. We also identified gene pathways regulated by the embryotoxicants, suggestive of their modes of action. Furthermore, whereas pathways may be regulated by all compounds, individual gene expression within these pathways can differ for each compound. Overall, the present study suggests that the use of individual gene expression signatures as well as pathway regulation may be useful starting points for defining gene biomarkers for predicting embryotoxicity. - Highlights: • The zebrafish embryotoxicity test in combination with transcriptomics was used. • We explored two approaches of defining gene biomarkers for developmental toxicity. • Four compounds in concentration-response design were tested. • We identified commonly expressed individual genes as well as regulated gene pathways. • Both approaches seem suitable starting points for defining gene biomarkers.

  15. Genome-wide transcriptional analysis of apoptosis-related genes and pathways regulated by H2AX in lung cancer A549 cells.

    Science.gov (United States)

    Lu, Chengrong; Xiong, Min; Luo, Yuan; Li, Jing; Zhang, Yanjun; Dong, Yaqiong; Zhu, Yanjun; Niu, Tianhui; Wang, Zhe; Duan, Lianning

    2013-09-01

    Histone H2AX is a novel tumor suppressor protein and plays an important role in apoptosis of cancer cells. However, the role of H2AX in lung cancer cells is unclear. The detailed mechanism and epigenetic regulation by H2AX remain elusive in cancer cells. We showed that H2AX was involved in apoptosis of lung cancer A549 cells as in other tumor cells. Knockdown of H2AX strongly suppressed apoptosis of A549 cells. We clarified the molecular mechanisms of apoptosis regulated by H2AX based on genome-wide transcriptional analysis. Microarray data analysis demonstrated that H2AX knockdown in A549 cells affected expression of 3,461 genes, including upregulation of 1,435 and downregulation of 2,026. These differentially expressed genes were subjected to bioinformatic analysis for exploring biological processes regulated by H2AX in lung cancer cells. Gene ontology analysis showed that H2AX affected expression of many genes, through which, many important functions including response to stimuli, gene expression, and apoptosis were involved in apoptotic regulation of lung cancer cells. Pathway analysis identified the mitogen-activated protein kinase signaling pathway and apoptosis as the most important pathways targeted by H2AX. Signal transduction pathway networks analysis and chromatin immunoprecipitation assay showed that two core genes, NFKB1 and JUN, were involved in apoptosis regulated by H2AX in lung cancer cells. Taken together, these data provide compelling clues for further exploration of H2AX function in cancer cells.

  16. Identification of differentially expressed genes and pathways for intramuscular fat deposition in pectoralis major tissues of fast- and slow-growing chickens

    Directory of Open Access Journals (Sweden)

    Cui Huan-Xian

    2012-05-01

    Full Text Available Abstract Background Intramuscular fat (IMF) is one of the important factors influencing meat quality, however, for chickens, the molecular regulatory mechanisms underlying this trait have not yet been determined. In this study, a systematic identification of candidate genes and new pathways related to IMF deposition in chicken breast tissue has been made using gene expression profiles of two distinct breeds: Beijing-you (BJY), a slow-growing Chinese breed possessing high meat quality, and Arbor Acres (AA), a commercial fast-growing broiler line. Results Agilent cDNA microarray analyses were conducted to determine gene expression profiles of breast muscle sampled at different developmental stages of BJY and AA chickens. Relative to d 1 when there is no detectable IMF, breast muscle at d 21, d 42, d 90 and d 120 (only for BJY) contained 1310 differentially expressed genes (DEGs) in BJY and 1080 DEGs in AA. Of these, 34–70 DEGs related to lipid metabolism or muscle development processes were examined further in each breed based on Gene Ontology (GO) analysis. The expression of several DEGs was correlated, positively or negatively, with the changing patterns of lipid content or breast weight across the ages sampled, indicating that those genes may play key roles in these developmental processes. In addition, based on KEGG pathway analysis of DEGs in both BJY and AA chickens, it was found that in addition to pathways affecting lipid metabolism (pathways for MAPK & PPAR signaling), cell junction-related pathways (tight junction, ECM-receptor interaction, focal adhesion, regulation of actin cytoskeleton), which play a prominent role in maintaining the integrity of tissues, could contribute to the IMF deposition. Conclusion The results of this study identified potential candidate genes associated with chicken IMF deposition and imply that IMF deposition in chicken breast muscle is regulated and mediated not only by genes and pathways related to lipid

  17. Uniform designation for genes of the Calvin-Benson-Bassham reductive pentose phosphate pathway of bacteria

    NARCIS (Netherlands)

    Tabita, F. Robert; Gibson, Janet L.; Bowien, Botho; Dijkhuizen, Lubbert; Meijer, Wilhelmus

    1992-01-01

    Structural and regulatory genes encoding enzymes and proteins of the reductive pentose phosphate pathway have been isolated from a number of bacteria recently. In the phototroph Rhodobacter sphaeroides, and in two chemoautotrophic bacteria, Alcaligenes eutrophus and Xanthobacter flavus, these genes

  18. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

    Directory of Open Access Journals (Sweden)

    Chan Juancarlos

    2009-07-01

    Full Text Available Abstract Background Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results We employ the Textpresso category-based information retrieval and extraction system http://www.textpresso.org, developed by WormBase to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to that of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given
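
    The F-scores quoted above are consistent with the balanced F-measure, the harmonic mean of precision and recall. The abstract does not give the formula, so the sketch below assumes the standard F1 definition; the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Both arguments must use the same scale (fractions or percentages);
    the result is returned on that scale.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision/recall pairs (in percent) as reported in the abstract above.
reported = {
    "paper retrieval": (61.8, 79.1),
    "sentence retrieval": (80.1, 30.3),
    "GO annotation": (97.3, 66.2),
}

for task, (precision, recall) in reported.items():
    print(f"{task}: F = {f_score(precision, recall):.1f}%")
```

    Recomputing from the rounded percentages gives values that agree closely with the reported F-scores; a small difference for the paper-retrieval figure is plausible if the published value was derived from unrounded counts.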

  19. GeneAnalytics Pathway Analysis and Genetic Overlap among Autism Spectrum Disorder, Bipolar Disorder and Schizophrenia

    Directory of Open Access Journals (Sweden)

    Naveen S. Khanzada

    2017-02-01

    Full Text Available Bipolar disorder (BPD) and schizophrenia (SCH) show similar neuropsychiatric behavioral disturbances, including impaired social interaction and communication, seen in autism spectrum disorder (ASD) with multiple overlapping genetic and environmental influences implicated in risk and course of illness. GeneAnalytics software was used for pathway analysis and genetic profiling to characterize common susceptibility genes obtained from published lists for ASD (792 genes), BPD (290 genes) and SCH (560 genes). Rank scores were derived from the number and nature of overlapping genes, gene-disease association, tissue specificity and gene functions subdivided into categories (e.g., diseases, tissues or functional pathways). Twenty-three genes were common to all three disorders and mapped to nine biological Superpathways including Circadian entrainment (10 genes, score = 37.0), Amphetamine addiction (five genes, score = 24.2), and Sudden infant death syndrome (six genes, score = 24.1). Brain tissues included the medulla oblongata (11 genes, score = 2.1), thalamus (10 genes, score = 2.0) and hypothalamus (nine genes, score = 2.0) with six common genes (BDNF, DRD2, CHRNA7, HTR2A, SLC6A3, and TPH2). Overlapping genes impacted dopamine and serotonin homeostasis and signal transduction pathways, impacting mood, behavior and physical activity level. Converging effects on pathways governing circadian rhythms support a core etiological relationship between neuropsychiatric illnesses and sleep disruption with hypoxia and central brain stem dysfunction.
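
    Finding the genes common to several disorder-specific lists, as in the record above, reduces to a set intersection once gene symbols are normalized. The sketch below is a minimal illustration, not the GeneAnalytics procedure; the function name and the toy input lists are hypothetical and are not the published 792-, 290- and 560-gene lists.

```python
def shared_genes(*gene_lists):
    """Return the sorted gene symbols present in every input list.

    Symbols are stripped and upper-cased first so that formatting
    differences between source lists do not hide a real overlap.
    """
    if not gene_lists:
        return []
    normalized = [
        {symbol.strip().upper() for symbol in genes} for genes in gene_lists
    ]
    return sorted(set.intersection(*normalized))


# Toy lists for illustration only (hypothetical, heavily truncated).
asd_genes = ["BDNF", "DRD2", "SHANK3", "TPH2"]
bpd_genes = ["bdnf", "DRD2", "ANK3", "TPH2 "]
sch_genes = ["DRD2", "BDNF", "DISC1", "TPH2"]

common = shared_genes(asd_genes, bpd_genes, sch_genes)
print(common)
```

    The shared list produced this way would then be passed to a pathway or tissue-expression tool for scoring, a step this sketch does not attempt to reproduce.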

  20. The Ontology of Disaster.

    Science.gov (United States)

    Thompson, Neil

    1995-01-01

    Explores some key existential or ontological concepts to show their applicability to the complex area of disaster impact as it relates to health and social welfare practice. Draws on existentialist philosophy, particularly that of Jean-Paul Sartre, and introduces some key ontological concepts to show how they specifically apply to the experience…

  1. Constructive Ontology Engineering

    Science.gov (United States)

    Sousan, William L.

    2010-01-01

    The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…

  4. Students' Ontological Security and Agency in Science Education--An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-01-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from…

  5. Ayurveda research: Ontological challenges.

    Science.gov (United States)

    Nayak, Jayakrishna

    2012-01-01

    Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath. For instance the 'ASIIA (A Science Initiative In Ayurveda)' projects of Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge-the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  6. Ayurveda research: Ontological challenges

    Directory of Open Access Journals (Sweden)

    Jayakrishna Nayak

    2012-01-01

    Full Text Available Collaborative research involving Ayurveda and the current sciences is undoubtedly an imperative and is emerging as an exciting horizon, particularly in basic sciences. Some work in this direction is already going on and outcomes are awaited with bated breath. For instance the 'ASIIA (A Science Initiative In Ayurveda)' projects of Dept of Science and Technology, Govt of India, which include studies such as Ayurvedic Prakriti and Genetics. Further intense and sustained collaborative research needs to overcome a subtle and fundamental challenge-the ontologic divide between Ayurveda and all the current sciences. Ontology, fundamentally, means existence; elaborated, ontology is a particular perspective of an object of existence and the vocabulary developed to share that perspective. The same object of existence is susceptible to several ontologies. Ayurveda and modern biomedical as well as other sciences belong to different ontologies, and as such, collaborative research cannot be carried out at required levels until a mutually acceptable vocabulary is developed.

  7. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    Science.gov (United States)

    2010-01-01

    Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie. PMID:20459735

  8. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    LENUS (Irish Health Repository)

    Fitzpatrick, David A

    2010-05-10

    Abstract Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie.

  9. Analysis of gene evolution and metabolic pathways using the Candida Gene Order Browser

    Directory of Open Access Journals (Sweden)

    Byrne Kevin P

    2010-05-01

    Full Text Available Abstract Background Candida species are the most common cause of opportunistic fungal infection worldwide. Recent sequencing efforts have provided a wealth of Candida genomic data. We have developed the Candida Gene Order Browser (CGOB), an online tool that aids comparative syntenic analyses of Candida species. CGOB incorporates all available Candida clade genome sequences including two Candida albicans isolates (SC5314 and WO-1) and 8 closely related species (Candida dubliniensis, Candida tropicalis, Candida parapsilosis, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis, Candida guilliermondii and Candida lusitaniae). Saccharomyces cerevisiae is also included as a reference genome. Results CGOB assignments of homology were manually curated based on sequence similarity and synteny. In total CGOB includes 65617 genes arranged into 13625 homology columns. We have also generated improved Candida gene sets by merging/removing partial genes in each genome. Interrogation of CGOB revealed that the majority of tandemly duplicated genes are under strong purifying selection in all Candida species. We identified clusters of adjacent genes involved in the same metabolic pathways (such as catabolism of biotin, galactose and N-acetyl glucosamine) and we showed that some clusters are species or lineage-specific. We also identified one example of intron gain in C. albicans. Conclusions Our analysis provides an important resource that is now available for the Candida community. CGOB is available at http://cgob.ucd.ie.

  10. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Science.gov (United States)

    2011-01-01

    Background Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches. PMID:21914205

  11. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Directory of Open Access Journals (Sweden)

    Kirsten Toralf

    2011-09-01

    Full Text Available Abstract Background Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. Results We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.

  12. The neurological disease ontology.

    Science.gov (United States)

    Jensen, Mark; Cox, Alexander P; Chaudhry, Naveed; Ng, Marcus; Sule, Donat; Duncan, William; Ray, Patrick; Weinstock-Guttman, Bianca; Smith, Barry; Ruttenberg, Alan; Szigeti, Kinga; Diehl, Alexander D

    2013-12-06

    We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer's disease, multiple sclerosis, and stroke. ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms 'disease', 'diagnosis', 'disease course', and 'disorder'. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer's disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology along with a discussion list and an issue tracker. 
ND seeks to provide a formal foundation for the representation of clinical and research data

  13. Practical ontologies for information professionals

    CERN Document Server

    AUTHOR|(CDS)2071712

    2016-01-01

    Practical Ontologies for Information Professionals provides an introduction to ontologies and their development, an essential tool for fighting back against information overload. The development of robust and widely used ontologies is an increasingly important tool in the fight against information overload. The publishing and sharing of explicit explanations for a wide variety of conceptualizations, in a machine readable format, has the power to both improve information retrieval and identify new knowledge. This new book provides an accessible introduction to the following: * What is an ontology? Defining the concept and why it is increasingly important to the information professional * Ontologies and the semantic web * Existing ontologies, such as SKOS, OWL, FOAF, schema.org, and the DBpedia Ontology * Adopting and building ontologies, showing how to avoid repetition of work and how to build a simple ontology with Protege * Interrogating semantic web ontologies * The future of ontologies and the role of the ...

  14. Ontologies in biological data visualization.

    Science.gov (United States)

    Carpendale, Sheelagh; Chen, Min; Evanko, Daniel; Gehlenborg, Nils; Gorg, Carsten; Hunter, Larry; Rowland, Francis; Storey, Margaret-Anne; Strobelt, Hendrik

    2014-01-01

    In computer science, an ontology is essentially a graph-based knowledge representation in which each node corresponds to a concept and each edge specifies a relation between two concepts. Ontological development in biology can serve as a focus to discuss the challenges and possible research directions for ontologies in visualization. The principal challenges are the dynamic and evolving nature of ontologies, the ever-present issue of scale, the diversity and richness of the relationships in ontologies, and the need to better understand the relationship between ontologies and the data analysis tasks scientists wish to support. Research directions include visualizing ontologies; visualizing semantically or ontologically annotated texts, documents, and corpora; automated generation of visualizations using ontologies; and visualizing ontological context to support search. Although this discussion uses issues of ontologies in biological data visualization as a springboard, these topics are of general relevance to visualization.
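    The graph view described in this abstract (nodes as concepts, edges as relations) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Ontology` class, its method names, and the example term names are assumptions made for the example, not part of the cited work or of any real ontology release.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: a directed graph whose edges carry a relation label."""

    def __init__(self):
        # concept -> set of (relation, target concept) pairs
        self.edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        self.edges[source].add((relation, target))
        self.edges.setdefault(target, set())  # make sure the target is a node too

    def ancestors(self, concept, relation="is_a"):
        """Collect every concept reachable by following `relation` edges."""
        seen = set()
        stack = [concept]
        while stack:
            node = stack.pop()
            for rel, target in self.edges[node]:
                if rel == relation and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Illustrative term names only (hypothetical, not copied from a real ontology).
onto = Ontology()
onto.add_relation("sphingolipid biosynthetic process", "is_a", "lipid biosynthetic process")
onto.add_relation("lipid biosynthetic process", "is_a", "biosynthetic process")
onto.add_relation("sphingolipid biosynthetic process", "part_of", "membrane biogenesis")

parents = onto.ancestors("sphingolipid biosynthetic process")
```

    Storing the relation label on each edge is one way to keep the "diversity and richness of the relationships" explicit, so a traversal can follow one relation type (here `is_a`) while ignoring others (here `part_of`).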

  15. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms

    Directory of Open Access Journals (Sweden)

    Yang Xiang

    2015-01-01

    Full Text Available The integration of ontologies builds knowledge structures which bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, available works on ontology integration are largely heuristic without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extended the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between gene ontology and National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.

  16. Modular Ontology Techniques and their Applications in the Biomedical Domain.

    Science.gov (United States)

    Pathak, Jyotishman; Johnson, Thomas M; Chute, Christopher G

    2008-08-05

    In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing "modules" of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.

  17. Quality control for terms and definitions in ontologies and taxonomies

    Directory of Open Access Journals (Sweden)

    Rüegg Alexander

    2006-04-01

    Full Text Available Abstract Background Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. Results We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. Conclusion Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators in drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.
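    The abstract does not say how circularity is detected. As a rough illustration only, one simple reading of "circular" is that a term reappears verbatim inside its own definition. The sketch below implements that reading; the function name `is_circular` and the term/definition pairs are hypothetical, and this should not be taken as the authors' method.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the whole term reappears verbatim in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# Hypothetical term/definition pairs, invented for illustration.
examples = {
    "membrane transport": "Any process involved in membrane transport.",
    "cell adhesion": "The attachment of a cell to another cell or to a substrate.",
}

flagged = [term for term, text in examples.items() if is_circular(term, text)]
```

    A verbatim match is a deliberately crude criterion: it misses definitions that are circular only through a chain of other terms, which is one reason such automatic flags would still need review by a curator.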

  18. Identification of canine platelet proteins separated by differential detergent fractionation for nonelectrophoretic proteomics analyzed by Gene Ontology and pathways analysis

    Directory of Open Access Journals (Sweden)

    Trichler SA

    2014-01-01

    Full Text Available Shauna A Trichler,1,* Sandra C Bulla,1,* Nandita Mahajan,1 Kari V Lunsford,2 Ken Pendarvis,3 Bindu Nanduri,4,5 Fiona M McCarthy,3 Camilo Bulla1 1Department of Pathobiology and Population Medicine, 2Department of Clinical Sciences and Animal Health Center, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, 3Department of Veterinary Science and Microbiology, University of Arizona, Tucson, AZ, 4Department of Biological Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, 5Institute for Genomics, Biocomputing and Biotechnology, Starkville, MS, USA *These authors contributed equally to this work Abstract: During platelet development, proteins necessary for the many functional roles of the platelet are stored within cytoplasmic granules. Platelets have also been shown to take up and store many plasma proteins into granules. This makes the platelet a potential novel source of biomarkers for many disease states. Approaches to sample preparation for proteomic biomarker studies vary. Compared with traditional two-dimensional polyacrylamide gel electrophoresis systems, nonelectrophoretic proteomics methods that employ offline protein fractionation methods such as the differential detergent fractionation method have clear advantages. Here we report a proteomic survey of the canine platelet proteome using differential detergent fractionation coupled with mass spectrometry and functional modeling of the canine platelet proteins identified. A total of 5,974 unique proteins were identified from platelets, of which only 298 (5%) had previous experimental evidence of in vivo expression. The use of offline prefractionation of canine proteins by differential detergent fractionation resulted in greater proteome coverage as compared with previous reports. This initial study contributes to a broader understanding of canine platelet biology and aids functional research, identification of potential treatment targets and biomarkers, and sets a new standard for the resting platelet proteome. Keywords: proteome, differential detergent fractionation, dog, functional analysis, protein

  19. Identification of sugarcane genes involved in the purine synthesis pathway

    Directory of Open Access Journals (Sweden)

    Mario A. Jancso

    2001-12-01

    Full Text Available Nucleotide synthesis is of central importance to all cells. In most organisms, the purine nucleotides are synthesized de novo from non-nucleotide precursors such as amino acids, ammonia and carbon dioxide. An understanding of the enzymes involved in sugarcane purine synthesis opens the possibility of using these enzymes as targets for chemicals which may be effective in combating phytopathogens. Such an approach has already been applied to several parasites and types of cancer. The strategy described in this paper was applied to identify sugarcane clusters for each step of the de novo purine synthesis pathway. Representative sequences of this pathway were chosen from the National Center for Biotechnology Information (NCBI) database and used to search the translated sugarcane expressed sequence tag (SUCEST) database using the available basic local alignment search tool (BLAST) facility. Retrieved clusters were further tested for the statistical significance of the alignment by an implementation (PRSS3) of the Monte Carlo shuffling algorithm calibrated using known protein sequences of divergent taxa along the phylogenetic tree. The sequences were compared to each other and to the sugarcane clusters selected using BLAST analysis, with the resulting table of p-values indicating the degree of divergence of each enzyme within different taxa and in relation to the sugarcane clusters. The results obtained by this strategy allowed us to identify the sugarcane proteins participating in the purine synthesis pathway.

  20. Ontological foundations for evolutionary economics: A Darwinian social ontology

    NARCIS (Netherlands)

    J.W. Stoelhorst

    2008-01-01

    The purpose of this paper is to further the project of generalized Darwinism by developing a social ontology on the basis of a combined commitment to ontological continuity and ontological commonality. Three issues that are central to the development of a social ontology are addressed: (1) the speci

  1. Multi-species Ontologies of the Craniofacial Musculoskeletal System

    Science.gov (United States)

    Mejino, Jose L.V.; Detwiler, Landon T.; Cox, Timothy C.; Brinkley, James F.

    2017-01-01

    We created the Ontology of Craniofacial Development and Malformation (OCDM) [1] to provide a unifying framework for organizing and integrating craniofacial data ranging from genes to clinical phenotypes from multi-species. Within this framework we focused on spatio-structural representation of anatomical entities related to craniofacial development and malformation, such as craniosynostosis and midface hypoplasia. Animal models are used to support human studies and so we built multi-species ontologies that would allow for cross-species correlation of anatomical information. For this purpose we first developed and enhanced the craniofacial component of the human musculoskeletal system in the Foundational Model of Anatomy Ontology (FMA)[2], and then imported this component, which we call the Craniofacial Human Ontology (CHO), into the OCDM. The CHO was then used as a template to create the anatomy for the mouse, the Craniofacial Mouse Ontology (CMO) as well as for the zebrafish, the Craniofacial Zebrafish Ontology (CZO).

  2. Evolution of variation in presence and absence of genes in bacterial pathways

    Directory of Open Access Journals (Sweden)

    Francis Andrew R

    2012-04-01

    Full Text Available Abstract Background Bacterial genomes exhibit a remarkable degree of variation in the presence and absence of genes, which probably extends to the level of individual pathways. This variation may be a consequence of the significant evolutionary role played by horizontal gene transfer, but might also be explained by the loss of genes through mutation. A challenge is to understand why there would be variation in gene presence within pathways if they confer a benefit only when complete. Results Here, we develop a mathematical model to study how variation in pathway content is produced by horizontal transfer, gene loss and partial exposure of a population to a novel environment. Conclusions We discuss the possibility that variation in gene presence acts as cryptic genetic variation on which selection acts when the appropriate environment occurs. We find that a high level of variation in gene presence can be readily explained by decay of the pathway through mutation when there is no longer exposure to the selective environment, or when selection becomes too weak to maintain the genes. In the context of pathway variation the role of horizontal gene transfer is probably the initial introduction of a complete novel pathway rather than in building up the variation in a genome without the pathway.

  3. Constructing Adverse Outcome Pathways: a Demonstration of ...

    Science.gov (United States)

    Adverse outcome pathway (AOP) provides a conceptual framework to evaluate and integrate chemical toxicity and its effects across the levels of biological organization. As such, it is essential to develop a resource-efficient and effective approach to extend molecular initiating events (MIEs) of chemicals to their downstream phenotypes of a greater regulatory relevance. A number of ongoing public phenomics (high throughput phenotyping) efforts have been generating abundant phenotypic data annotated with ontology terms. These phenotypes can be analyzed semantically and linked to MIEs of interest, all in the context of a knowledge base integrated from a variety of ontologies for various species and knowledge domains. In such analyses, two phenotypic profiles (PPs; anchored by genes or diseases) each characterized by multiple ontology terms are compared for their semantic similarities within a common ontology graph, but across boundaries of species and knowledge domains. Taking advantage of publicly available ontologies and software tool kits, we have implemented an OS-Mapping (Ontology-based Semantics Mapping) approach as a Java application, and constructed a network of 19383 PPs as nodes with edges weighted by their pairwise semantic similarity scores. Individual PPs were assembled from public phenomics data. Out of a possible 1.87 × 10^8 pairwise connections among these nodes, about 71% of them have similarity scores between 0.2 and the maximum possible of 1.0.
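    The number of possible pairwise connections quoted in this abstract follows from the node count alone: a network of n nodes has n(n-1)/2 unordered pairs. A minimal check in Python, where the variable names are illustrative and not taken from the original work:

```python
from math import comb

n_profiles = 19383             # phenotypic profiles (PPs) reported in the abstract
n_pairs = comb(n_profiles, 2)  # unordered pairs of distinct nodes: n * (n - 1) / 2

print(n_pairs)           # 187840653
print(f"{n_pairs:.3e}")  # 1.878e+08
```

    The result, about 1.878 × 10^8, agrees with the abstract's figure of 1.87 × 10^8 if that figure was truncated rather than rounded.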

  4. The Pathway From Genes to Gene Therapy in Glaucoma: A Review of Possibilities for Using Genes as Glaucoma Drugs.

    Science.gov (United States)

    Borrás, Teresa

    2017-01-01

    Treatment of diseases with gene therapy is advancing rapidly. The use of gene therapy has expanded from the original concept of replacing the mutated gene causing the disease to the use of genes to control nonphysiological levels of expression or to modify pathways known to affect the disease. Genes offer numerous advantages over conventional drugs. They have longer duration of action and are more specific. Genes can be delivered to the target site by naked DNA, cells, nonviral, and viral vectors. The enormous progress of the past decade in molecular biology and delivery systems has provided ways for targeting genes to the intended cell/tissue and safe, long-term vectors. The eye is an ideal organ for gene therapy. It is easily accessible and it is an immune-privileged site. Currently, there are clinical trials for diseases affecting practically every tissue of the eye, including those to restore vision in patients with Leber congenital amaurosis. However, the number of eye trials compared with those for systemic diseases is quite low (1.8%). Nevertheless, judging by the vast amount of ongoing preclinical studies, it is expected that such number will increase considerably in the near future. One area of great need for eye gene therapy is glaucoma, where a long-term gene drug would eliminate daily applications and compliance issues. Here, we review the current state of gene therapy for glaucoma and the possibilities for treating the trabecular meshwork to lower intraocular pressure and the retinal ganglion cells to protect them from neurodegeneration.

  5. Integrated GWAS and Pathway profiling for feed efficiency traits in pigs leads to novel genes and their molecular pathways

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Ostersen, Tage; Strathe, Anders Bjerring

    2013-01-01

    . Residual feed intake is a feed efficiency measure and is highly economically important in animal production. In our study, a total of 596 Yorkshire boars had phenotypic and genotypic records. After quality control, 37,915 SNPs were available for GWAS which was implemented in the DMU software package...... is an important step where we firstly detect genes located near GWAS-detected SNPs and subsequently we detect enrichment of these genes in various biological processes and pathways. The objective of this study was to apply these steps to identify relevant pathways involved in residual feed intake (RFI) in pigs...

  6. [ ] Toward an Ontology of Finitude

    Directory of Open Access Journals (Sweden)

    Julia Hölzl

    2011-09-01

    Full Text Available Hölzl palpates an ontology of fracture. Unlike original ontologies that are concerned with essence rather than being, the ontology proposed here does not believe in its originality. This project is concerned with becoming as such rather than with its Wesen. With the indefinite striving for remaining in itself. This ontology is a fissure, fissuring itself.

  7. Perspectives on ontology learning

    CERN Document Server

    Lehmann, J

    2014-01-01

    Perspectives on Ontology Learning brings together researchers and practitioners from different communities (natural language processing, machine learning, and the semantic web) in order to give an interdisciplinary overview of recent advances in ontology learning. Starting with a comprehensive introduction to the theoretical foundations of ontology learning methods, the edited volume presents the state-of-the-art in automated knowledge acquisition and maintenance. It outlines future challenges in this area with a special focus on technologies suitable for pushing the boundaries beyond the c

  8. The sexual and ontology

    Directory of Open Access Journals (Sweden)

    Zupančič Alenka

    2014-01-01

    Full Text Available This paper explores some of the crucial ontological implications of the psychoanalytic theory of sexuality in its Freudo-Lacanian orientation. As irreducible to different sexual practices and contents, the concept of sexuality obtains conceptual weight that makes it particularly relevant for philosophical ontological thinking. Starting from the hypothesis that something about sexuality is constitutively unconscious - that is to say, existing only in the form of the unconscious - the paper points at the singular short-circuit of the epistemological and ontological level which is at work in psychoanalytic theory, and which cannot be neglected in philosophical examination of the relation between knowledge and being.

  9. Data mining for ontology development.

    Energy Technology Data Exchange (ETDEWEB)

    Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC); Bollinger, James (Savannah River National Laboratory, Aiken, SC); Ghosh, Vinita (Brookhaven National Laboratory, Upton, NY); Sorokine, Alexandre (Oak Ridge National Laboratory, Oak Ridge, TN); Ferrell, Regina (Oak Ridge National Laboratory, Oak Ridge, TN); Ward, Richard (Oak Ridge National Laboratory, Oak Ridge, TN); Schoenwald, David Alan

    2010-06-01

    A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team reports on their approaches, the tools they used, and results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.

  10. Towards automated biomedical ontology harmonization.

    Science.gov (United States)

    Uribe, Gustavo A; Lopez, Diego M; Blobel, Bernd

    2014-01-01

    The use of biomedical ontologies is increasing, especially in the context of health systems interoperability. Ontologies are key pieces to understand the semantics of information exchanged. However, given the diversity of biomedical ontologies, it is essential to develop tools that support harmonization processes amongst them. Several algorithms and tools have been proposed by computer scientists for partially supporting ontology harmonization. However, these tools face several problems, especially in the biomedical domain where ontologies are large and complex. In the harmonization process, matching is a basic task. This paper explains the different ontology harmonization processes, analyzes existing matching tools, and proposes a prototype of an ontology harmonization service. The results demonstrate that there are many open issues in the field of biomedical ontology harmonization, such as: overcoming structural discrepancies between ontologies; the lack of semantic algorithms to automate the process; the low matching efficiency of existing algorithms; and the use of domain and top level ontologies in the matching process.

  11. Differential gene expression by fiber-optic beadarray and pathway in adrenocorticotrophin-secreting pituitary adenomas

    Institute of Scientific and Technical Information of China (English)

    JIANG Zhi-quan; GUI Song-bo; ZHANG Ya-zhuo

    2010-01-01

    Background Adrenocorticotrophin (ACTH)-secreting pituitary adenomas account for approximately 7%-14% of all pituitary adenomas, but their pathogenesis is still enigmatic. This study aimed to explore mechanisms underlying the pathogenesis of ACTH-secreting pituitary adenomas. Methods We used fiber-optic beadarray to examine gene expression in three ACTH-secreting adenomas compared with three normal pituitaries. Four differentially expressed genes from the three ACTH-secreting adenomas and three normal pituitaries were chosen randomly for validation by reverse transcriptase-real time quantitative polymerase chain reaction (RT-qPCR). We then analyzed the differentially expressed gene profile with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. Results Fiber-optic beadarray analysis showed that the expression of 28 genes and 8 expressed sequence tags (ESTs) was significantly increased and the expression of 412 genes and 31 ESTs was significantly decreased. Bioinformatic and pathway analysis showed that the genes HIGD1B, EPS8, HPGD, DAPK2, and IGFBP3 and the transforming growth factor (TGF)-β signaling pathway and extracellular matrix (ECM)-receptor interaction pathway may play important roles in tumorigenesis and progression of ACTH-secreting pituitary adenomas. Conclusions Our data suggest that numerous aberrantly expressed genes and several pathways are involved in the pathogenesis of ACTH-secreting pituitary adenomas. Fiber-optic beadarray combined with pathway analysis of differential gene expression appears to be a valid method of investigating tumour pathogenesis.

  12. A Method for Evaluating and Standardizing Ontologies

    Science.gov (United States)

    Seyed, Ali Patrice

    2012-01-01

    The Open Biomedical Ontology (OBO) Foundry initiative is a collaborative effort for developing interoperable, science-based ontologies. The Basic Formal Ontology (BFO) serves as the upper ontology for the domain-level ontologies of OBO. BFO is an upper ontology of types as conceived by defenders of realism. Among the ontologies developed for OBO…

  13. A Method for Evaluating and Standardizing Ontologies

    Science.gov (United States)

    Seyed, Ali Patrice

    2012-01-01

    The Open Biomedical Ontology (OBO) Foundry initiative is a collaborative effort for developing interoperable, science-based ontologies. The Basic Formal Ontology (BFO) serves as the upper ontology for the domain-level ontologies of OBO. BFO is an upper ontology of types as conceived by defenders of realism. Among the ontologies developed for OBO…

  14. Students' Ontological Security and Agency in Science Education—An Example from Reasoning about the Use of Gene Technology

    Science.gov (United States)

    Lindahl, Mats Gunnar; Linder, Cedric

    2013-09-01

    This paper reports on a study of how students' reasoning about socioscientific issues is framed by three dynamics: societal structures, agency and how trust and security issues are handled. Examples from gene technology were used as the forum for interviews with 13 Swedish high-school students (year 11, age 17-18). A grid based on modalities from the societal structures described by Giddens was used to structure the analysis. The results illustrate how the participating students used both modalities for 'Legitimation' and 'Domination' to justify positions that accept or reject new technology. The analysis also showed how norms and knowledge can be used to justify opposing positions in relation to building trust in science and technology, or in democratic decisions expected to favour personal norms. Here, students accepted or rejected the authority of experts based on perceptions of the knowledge base that the authority was seen to be anchored in. Difficulty in discerning between material risks (reduced safety) and immaterial risks (loss of norms) was also found. These outcomes are used to draw attention to the educational challenges associated with students' using knowledge claims (Domination) to support norms (Legitimation) and how this is related to the development of a sense of agency in terms of sharing norms with experts or with laymen.

  15. The design ontology

    DEFF Research Database (Denmark)

    Storga, Mario; Andreasen, Mogens Myrup; Marjanovic, Dorian

    2010-01-01

    The article presents the research of the nature, building and practical role of a Design Ontology as a potential framework for the more efficient product development (PD) data-, information- and knowledge- description, -explanation, -understanding and -reusing. In the methodology for development...... of the ontology two steps could be identified: empirical research and computer implementation. Empirical research has included domain documentation analysis (Genetic Design Model System developed by Mortensen 1999), identification of the key concepts and relations between them, and categorisation of the concepts...... and relations into taxonomies. As an epistemological foundation for the concepts formalisation, The Suggested Upper Merged Ontology (SUMO) proposed by IEEE, was reused. As the result of the previously described process, the ontology content has been categorised into six main subcategories divided between...

  16. Ontologies for Bioinformatics

    Directory of Open Access Journals (Sweden)

    Agnieszka Leszczynski

    2008-01-01

    The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing) between divergent storage and semantic protocols is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.

  17. Mechanisms in biomedical ontology

    National Research Council Canada - National Science Library

    Röhl, Johannes

    2012-01-01

    .... Taking some hints from an "ontology of devices" I suggest as a general approach for this task the introduction of functional kinds and functional parts by which the particular relations between a mechanism and its components can be captured.

  18. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Directory of Open Access Journals (Sweden)

    José Cuenca

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  19. Genetically based location from triploid populations and gene ontology of a 3.3-mb genome region linked to Alternaria brown spot resistance in citrus reveal clusters of resistance genes.

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids.

  20. Genetically Based Location from Triploid Populations and Gene Ontology of a 3.3-Mb Genome Region Linked to Alternaria Brown Spot Resistance in Citrus Reveal Clusters of Resistance Genes

    Science.gov (United States)

    Cuenca, José; Aleza, Pablo; Vicent, Antonio; Brunel, Dominique; Ollitrault, Patrick; Navarro, Luis

    2013-01-01

    Genetic analysis of phenotypical traits and marker-trait association in polyploid species is generally considered as a challenge. In the present work, different approaches were combined taking advantage of the particular genetic structures of 2n gametes resulting from second division restitution (SDR) to map a genome region linked to Alternaria brown spot (ABS) resistance in triploid citrus progeny. ABS in citrus is a serious disease caused by the tangerine pathotype of the fungus Alternaria alternata. This pathogen produces ACT-toxin, which induces necrotic lesions on fruit and young leaves, defoliation and fruit drop in susceptible genotypes. It is a strong concern for triploid breeding programs aiming to produce seedless mandarin cultivars. The monolocus dominant inheritance of susceptibility, proposed on the basis of diploid population studies, was corroborated in triploid progeny. Bulk segregant analysis coupled with genome scan using a large set of genetically mapped SNP markers and targeted genetic mapping by half tetrad analysis, using SSR and SNP markers, allowed locating a 3.3 Mb genomic region linked to ABS resistance near the centromere of chromosome III. Clusters of resistance genes were identified by gene ontology analysis of this genomic region. Some of these genes are good candidates to control the dominant susceptibility to the ACT-toxin. SSR and SNP markers were developed for efficient early marker-assisted selection of ABS resistant hybrids. PMID:24116149

  1. Manufacturing ontology through templates

    Directory of Open Access Journals (Sweden)

    Diciuc Vlad

    2017-01-01

    The manufacturing industry holds a large volume of high-value know-how, much of it held by key persons in the company. The passing on of this know-how is the basis of manufacturing ontology. Among other methods, such as advanced filtering and algorithm-based decision making, one way of handling the manufacturing ontology is via templates. The current paper tackles this approach and highlights its advantages, concluding with some recommendations.

  2. Ontology alignment with OLA

    OpenAIRE

    Euzenat, Jérôme; Loup, David; Touzani, Mohamed; Valtchev, Petko

    2004-01-01

    Using ontologies is the standard way to achieve interoperability of heterogeneous systems within the Semantic web. However, as the ontologies underlying two systems are not necessarily compatible, they may in turn need to be aligned. Similarity-based approaches to alignment seem to be both powerful and flexible enough to match the expressive power of languages like OWL. We present an alignment tool that follows the similarity-based paradigm, called OLA. ...

  3. Ontology Usage at ZFIN

    CERN Document Server

    Howe, Doug

    2010-01-01

    The Zebrafish Model Organism Database (ZFIN) provides a Web resource of zebrafish genomic, genetic, developmental, and phenotypic data. Four different ontologies are currently used to annotate data to the most specific term available, facilitating better comparison of data across species. In addition, ontologies are used to help users find and cluster data more quickly without needing to know the exact technical name for a term.

  4. Characterization of genes and pathways that respond to heat stress in Holstein calves through transcriptome analysis.

    Science.gov (United States)

    Srikanth, Krishnamoorthy; Kwon, Anam; Lee, Eunjin; Chung, Hoyoung

    2017-01-01

    This study aimed to investigate the genes and pathways that respond to heat stress in Holstein bull calves exposed to severe ranges of temperature and humidity. A total of ten animals from 4 to 6 months of age were subjected to heat stress at 37 °C and 90 % humidity for 12 h. Skin and rectal temperatures were measured before and after heat stress; while no correlation was found between them before heat stress, a moderate correlation was detected after heat stress, confirming rectal temperature to be a better barometer for monitoring heat stress. RNAseq analysis identified 8567 genes to be differentially regulated, out of which 465 genes were significantly upregulated (≥2-fold, P heat stress. Significant terms and pathways enriched in response to heat stress included chaperones, cochaperones, cellular response to heat stress, phosphorylation, kinase activation, immune response, apoptosis, Toll-like receptor signaling pathway, Pi3K/AKT activation, protein processing in endoplasmic reticulum, interferon signaling, pathways in cancer, estrogen signaling pathway, and MAPK signaling pathway. The differentially expressed genes were validated by quantitative real-time PCR analysis, which confirmed the tendency of the expression. The genes and pathways identified in this analysis extend our understanding of transcriptional response to heat stress and their likely functioning in adapting the animal to hyperthermic stress. The identified genes could be used as candidate genes for association studies to select and breed animals with improved heat tolerance.

  5. Association and gene-gene interactions study of reelin signaling pathway related genes with autism in the Han Chinese population.

    Science.gov (United States)

    Shen, Yidong; Xun, Guanglei; Guo, Hui; He, Yiqun; Ou, Jianjun; Dong, Huixi; Xia, Kun; Zhao, Jingping

    2016-04-01

    Autism is a neurodevelopmental disorder with unclear etiology. Reelin had been proposed to participate in the etiology of autism due to its important role in brain development. The goal of this study was to explore the association and gene-gene interactions of reelin signaling pathway related genes (RELN, VLDLR, LRP8, DAB1, FYN, and CDK5) with autism in Han Chinese population. Genotyping data of the six genes were obtained from a recent genome-wide association study performed in 430 autistic children who fulfilled the DSM-IV-TR criteria for autistic disorder, and 1,074 healthy controls. Single marker case-control association analysis and haplotype case-control association analysis were conducted after the data was screened. Multifactor dimensionality reduction (MDR) was applied to further test gene-gene interactions. Neither the single marker nor the haplotype association tests found any significant difference between the autistic group and the control group after permutation test of 1,000 rounds. The 4-locus MDR model (comprising rs6143734, rs1858782, rs634500, and rs1924267 which belong to RELN and DAB1) was determined to be the model with the highest cross-validation consistency (CVC) and testing balanced accuracy. The results indicate that an interaction between RELN and DAB1 may increase the risk of autism in the Han Chinese population. Furthermore, it can also be inferred that the involvement of RELN in the etiology of autism would occur through interaction with DAB1.

  6. Applications of ontology design patterns in biomedical ontologies.

    Science.gov (United States)

    Mortensen, Jonathan M; Horridge, Matthew; Musen, Mark A; Noy, Natalya F

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited.

  7. Applications of Ontology Design Patterns in Biomedical Ontologies

    Science.gov (United States)

    Mortensen, Jonathan M.; Horridge, Matthew; Musen, Mark A.; Noy, Natalya F.

    2012-01-01

    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited. PMID:23304337

  8. Integrated GWAS and Pathway profiling for feed efficiency traits in pigs leads to novel genes and their molecular pathways

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Ostersen, Tage; Strathe, Anders Bjerring

    is an important step where we firstly detect genes located near GWAS-detected SNPs and subsequently we detect enrichment of these genes in various biological processes and pathways. The objective of this study was to apply these steps to identify relevant pathways involved in residual feed intake (RFI) in pigs....... Residual feed intake is a feed efficiency measure and is highly economically important in animal production. In our study, a total of 596 Yorkshire boars had phenotypic and genotypic records. After quality control, 37,915 SNPs were available for GWAS which was implemented in the DMU software package...... pathway are known to be involved in biological processes closely related to regulation of feed intake or residual feed intake. These results provide insights into the genetic architecture as well as the systems biological mechanisms of this complex trait in pigs....

  9. Integrated GWAS and Pathway profiling for feed efficiency traits in pigs leads to novel genes and their molecular pathways

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Ostersen, Tage; Strathe, Anders Bjerring

    2013-01-01

    is an important step where we firstly detect genes located near GWAS-detected SNPs and subsequently we detect enrichment of these genes in various biological processes and pathways. The objective of this study was to apply these steps to identify relevant pathways involved in residual feed intake (RFI) in pigs....... Residual feed intake is a feed efficiency measure and is highly economically important in animal production. In our study, a total of 596 Yorkshire boars had phenotypic and genotypic records. After quality control, 37,915 SNPs were available for GWAS which was implemented in the DMU software package...... pathway are known to be involved in biological processes closely related to regulation of feed intake or residual feed intake. These results provide insights into the genetic architecture as well as the systems biological mechanisms of this complex trait in pigs....

  10. Identification of gene networks and pathways associated with Guillain-Barre syndrome.

    Directory of Open Access Journals (Sweden)

    Kuo-Hsuan Chang

    BACKGROUND: The underlying change of gene network expression of Guillain-Barré syndrome (GBS) remains elusive. We sought to identify GBS-associated gene networks and signaling pathways by analyzing the transcriptional profile of leukocytes in patients with GBS. METHODS AND FINDINGS: Quantitative global gene expression microarray analysis of peripheral blood leukocytes was performed on 7 patients with GBS and 7 healthy controls. Gene expression profiles were compared between patients and controls after standardization. The set of genes that significantly correlated with GBS was further analyzed by Ingenuity Pathways Analyses. 256 genes and 18 gene networks were significantly associated with GBS (fold change ≥2, P<0.05). FOS, PTGS2, HMGB2 and MMP9 are the top four of 246 significantly up-regulated genes. The most significant disease and altered biological function genes associated with GBS were those involved in inflammatory response, infectious disease, and respiratory disease. Cell death, cellular development and cellular movement were the top significant molecular and cellular functions involved in GBS. Hematological system development and function, immune cell trafficking and organismal survival were the most significant GBS-associated functions in the physiological development and system category. Several hub genes, such as MMP9, PTGS2 and CREB1, were identified in the associated gene networks. Canonical pathway analysis showed that GnRH, corticotrophin-releasing hormone and ERK/MAPK signaling were the most significant pathways in the up-regulated gene set in GBS. CONCLUSIONS: This study reveals the gene networks and canonical pathways associated with GBS. These data not only provide networks between the genes for understanding the pathogenic properties of GBS but also map significant pathways for the future development of novel therapeutic strategies.
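
    The selection rule quoted in this abstract (fold change ≥2, P<0.05) can be illustrated with a minimal sketch. The function name, the record layout (gene, patient/control expression ratio, P-value) and the gene labels below are hypothetical and are not taken from the study; the sketch assumes that "fold change ≥2" applies symmetrically to up- and down-regulation.

```python
def select_differential_genes(records, min_fold=2.0, max_p=0.05):
    """Split genes into up- and down-regulated sets.

    records: iterable of (gene, ratio, p_value), where ratio is the
    patient/control expression ratio (an assumed layout, must be > 0).
    """
    up, down = [], []
    for gene, ratio, p_value in records:
        if ratio <= 0:
            raise ValueError(f"expression ratio must be positive: {gene}")
        if p_value >= max_p:
            continue  # not statistically significant
        if ratio >= min_fold:
            up.append(gene)          # at least min_fold higher in patients
        elif ratio <= 1.0 / min_fold:
            down.append(gene)        # at least min_fold lower in patients
    return {"up": up, "down": down}


# Hypothetical placeholder records, not data from the study.
example = [
    ("GENE_A", 4.0, 0.01),   # up-regulated and significant
    ("GENE_B", 0.25, 0.03),  # down-regulated and significant
    ("GENE_C", 1.5, 0.001),  # significant but below the fold threshold
    ("GENE_D", 3.0, 0.2),    # large change but not significant
]
result = select_differential_genes(example)
```

    Both conditions must hold at once, which is why GENE_C and GENE_D are dropped in this example.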

  11. ENRICHMENT OF OBO ONTOLOGIES

    Science.gov (United States)

    Bada, Michael; Hunter, Lawrence

    2006-01-01

    This paper describes a frame-based integration of the three GO subontologies, the Chemicals of Biological Interest ontology (ChEBI), and the Cell Type Ontology (CTO) in which relationships between elements of the ontologies are modeled in a way that better captures the relational semantics between biological concepts represented by the terms, rather than between the terms themselves, than previous frame-based efforts. We also describe a methodology for creating suggested enriching assertions of the form (subject, relationship, object) by identifying patterns in GO terms, mapping these patterns and subpatterns to relationships, matching concepts to these patterns and subpatterns, and integrating these assertions into the ontologies. Using this methodology, a large number of reliable assertions linking previously unlinked OBO terms using a wide variety of specific, hierarchically arranged relationships were created: A predicted assertion was made for 62% of GO terms that matched one of 31 patterns, and 97% of these predicted assertions were assessed to be valid; a further 429 assertions (corresponding to 6% of the matching terms) were manually created, resulting in an initial set of 4,497 assertions. Furthermore, this methodology programmatically integrates assertions into a base ontology such that each assertion is fully consistent with respect to higher (i.e., more general) relevant class and slot levels. Such an integration is absent from previous compositional efforts, and we argue its necessity for the creation of coherent biological ontologies when linking previously unlinked terms. PMID:17011833
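
    The pattern-to-assertion idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two regular-expression patterns, the relationship names (`results_in_synthesis_of`, `binds`) and the function name are hypothetical, and the object is kept as a plain string rather than resolved to a ChEBI or Cell Type Ontology term as the paper's methodology would require.

```python
import re

# Hypothetical term-name patterns mapped to hypothetical relationship names.
PATTERNS = [
    (re.compile(r"^(?P<obj>.+) biosynthetic process$"), "results_in_synthesis_of"),
    (re.compile(r"^(?P<obj>.+) binding$"), "binds"),
]


def suggest_assertions(term_names):
    """Return (subject, relationship, object) triples for matching term names."""
    assertions = []
    for name in term_names:
        for regex, relationship in PATTERNS:
            match = regex.match(name)
            if match:
                assertions.append((name, relationship, match.group("obj")))
                break  # use the first matching pattern only
    return assertions


triples = suggest_assertions(
    ["heme biosynthetic process", "ATP binding", "cell cycle"]
)
```

    Terms that match no pattern, such as "cell cycle" here, produce no suggestion; in the abstract's terms these would be left for manual curation.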

  12. Analysis of vascular endothelial dysfunction genes and related pathways in obesity through systematic bioinformatics.

    Science.gov (United States)

    Zhang, Hui; Wang, Jing; Sun, Ling; Xu, Qiuqin; Hou, Miao; Ding, Yueyue; Huang, Jie; Chen, Ye; Cao, Lei; Zhang, Jianmin; Qian, Weiguo; Lv, Haitao

    2015-01-01

    Obesity has become an increasingly serious health problem and popular research topic. It is associated with many diseases, especially cardiovascular disease (CVD)-related endothelial dysfunction. This study analyzed genes related to endothelial dysfunction and obesity and then summarized their most significant signaling pathways. Genes related to vascular endothelial dysfunction and obesity were extracted from the PubMed database and analyzed with STRING, DAVID, and Gene-Go Meta-Core software. In PubMed, 142 genes associated with obesity were found to play a role in endothelial dysfunction. A significant pathway (Angiotensin system maturation in protein folding and maturation) associated with obesity and endothelial dysfunction was explored. The genes and the pathway explored may play an important role in obesity. Further studies on preventing vascular endothelial dysfunction in obesity should be conducted by targeting these loci and pathways.

  13. User centered and ontology based information retrieval system for life sciences

    Directory of Open Access Journals (Sweden)

    Sy Mohameth-François

    2012-01-01

    Background: Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of the biomedical publication indexing and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. Results: This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess document adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus, by facilitating query concept weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. Conclusions: The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system enlightens

  14. The gene-gene interaction of INSIG-SCAP-SREBP pathway on the risk of obesity in Chinese children.

    Science.gov (United States)

    Liu, Fang-Hong; Song, Jie-Yun; Shang, Xiao-Rui; Meng, Xiang-Rui; Ma, Jun; Wang, Hai-Jun

    2014-01-01

    Childhood obesity has become a global public health problem in recent years. This study aimed to explore the association of genetic variants in INSIG-SCAP-SREBP pathway with obesity in Chinese children. A case-control study was conducted, including 705 obese cases and 1,325 nonobese controls. We genotyped 15 single nucleotide polymorphisms (SNPs) of five genes in INSIG-SCAP-SREBP pathway, including insulin induced gene 1 (INSIG1), insulin induced gene 2 (INSIG2), SREBP cleavage-activating protein gene (SCAP), sterol regulatory element binding protein gene 1 (SREBP1), and sterol regulatory element binding protein gene 2 (SREBP2). We used generalized multifactor dimensionality reduction (GMDR) and logistic regression to investigate gene-gene interactions. Single polymorphism analyses showed that SCAP rs12487736 and rs12490383 were nominally associated with obesity. We identified a 3-locus interaction on obesity in GMDR analyses (P = 0.001), involving 3 genetic variants of INSIG2, SCAP, and SREBP2. The individuals in high-risk group of the 3-locus combinations had a 79.9% increased risk of obesity compared with those in low-risk group (OR = 1.799, 95% CI: 1.475-2.193, P = 6.61 × 10^-9). We identified interaction of three genes in INSIG-SCAP-SREBP pathway on risk of obesity, revealing that these genes affect obesity more likely through a complex interaction pattern than single gene effect.
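
    The reported effect size can be read arithmetically: an odds ratio (OR) of 1.799 corresponds to (1.799 - 1) × 100 = 79.9% higher odds. The sketch below shows that conversion, plus the textbook odds ratio with a Wald 95% confidence interval from a 2x2 table. The study itself reports using logistic regression, so the 2x2 calculation is a swapped-in simplification, and the cell counts used are hypothetical placeholders rather than the study's data.

```python
import math


def percent_increase(odds_ratio):
    """Convert an odds ratio into a percentage increase in odds."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: high-risk cases,    b: low-risk cases,
    c: high-risk controls, d: low-risk controls (all counts must be > 0).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


reported_increase = percent_increase(1.799)  # OR value quoted in the abstract

# Hypothetical counts, for illustration only.
example_or, example_lower, example_upper = odds_ratio_wald_ci(20, 10, 10, 20)
```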

  15. Recursive random forest algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways

    Science.gov (United States)

    Zhang, Kui; Busov, Victor; Wei, Hairong

    2017-01-01

    Background: Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. Results: A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, the BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively, with a portion (e.g. 1/10) of the least important TFs being excluded in each round of modeling, during which the importance values of all TFs to the pathway gene were updated and ranked, until only one TF remained in the list. This procedure is termed BWERF. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine the TF retention for the regulatory layer immediately above the pathway layer. The acquired TFs at the secondary layer were then set to be the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected layers was obtained. Conclusions: BWERF improved the accuracy for constructing ML-hGRNs because it used backward elimination to exclude the noise genes, and aggregated the individual importance values for determining the TF retention. We validated the BWERF by using it for constructing ML-hGRNs operating above mouse pluripotency maintenance pathway and Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, the BWERF can construct ML-hGRNs with significantly reduced edges that enable biologists to choose the implicit edges for experimental
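
    The backward elimination loop described in this abstract can be sketched for a single pathway gene as follows. This is not the authors' implementation: the function names are hypothetical, and the importance scorer is a stand-in (absolute Pearson correlation) used only to keep the example dependency-free, where BWERF would use random forest importance values. The Gaussian mixture step and the layer-by-layer construction are not shown.

```python
import math


def abs_pearson_importance(features, target):
    """Stand-in importance scorer (NOT random forest): |Pearson r| per feature."""
    def corr(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sxx = sum((a - mean_x) ** 2 for a in x)
        syy = sum((b - mean_y) ** 2 for b in y)
        if sxx == 0 or syy == 0:
            return 0.0
        return sxy / math.sqrt(sxx * syy)
    return {name: abs(corr(values, target)) for name, values in features.items()}


def backward_elimination(features, target, importance_fn, drop_fraction=0.1):
    """Rank features by repeatedly dropping the least important ones.

    features: dict mapping TF name -> expression values;
    target: expression values of one pathway gene.
    Returns TF names ordered from most to least important.
    """
    remaining = dict(features)
    eliminated = []  # least important first
    while len(remaining) > 1:
        # Importance values are recomputed on the surviving TFs each round.
        scores = importance_fn(remaining, target)
        ranked = sorted(remaining, key=lambda name: scores[name])
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - 1)  # always keep one survivor
        for name in ranked[:n_drop]:
            eliminated.append(name)
            del remaining[name]
    eliminated.extend(remaining)  # the final survivor
    return eliminated[::-1]


# Hypothetical toy expression values, not data from the study.
pathway_gene = [1, 2, 3, 4, 5]
tfs = {
    "TF_A": [2, 4, 6, 8, 10],
    "TF_B": [1, 3, 2, 5, 4],
    "TF_C": [3, 1, 4, 1, 5],
}
ranking = backward_elimination(tfs, pathway_gene, abs_pearson_importance)
```

    With a univariate scorer such as this one, recomputing scores each round does not change them; the recomputation matters for model-based scorers like random forests, where removing noisy TFs alters the importance of those that remain.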

  16. Diverse chromatin remodeling genes antagonize the Rb-involved SynMuv pathways in C. elegans.

    Directory of Open Access Journals (Sweden)

    Mingxue Cui

    2006-05-01

    In Caenorhabditis elegans, vulval cell-fate specification involves the activities of multiple signal transduction and regulatory pathways that include a receptor tyrosine kinase/Ras/mitogen-activated protein kinase pathway and synthetic multivulva (SynMuv) pathways. Many genes in the SynMuv pathways encode transcription factors, including the homologs of mammalian Rb, E2F, and components of the nucleosome-remodeling deacetylase complex. To further elucidate the functions of the SynMuv genes, we performed a genome-wide RNA interference (RNAi) screen to search for genes that antagonize the SynMuv gene activities. Among those that displayed a varying degree of suppression of the SynMuv phenotype, 32 genes are potentially involved in chromatin remodeling (called SynMuv suppressor genes herein). Genetic mutations of two representative genes (zfp-1 and mes-4) were used to further characterize their positive roles in vulval induction and relationships with Ras function. Our analysis revealed antagonistic roles of the SynMuv suppressor genes and the SynMuv B genes in germline-soma distinction, RNAi, somatic transgene silencing, and tissue-specific expression of pgl-1 and the lag-2/Delta genes. The opposite roles of these SynMuv B and SynMuv suppressor genes in transcriptional regulation were confirmed in somatic transgene silencing. We also report the identification of ten new genes in the RNAi pathway and six new genes in germline silencing. Among the ten new RNAi genes, three encode homologs of proteins involved in both protein degradation and chromatin remodeling. Our findings suggest that multiple chromatin remodeling complexes are involved in regulating the expression of specific genes that play critical roles in developmental decisions.

  17. GoPipe: Streamlined Gene Ontology Annotation for Batch Anonymous Sequences With Statistics

    Institute of Scientific and Technical Information of China (English)

    陈作舟; 薛成海; 朱晟; 周丰丰; XUEFENG BRUCE LING; 刘国平; 陈良标

    2005-01-01

    With the arrival of the post-genomic era, batch sequencing, especially of ESTs, has become routine work in ordinary laboratories. These new sequences often need batch Gene Ontology (GO) annotation followed by statistical analysis. Apart from Goblet, however, there is currently no software suited to batch GO annotation of unknown sequences, and Goblet itself has shortcomings: it is an online tool that restricts uploaded sequence files to less than 100 kilobytes and relies on BLAST alone as its prediction tool. GoPipe is a standalone package that integrates BLAST and InterProScan results to annotate sequences and provides tools for further statistical comparison. The main program takes any number of BLAST and/or InterProScan output files and performs, in sequence, parsing, data integration, redundancy removal, calculation of GO distributions and graphic display. Statistical tools are also provided to compare the GO distributions of different inputs in order to uncover biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test data set its precision reached 99.1%, considerably higher than that of InterProScan's own GO predictions, with only a slight drop in sensitivity. Its high precision, speed and flexibility make it a well-suited tool for batch Gene Ontology annotation of unknown sequences. The package is freely available at http://gopipe.fishgenome.org/ or from the authors on request.
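    The intersection mode mentioned in this abstract, in which only GO terms supported by both BLAST and InterProScan are kept, can be illustrated with a small sketch. It is not GoPipe's code or file format: the function names, the dictionary layout and the toy annotations are assumptions, and the "union" option is included only as a contrast.

```python
from collections import Counter


def merge_annotations(blast, interpro, mode="intersection"):
    """Combine per-sequence GO term sets from two sources.

    blast, interpro: dict mapping sequence ID -> set of GO term IDs
    (hypothetical layout; real tool output would need parsing first).
    mode: "intersection" keeps terms found by both sources,
          "union" keeps terms found by either source.
    """
    if mode not in ("intersection", "union"):
        raise ValueError(f"unknown mode: {mode}")
    merged = {}
    for seq_id in set(blast) | set(interpro):
        a = blast.get(seq_id, set())
        b = interpro.get(seq_id, set())
        terms = a & b if mode == "intersection" else a | b
        if terms:  # skip sequences left without any annotation
            merged[seq_id] = terms
    return merged


def go_distribution(annotations):
    """Count how many sequences carry each GO term."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(terms)
    return counts


# Toy annotations (assumption): GO IDs are used here only as labels.
blast_hits = {
    "seq1": {"GO:0005524", "GO:0004672"},
    "seq2": {"GO:0006468"},
}
interpro_hits = {
    "seq1": {"GO:0005524"},
    "seq2": {"GO:0004672"},
    "seq3": {"GO:0005524"},
}

strict = merge_annotations(blast_hits, interpro_hits, mode="intersection")
loose = merge_annotations(blast_hits, interpro_hits, mode="union")
strict_counts = go_distribution(strict)
loose_counts = go_distribution(loose)
print(strict)
print(loose_counts)
```

    Requiring agreement between the two sources discards annotations that only one tool supports, which is consistent with the trade-off the abstract reports: higher precision at the cost of a slight drop in sensitivity.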

  18. NF-Y activates genes of metabolic pathways altered in cancer cells.

    Science.gov (United States)

    Benatti, Paolo; Chiaramonte, Maria Luisa; Lorenzo, Mariangela; Hartley, John A; Hochhauser, Daniel; Gnesutta, Nerina; Mantovani, Roberto; Imbriano, Carol; Dolfini, Diletta

    2016-01-12

    The trimeric transcription factor NF-Y binds to the CCAAT box, an element enriched in promoters of genes overexpresse