WorldWideScience

Sample records for gene network inference

  1. Inferring gene networks from discrete expression data

    KAUST Repository

    Zhang, L.

    2013-07-18

    The modeling of gene networks from transcriptional expression data is an important tool in biomedical research to reveal signaling pathways and to identify treatment targets. Current gene network modeling is primarily based on the use of Gaussian graphical models applied to continuous data, which give a closedformmarginal likelihood. In this paper,we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which generate counts of mRNAtranscripts in cell samples.We propose a generalized linear model to fit the discrete gene expression data and assume that the log ratios of the mean expression levels follow a Gaussian distribution.We restrict the gene network structures to decomposable graphs and derive the graphs by selecting the covariance matrix of the Gaussian distribution with the hyper-inverse Wishart priors. Furthermore, we incorporate prior network models based on gene ontology information, which avails existing biological information on the genes of interest. We conduct simulation studies to examine the performance of our discrete graphical model and apply the method to two real datasets for gene network inference. © The Author 2013. Published by Oxford University Press. All rights reserved.

  2. Inferring Phylogenetic Networks from Gene Order Data

    Directory of Open Access Journals (Sweden)

    Alexey Anatolievich Morozov

    2013-01-01

    Full Text Available Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary, sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm. Binary encoding can also be useful, but only when the methods mentioned above cannot be used.

  3. Inferring gene networks from discrete expression data

    KAUST Repository

    Zhang, L.; Mallick, B. K.

    2013-01-01

    graphical models applied to continuous data, which give a closedformmarginal likelihood. In this paper,we extend network modeling to discrete data, specifically data from serial analysis of gene expression, and RNA-sequencing experiments, both of which

  4. Fused Regression for Multi-source Gene Regulatory Network Inference.

    Directory of Open Access Journals (Sweden)

    Kari Y Lam

    2016-12-01

    Full Text Available Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method's utility in learning from data collected on different experimental platforms.

  5. Inferring time-varying network topologies from gene expression data.

    Science.gov (United States)

    Rao, Arvind; Hero, Alfred O; States, David J; Engel, James Douglas

    2007-01-01

    Most current methods for gene regulatory network identification lead to the inference of steady-state networks, that is, networks prevalent over all times, a hypothesis which has been challenged. There has been a need to infer and represent networks in a dynamic, that is, time-varying fashion, in order to account for different cellular states affecting the interactions amongst genes. In this work, we present an approach, regime-SSM, to understand gene regulatory networks within such a dynamic setting. The approach uses a clustering method based on these underlying dynamics, followed by system identification using a state-space model for each learnt cluster--to infer a network adjacency matrix. We finally indicate our results on the mouse embryonic kidney dataset as well as the T-cell activation-based expression dataset and demonstrate conformity with reported experimental evidence.

  6. An algebra-based method for inferring gene regulatory networks.

    Science.gov (United States)

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the

  7. Inferring the conservative causal core of gene regulatory networks

    Directory of Open Access Journals (Sweden)

    Emmert-Streib Frank

    2010-09-01

    Full Text Available Abstract Background Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. Results In this paper, we introduce a novel gene regulatory network inference (GRNI algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. Conclusions For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.

  8. Inferring the conservative causal core of gene regulatory networks.

    Science.gov (United States)

    Altay, Gökmen; Emmert-Streib, Frank

    2010-09-28

    Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.

  9. Inferring the gene network underlying the branching of tomato inflorescence.

    Directory of Open Access Journals (Sweden)

    Laura Astola

    Full Text Available The architecture of tomato inflorescence strongly affects flower production and subsequent crop yield. To understand the genetic activities involved, insight into the underlying network of genes that initiate and control the sympodial growth in the tomato is essential. In this paper, we show how the structure of this network can be derived from available data of the expressions of the involved genes. Our approach starts from employing biological expert knowledge to select the most probable gene candidates behind branching behavior. To find how these genes interact, we develop a stepwise procedure for computational inference of the network structure. Our data consists of expression levels from primary shoot meristems, measured at different developmental stages on three different genotypes of tomato. With the network inferred by our algorithm, we can explain the dynamics corresponding to all three genotypes simultaneously, despite their apparent dissimilarities. We also correctly predict the chronological order of expression peaks for the main hubs in the network. Based on the inferred network, using optimal experimental design criteria, we are able to suggest an informative set of experiments for further investigation of the mechanisms underlying branching behavior.

  10. Comparison of evolutionary algorithms in gene regulatory network model inference.

    LENUS (Irish Health Repository)

    2010-01-01

    ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.

  11. Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Alina Sîrbu

    2015-05-01

    Full Text Available Microarray technologies have been the basis of numerous important findings regarding gene expression in the few last decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions. Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.

  12. Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks.

    Science.gov (United States)

    Sîrbu, Alina; Crane, Martin; Ruskin, Heather J

    2015-05-14

    Microarray technologies have been the basis of numerous important findings regarding gene expression in the few last decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, as well as indicating which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.

  13. Data identification for improving gene network inference using computational algebra.

    Science.gov (United States)

    Dimitrova, Elena; Stigler, Brandilyn

    2014-11-01

    Identification of models of gene regulatory networks is sensitive to the amount of data used as input. Considering the substantial costs in conducting experiments, it is of value to have an estimate of the amount of data required to infer the network structure. To minimize wasted resources, it is also beneficial to know which data are necessary to identify the network. Knowledge of the data and knowledge of the terms in polynomial models are often required a priori in model identification. In applications, it is unlikely that the structure of a polynomial model will be known, which may force data sets to be unnecessarily large in order to identify a model. Furthermore, none of the known results provides any strategy for constructing data sets to uniquely identify a model. We provide a specialization of an existing criterion for deciding when a set of data points identifies a minimal polynomial model when its monomial terms have been specified. Then, we relax the requirement of the knowledge of the monomials and present results for model identification given only the data. Finally, we present a method for constructing data sets that identify minimal polynomial models.

  14. Inferring Gene Regulatory Networks Using Conditional Regulation Pattern to Guide Candidate Genes.

    Directory of Open Access Journals (Sweden)

    Fei Xiao

    Full Text Available Combining path consistency (PC algorithms with conditional mutual information (CMI are widely used in reconstruction of gene regulatory networks. CMI has many advantages over Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate the direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computation complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference, to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.

  15. Inferring Drosophila gap gene regulatory network: Pattern analysis of simulated gene expression profiles and stability analysis

    NARCIS (Netherlands)

    Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.

    2009-01-01

    Background: Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori

  16. Inferring Drosophila gap gene regulatory network: Pattern analysis of simulated gene expression profiles and stability analysis

    OpenAIRE

    Fomekong-Nanfack, Y.; Postma, M.; Kaandorp, J.A.

    2009-01-01

    Abstract Background Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible to obtain alternative circuits without making any a priori assumptions about the interactions, which all simulate the observed patterns. It is important to analyze the properties of the circuits. Findings We have analyzed the simulated gene expression ...

  17. Influence of the experimental design of gene expression studies on the inference of gene regulatory networks: environmental factors

    Directory of Open Access Journals (Sweden)

    Frank Emmert-Streib

    2013-02-01

    Full Text Available The inference of gene regulatory networks gained within recent years a considerable interest in the biology and biomedical community. The purpose of this paper is to investigate the influence that environmental conditions can exhibit on the inference performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I observational gene expression data: normal environmental condition, (II interventional gene expression data: growth in rich media, (III interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.

  18. Learning a Markov Logic network for supervised gene regulatory network inference.

    Science.gov (United States)

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, e.g. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a

  19. Gene regulatory network inference by point-based Gaussian approximation filters incorporating the prior information.

    Science.gov (United States)

    Jia, Bin; Wang, Xiaodong

    2013-12-17

    : The extended Kalman filter (EKF) has been applied to inferring gene regulatory networks. However, it is well known that the EKF becomes less accurate when the system exhibits high nonlinearity. In addition, certain prior information about the gene regulatory network exists in practice, and no systematic approach has been developed to incorporate such prior information into the Kalman-type filter for inferring the structure of the gene regulatory network. In this paper, an inference framework based on point-based Gaussian approximation filters that can exploit the prior information is developed to solve the gene regulatory network inference problem. Different point-based Gaussian approximation filters, including the unscented Kalman filter (UKF), the third-degree cubature Kalman filter (CKF3), and the fifth-degree cubature Kalman filter (CKF5) are employed. Several types of network prior information, including the existing network structure information, sparsity assumption, and the range constraint of parameters, are considered, and the corresponding filters incorporating the prior information are developed. Experiments on a synthetic network of eight genes and the yeast protein synthesis network of five genes are carried out to demonstrate the performance of the proposed framework. The results show that the proposed methods provide more accurate inference results than existing methods, such as the EKF and the traditional UKF.

  20. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    Science.gov (United States)

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli , and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  1. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation.

    Directory of Open Access Journals (Sweden)

    Xiaobo Guo

    Full Text Available Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs. It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC curve and the precision-recall (PR curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference.

  2. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    Science.gov (United States)

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high

  3. Evaluation of artificial time series microarray data for dynamic gene regulatory network inference.

    Science.gov (United States)

    Xenitidis, P; Seimenis, I; Kakolyris, S; Adamopoulos, A

    2017-08-07

    High-throughput technology like microarrays is widely used in the inference of gene regulatory networks (GRNs). We focused on time series data since we are interested in the dynamics of GRNs and the identification of dynamic networks. We evaluated the amount of information that exists in artificial time series microarray data and the ability of an inference process to produce accurate models based on them. We used dynamic artificial gene regulatory networks in order to create artificial microarray data. Key features that characterize microarray data such as the time separation of directly triggered genes, the percentage of directly triggered genes and the triggering function type were altered in order to reveal the limits that are imposed by the nature of microarray data on the inference process. We examined the effect of various factors on the inference performance such as the network size, the presence of noise in microarray data, and the network sparseness. We used a system theory approach and examined the relationship between the pole placement of the inferred system and the inference performance. We examined the relationship between the inference performance in the time domain and the true system parameter identification. Simulation results indicated that time separation and the percentage of directly triggered genes are crucial factors. Also, network sparseness, the triggering function type and noise in input data affect the inference performance. When two factors were simultaneously varied, it was found that variation of one parameter significantly affects the dynamic response of the other. Crucial factors were also examined using a real GRN and acquired results confirmed simulation findings with artificial data. Different initial conditions were also used as an alternative triggering approach. Relevant results confirmed that the number of datasets constitutes the most significant parameter with regard to the inference performance. Copyright © 2017 Elsevier

  4. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

    Science.gov (United States)

    Schaffter, Thomas; Marbach, Daniel; Floreano, Dario

    2011-08-15

    Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods available to the community as an open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.

  5. Inference of Cancer-specific Gene Regulatory Networks Using Soft Computing Rules

    Directory of Open Access Journals (Sweden)

    Xiaosheng Wang

    2010-03-01

    Full Text Available Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  6. Inference of cancer-specific gene regulatory networks using soft computing rules.

    Science.gov (United States)

    Wang, Xiaosheng; Gotoh, Osamu

    2010-03-24

    Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.

  7. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Xiaodong Cai

    Full Text Available Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the desiderata of efficient and powerful algorithms still remains. In this paper, sparse structural equation models (SEMs are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL, for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL based scheme, and the QTL-directed dependency graph (QDG method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining may just indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.

  8. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    Directory of Open Access Journals (Sweden)

    Joeri Ruyssinck

    Full Text Available One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made

  9. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    Energy Technology Data Exchange (ETDEWEB)

    Santra, Tapesh, E-mail: tapesh.santra@ucd.ie [Systems Biology Ireland, University College Dublin, Dublin (Ireland)

    2014-05-20

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.

  10. A Bayesian Framework That Integrates Heterogeneous Data for Inferring Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Santra, Tapesh

    2014-01-01

    Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein–protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.

  11. Directed partial correlation: inferring large-scale gene regulatory network through induced topology disruptions.

    Directory of Open Access Journals (Sweden)

    Yinyin Yuan

    Full Text Available Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of the samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a genes role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/.

  12. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

    Science.gov (United States)

    Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

    2013-09-01

    GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.

  13. Integration of steady-state and temporal gene expression data for the inference of gene regulatory networks.

    Science.gov (United States)

    Wang, Yi Kan; Hurley, Daniel G; Schnell, Santiago; Print, Cristin G; Crampin, Edmund J

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.

  14. Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

    Science.gov (United States)

    Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

    2017-01-01

    The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS- Bcl-2 and RAS-AKT, and found significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.

  15. Large-scale modeling of condition-specific gene regulatory networks by information integration and inference.

    Science.gov (United States)

    Ellwanger, Daniel Christian; Leonhardt, Jörn Florian; Mewes, Hans-Werner

    2014-12-01

    Understanding how regulatory networks globally coordinate the response of a cell to changing conditions, such as perturbations by shifting environments, is an elementary challenge in systems biology which has yet to be met. Genome-wide gene expression measurements are high dimensional as these are reflecting the condition-specific interplay of thousands of cellular components. The integration of prior biological knowledge into the modeling process of systems-wide gene regulation enables the large-scale interpretation of gene expression signals in the context of known regulatory relations. We developed COGERE (http://mips.helmholtz-muenchen.de/cogere), a method for the inference of condition-specific gene regulatory networks in human and mouse. We integrated existing knowledge of regulatory interactions from multiple sources to a comprehensive model of prior information. COGERE infers condition-specific regulation by evaluating the mutual dependency between regulator (transcription factor or miRNA) and target gene expression using prior information. This dependency is scored by the non-parametric, nonlinear correlation coefficient η(2) (eta squared) that is derived by a two-way analysis of variance. We show that COGERE significantly outperforms alternative methods in predicting condition-specific gene regulatory networks on simulated data sets. Furthermore, by inferring the cancer-specific gene regulatory network from the NCI-60 expression study, we demonstrate the utility of COGERE to promote hypothesis-driven clinical research. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Structural influence of gene networks on their inference: analysis of C3NET

    Directory of Open Access Journals (Sweden)

    Emmert-Streib Frank

    2011-06-01

    Full Text Available Abstract Background The availability of large-scale high-throughput data possesses considerable challenges toward their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide an user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods. Reviewers This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.

  17. Inference of gene regulatory networks from time series by Tsallis entropy

    Directory of Open Access Journals (Sweden)

    de Oliveira Evaldo A

    2011-05-01

    Full Text Available Abstract Background The inference of gene regulatory networks (GRNs from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information, a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5

  18. A comparative study of covariance selection models for the inference of gene regulatory networks.

    Science.gov (United States)

    Stifanelli, Patrizia F; Creanza, Teresa M; Anglani, Roberto; Liuzzi, Vania C; Mukherjee, Sayan; Schena, Francesco P; Ancona, Nicola

    2013-10-01

    The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems raising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method performs correlation between regression residuals and (c) 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods having the most predictive partial correlation estimates and the highest values of sensitivity to infer conditional dependencies between genes even when a few number of observations was available. The application of this method for inferring gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana allowed to enlighten a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided an evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. Software implementing the methods in the form of Matlab scripts are available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip. Copyright © 2013 The Authors. Published by

  19. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    Science.gov (United States)

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.

  20. State of the Art of Fuzzy Methods for Gene Regulatory Networks Inference

    Directory of Open Access Journals (Sweden)

    Tuqyah Abdullah Al Qazlan

    2015-01-01

    Full Text Available To address one of the most challenging issues at the cellular level, this paper surveys the fuzzy methods used in gene regulatory networks (GRNs inference. GRNs represent causal relationships between genes that have a direct influence, trough protein production, on the life and the development of living organisms and provide a useful contribution to the understanding of the cellular functions as well as the mechanisms of diseases. Fuzzy systems are based on handling imprecise knowledge, such as biological information. They provide viable computational tools for inferring GRNs from gene expression data, thus contributing to the discovery of gene interactions responsible for specific diseases and/or ad hoc correcting therapies. Increasing computational power and high throughput technologies have provided powerful means to manage these challenging digital ecosystems at different levels from cell to society globally. The main aim of this paper is to report, present, and discuss the main contributions of this multidisciplinary field in a coherent and structured framework.

  1. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    Science.gov (United States)

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Predication of gene regularity network (GRN) from expression data is a challenging task. There are many methods that have been developed to address this challenge ranging from supervised to unsupervised methods. Most promising methods are based on support vector machine (SVM). There is a need for comprehensive analysis on prediction accuracy of supervised method SVM using different kernels on different biological experimental conditions and network size. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated datasets of microarray of different sizes in detail. The results obtained from CompareSVM showed that accuracy of inference method depends upon the nature of experimental condition and size of the network. For network with nodes (SVM Gaussian kernel outperform on knockout, knockdown, and multifactorial datasets compared to all the other inference methods. For network with large number of nodes (~500), choice of inference method depend upon nature of experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  2. Inferring transcriptional gene regulation network of starch metabolism in Arabidopsis thaliana leaves using graphical Gaussian model

    Directory of Open Access Journals (Sweden)

    Ingkasuwan Papapit

    2012-08-01

    Full Text Available Abstract Background Starch serves as a temporal storage of carbohydrates in plant leaves during day/night cycles. To study transcriptional regulatory modules of this dynamic metabolic process, we conducted gene regulation network analysis based on small-sample inference of graphical Gaussian model (GGM. Results Time-series significant analysis was applied for Arabidopsis leaf transcriptome data to obtain a set of genes that are highly regulated under a diurnal cycle. A total of 1,480 diurnally regulated genes included 21 starch metabolic enzymes, 6 clock-associated genes, and 106 transcription factors (TF. A starch-clock-TF gene regulation network comprising 117 nodes and 266 edges was constructed by GGM from these 133 significant genes that are potentially related to the diurnal control of starch metabolism. From this network, we found that β-amylase 3 (b-amy3: At4g17090, which participates in starch degradation in chloroplast, is the most frequently connected gene (a hub gene. The robustness of gene-to-gene regulatory network was further analyzed by TF binding site prediction and by evaluating global co-expression of TFs and target starch metabolic enzymes. As a result, two TFs, indeterminate domain 5 (AtIDD5: At2g02070 and constans-like (COL: At2g21320, were identified as positive regulators of starch synthase 4 (SS4: At4g18240. The inference model of AtIDD5-dependent positive regulation of SS4 gene expression was experimentally supported by decreased SS4 mRNA accumulation in Atidd5 mutant plants during the light period of both short and long day conditions. COL was also shown to positively control SS4 mRNA accumulation. Furthermore, the knockout of AtIDD5 and COL led to deformation of chloroplast and its contained starch granules. This deformity also affected the number of starch granules per chloroplast, which increased significantly in both knockout mutant lines. Conclusions In this study, we utilized a systematic approach of microarray

  3. Predictive minimum description length principle approach to inferring gene regulatory networks.

    Science.gov (United States)

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to existing MDL algorithm.

  4. Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data.

    Science.gov (United States)

    Gong, Wuming; Koyano-Nakagawa, Naoko; Li, Tongbin; Garry, Daniel J

    2015-03-07

    Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP

  5. A novel gene network inference algorithm using predictive minimum description length approach.

    Science.gov (United States)

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the

  6. Artificial neural network inference (ANNI: a study on gene-gene interaction for biomarkers in childhood sarcomas.

    Directory of Open Access Journals (Sweden)

    Dong Ling Tong

    Full Text Available OBJECTIVE: To model the potential interaction between previously identified biomarkers in children sarcomas using artificial neural network inference (ANNI. METHOD: To concisely demonstrate the biological interactions between correlated genes in an interaction network map, only 2 types of sarcomas in the children small round blue cell tumors (SRBCTs dataset are discussed in this paper. A backpropagation neural network was used to model the potential interaction between genes. The prediction weights and signal directions were used to model the strengths of the interaction signals and the direction of the interaction link between genes. The ANN model was validated using Monte Carlo cross-validation to minimize the risk of over-fitting and to optimize generalization ability of the model. RESULTS: Strong connection links on certain genes (TNNT1 and FNDC5 in rhabdomyosarcoma (RMS; FCGRT and OLFM1 in Ewing's sarcoma (EWS suggested their potency as central hubs in the interconnection of genes with different functionalities. The results showed that the RMS patients in this dataset are likely to be congenital and at low risk of cardiomyopathy development. The EWS patients are likely to be complicated by EWS-FLI fusion and deficiency in various signaling pathways, including Wnt, Fas/Rho and intracellular oxygen. CONCLUSIONS: The ANN network inference approach and the examination of identified genes in the published literature within the context of the disease highlights the substantial influence of certain genes in sarcomas.

  7. A novel mutual information-based Boolean network inference method from time-series gene expression data.

    Directory of Open Access Journals (Sweden)

    Shohag Barman

    Full Text Available Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately.In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods, REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods.Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.

  8. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

    Directory of Open Access Journals (Sweden)

    Xiangyun Xiao

    Full Text Available The reconstruction of gene regulatory networks (GRNs from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM, experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.

  9. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

    Science.gov (United States)

    Xiao, Xiangyun; Zhang, Wei; Zou, Xiufen

    2015-01-01

    The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.

  10. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information.

    Science.gov (United States)

    Lachmann, Alexander; Giorgi, Federico M; Lopez, Gonzalo; Califano, Andrea

    2016-07-15

    The accurate reconstruction of gene regulatory networks from large scale molecular profile datasets represents one of the grand challenges of Systems Biology. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) represents one of the most effective tools to accomplish this goal. However, the initial Fixed Bandwidth (FB) implementation is both inefficient and unable to deal with sample sets providing largely uneven coverage of the probability density space. Here, we present a completely new implementation of the algorithm, based on an Adaptive Partitioning strategy (AP) for estimating the Mutual Information. The new AP implementation (ARACNe-AP) achieves a dramatic improvement in computational performance (200× on average) over the previous methodology, while preserving the Mutual Information estimator and the Network inference accuracy of the original algorithm. Given that the previous version of ARACNe is extremely demanding, the new version of the algorithm will allow even researchers with modest computational resources to build complex regulatory networks from hundreds of gene expression profiles. A JAVA cross-platform command line executable of ARACNe, together with all source code and a detailed usage guide are freely available on Sourceforge (http://sourceforge.net/projects/aracne-ap). JAVA version 8 or higher is required. califano@c2b2.columbia.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  11. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    Science.gov (United States)

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory network, most of them, when applied in the sample-based data, can not reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, while former researches either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely gene dependency network. By this way, we have constructed gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network scale free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specially related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for

  12. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    Science.gov (United States)

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10 8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks and will illuminate our understanding of how genes are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  13. A canonical correlation analysis-based dynamic bayesian network prior to infer gene regulatory networks from multiple types of biological data.

    Science.gov (United States)

    Baur, Brittany; Bozdag, Serdar

    2015-04-01

    One of the challenging and important computational problems in systems biology is to infer gene regulatory networks (GRNs) of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer GRNs. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables, and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions in high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer directionality of regulatory interactions. Additional improvements in GRN inference can be gleaned from incorporating the result in an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.

  14. An empirical comparison of popular structure learning algorithms with a view to gene network inference

    Czech Academy of Sciences Publication Activity Database

    Djordjilović, V.; Chiogna, M.; Vomlel, Jiří

    2017-01-01

    Roč. 88, č. 1 (2017), s. 602-613 ISSN 0888-613X R&D Projects: GA ČR(CZ) GA16-12010S Institutional support: RVO:67985556 Keywords : Bayesian networks * Structure learning * Reverse engineering * Gene networks Subject RIV: JD - Computer Applications, Robotics OBOR OECD: Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8) Impact factor: 2.845, year: 2016 http://library.utia.cas.cz/separaty/2017/MTR/vomlel-0477168.pdf

  15. Gene network inference and biochemical assessment delineates GPCR pathways and CREB targets in small intestinal neuroendocrine neoplasia.

    Directory of Open Access Journals (Sweden)

    Ignat Drozdov

    Full Text Available Small intestinal (SI neuroendocrine tumors (NET are increasing in incidence, however little is known about their biology. High throughput techniques such as inference of gene regulatory networks from microarray experiments can objectively define signaling machinery in this disease. Genome-wide co-expression analysis was used to infer gene relevance network in SI-NETs. The network was confirmed to be non-random, scale-free, and highly modular. Functional analysis of gene co-expression modules revealed processes including 'Nervous system development', 'Immune response', and 'Cell-cycle'. Importantly, gene network topology and differential expression analysis identified over-expression of the GPCR signaling regulators, the cAMP synthetase, ADCY2, and the protein kinase A, PRKAR1A. Seven CREB response element (CRE transcripts associated with proliferation and secretion: BEX1, BICD1, CHGB, CPE, GABRB3, SCG2 and SCG3 as well as ADCY2 and PRKAR1A were measured in an independent SI dataset (n = 10 NETs; n = 8 normal preparations. All were up-regulated (p<0.035 with the exception of SCG3 which was not differently expressed. Forskolin (a direct cAMP activator, 10(-5 M significantly stimulated transcription of pCREB and 3/7 CREB targets, isoproterenol (a selective ß-adrenergic receptor agonist and cAMP activator, 10(-5 M stimulated pCREB and 4/7 targets while BIM-53061 (a dopamine D(2 and Serotonin [5-HT(2] receptor agonist, 10(-6 M stimulated 100% of targets as well as pCREB; CRE transcription correlated with the levels of cAMP accumulation and PKA activity; BIM-53061 stimulated the highest levels of cAMP and PKA (2.8-fold and 2.5-fold vs. 1.8-2-fold for isoproterenol and forskolin. Gene network inference and graph topology analysis in SI NETs suggests that SI NETs express neural GPCRs that activate different CRE targets associated with proliferation and secretion. In vitro studies, in a model NET cell system, confirmed that transcriptional

  16. Posterior association networks and functional modules inferred from rich phenotypes of gene perturbations.

    Directory of Open Access Journals (Sweden)

    Xin Wang

    Full Text Available Combinatorial gene perturbations provide rich information for a systematic exploration of genetic interactions. Despite successful applications to bacteria and yeast, the scalability of this approach remains a major challenge for higher organisms such as humans. Here, we report a novel experimental and computational framework to efficiently address this challenge by limiting the 'search space' for important genetic interactions. We propose to integrate rich phenotypes of multiple single gene perturbations to robustly predict functional modules, which can subsequently be subjected to further experimental investigations such as combinatorial gene silencing. We present posterior association networks (PANs to predict functional interactions between genes estimated using a Bayesian mixture modelling approach. The major advantage of this approach over conventional hypothesis tests is that prior knowledge can be incorporated to enhance predictive power. We demonstrate in a simulation study and on biological data, that integrating complementary information greatly improves prediction accuracy. To search for significant modules, we perform hierarchical clustering with multiscale bootstrap resampling. We demonstrate the power of the proposed methodologies in applications to Ewing's sarcoma and human adult stem cells using publicly available and custom generated data, respectively. In the former application, we identify a gene module including many confirmed and highly promising therapeutic targets. Genes in the module are also significantly overrepresented in signalling pathways that are known to be critical for proliferation of Ewing's sarcoma cells. In the latter application, we predict a functional network of chromatin factors controlling epidermal stem cell fate. Further examinations using ChIP-seq, ChIP-qPCR and RT-qPCR reveal that the basis of their genetic interactions may arise from transcriptional cross regulation. A Bioconductor package

  17. dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data.

    Science.gov (United States)

    Huynh-Thu, Vân Anh; Geurts, Pierre

    2018-02-21

    The elucidation of gene regulatory networks is one of the major challenges of systems biology. Measurements about genes that are exploited by network inference methods are typically available either in the form of steady-state expression vectors or time series expression data. In our previous work, we proposed the GENIE3 method that exploits variable importance scores derived from Random forests to identify the regulators of each target gene. This method provided state-of-the-art performance on several benchmark datasets, but it could however not specifically be applied to time series expression data. We propose here an adaptation of the GENIE3 method, called dynamical GENIE3 (dynGENIE3), for handling both time series and steady-state expression data. The proposed method is evaluated extensively on the artificial DREAM4 benchmarks and on three real time series expression datasets. Although dynGENIE3 does not systematically yield the best performance on each and every network, it is competitive with diverse methods from the literature, while preserving the main advantages of GENIE3 in terms of scalability.

  18. Simultaneous inference of phenotype-associated genes and relevant tissues from GWAS data via Bayesian integration of multiple tissue-specific gene networks.

    Science.gov (United States)

    Wu, Mengmeng; Lin, Zhixiang; Ma, Shining; Chen, Ting; Jiang, Rui; Wong, Wing Hung

    2017-12-01

    Although genome-wide association studies (GWAS) have successfully identified thousands of genomic loci associated with hundreds of complex traits in the past decade, the debate about such problems as missing heritability and weak interpretability has been appealing for effective computational methods to facilitate the advanced analysis of the vast volume of existing and anticipated genetic data. Towards this goal, gene-level integrative GWAS analysis with the assumption that genes associated with a phenotype tend to be enriched in biological gene sets or gene networks has recently attracted much attention, due to such advantages as straightforward interpretation, less multiple testing burdens, and robustness across studies. However, existing methods in this category usually exploit non-tissue-specific gene networks and thus lack the ability to utilize informative tissue-specific characteristics. To overcome this limitation, we proposed a Bayesian approach called SIGNET (Simultaneously Inference of GeNEs and Tissues) to integrate GWAS data and multiple tissue-specific gene networks for the simultaneous inference of phenotype-associated genes and relevant tissues. Through extensive simulation studies, we showed the effectiveness of our method in finding both associated genes and relevant tissues for a phenotype. In applications to real GWAS data of 14 complex phenotypes, we demonstrated the power of our method in both deciphering genetic basis and discovering biological insights of a phenotype. With this understanding, we expect to see SIGNET as a valuable tool for integrative GWAS analysis, thereby boosting the prevention, diagnosis, and treatment of human inherited diseases and eventually facilitating precision medicine.

  19. OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks.

    Science.gov (United States)

    Lim, Néhémy; Senbabaoglu, Yasin; Michailidis, George; d'Alché-Buc, Florence

    2013-06-01

    Reverse engineering of gene regulatory networks remains a central challenge in computational systems biology, despite recent advances facilitated by benchmark in silico challenges that have aided in calibrating their performance. A number of approaches using either perturbation (knock-out) or wild-type time-series data have appeared in the literature addressing this problem, with the latter using linear temporal models. Nonlinear dynamical models are particularly appropriate for this inference task, given the generation mechanism of the time-series data. In this study, we introduce a novel nonlinear autoregressive model based on operator-valued kernels that simultaneously learns the model parameters, as well as the network structure. A flexible boosting algorithm (OKVAR-Boost) that shares features from L2-boosting and randomization-based algorithms is developed to perform the tasks of parameter learning and network inference for the proposed model. Specifically, at each boosting iteration, a regularized Operator-valued Kernel-based Vector AutoRegressive model (OKVAR) is trained on a random subnetwork. The final model consists of an ensemble of such models. The empirical estimation of the ensemble model's Jacobian matrix provides an estimation of the network structure. The performance of the proposed algorithm is first evaluated on a number of benchmark datasets from the DREAM3 challenge and then on real datasets related to the In vivo Reverse-Engineering and Modeling Assessment (IRMA) and T-cell networks. The high-quality results obtained strongly indicate that it outperforms existing approaches. The OKVAR-Boost Matlab code is available as the archive: http://amis-group.fr/sourcecode-okvar-boost/OKVARBoost-v1.0.zip. Supplementary data are available at Bioinformatics online.

  20. Inferring Phylogenetic Networks Using PhyloNet.

    Science.gov (United States)

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyzes and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  1. Inference in hybrid Bayesian networks

    DEFF Research Database (Denmark)

    Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael

    2009-01-01

    Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....

  2. Stochastic simulation for the inference of transcriptional control network of yeast cyclins genes

    Czech Academy of Sciences Publication Activity Database

    Vohradský, Jiří

    2012-01-01

    Roč. 40, č. 15 (2012), s. 7096-7103 ISSN 0305-1048 R&D Projects: GA ČR GAP302/11/0229 Institutional support: RVO:61388971 Keywords : Cell cycle * gene expression * protein complexes Subject RIV: CE - Biochemistry Impact factor: 8.278, year: 2012

  3. Inferring network structure from cascades

    Science.gov (United States)

    Ghonge, Sushrut; Vural, Dervis Can

    2017-07-01

    Many physical, biological, and social phenomena can be described by cascades taking place on a network. Often, the activity can be empirically observed, but not the underlying network of interactions. In this paper we offer three topological methods to infer the structure of any directed network given a set of cascade arrival times. Our formulas hold for a very general class of models where the activation probability of a node is a generic function of its degree and the number of its active neighbors. We report high success rates for synthetic and real networks, for several different cascade models.

  4. Inference in hybrid Bayesian networks

    International Nuclear Information System (INIS)

    Langseth, Helge; Nielsen, Thomas D.; Rumi, Rafael; Salmeron, Antonio

    2009-01-01

    Since the 1980s, Bayesian networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (the so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.

  5. Inferring network topology from complex dynamics

    International Nuclear Information System (INIS)

    Shandilya, Srinivas Gorur; Timme, Marc

    2011-01-01

    Inferring the network topology from dynamical observations is a fundamental problem pervading research on complex systems. Here, we present a simple, direct method for inferring the structural connection topology of a network, given an observation of one collective dynamical trajectory. The general theoretical framework is applicable to arbitrary network dynamical systems described by ordinary differential equations. No interference (external driving) is required and the type of dynamics is hardly restricted in any way. In particular, the observed dynamics may be arbitrarily complex; stationary, invariant or transient; synchronous or asynchronous and chaotic or periodic. Presupposing a knowledge of the functional form of the dynamical units and of the coupling functions between them, we present an analytical solution to the inverse problem of finding the network topology from observing a time series of state variables only. Robust reconstruction is achieved in any sufficiently long generic observation of the system. We extend our method to simultaneously reconstructing both the entire network topology and all parameters appearing linear in the system's equations of motion. Reconstruction of network topology and system parameters is viable even in the presence of external noise that distorts the original dynamics substantially. The method provides a conceptually new step towards reconstructing a variety of real-world networks, including gene and protein interaction networks and neuronal circuits.

  6. Nonparametric inference of network structure and dynamics

    Science.gov (United States)

    Peixoto, Tiago P.

    The network structure of complex systems determine their function and serve as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  7. System-level insights into the cellular interactome of a non-model organism: inferring, modelling and analysing functional gene network of soybean (Glycine max.

    Directory of Open Access Journals (Sweden)

    Yungang Xu

    Full Text Available Cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory-, protein interaction-, metabolic-, signal transduction networks, etc., has attracted decades of research focuses. However, such a specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. Functional gene network (FGN, a consolidated interaction network that models fuzzy and more generalized notion of gene-gene relations, have been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are yet no successful precedents of FGNs on sparsely studied non-model organisms, such as soybean (Glycine max, due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs, in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis. The efforts of the study are the basis of our further comprehensive studies on the soybean functional

  8. ARACNe-AP: Gene Network Reverse Engineering through Adaptive Partitioning inference of Mutual Information. | Office of Cancer Genomics

    Science.gov (United States)

    The accurate reconstruction of gene regulatory networks from large scale molecular profile datasets represents one of the grand challenges of Systems Biology. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) represents one of the most effective tools to accomplish this goal. However, the initial Fixed Bandwidth (FB) implementation is both inefficient and unable to deal with sample sets providing largely uneven coverage of the probability density space.

  9. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark

    2006-01-01

    We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...

  10. Compiling Relational Bayesian Networks for Exact Inference

    DEFF Research Database (Denmark)

    Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan

    2004-01-01

    We describe a system for exact inference with relational Bayesian networks as defined in the publicly available \\primula\\ tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...

  11. DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks

    Science.gov (United States)

    Gerstein, Mark

    2016-01-01

    Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem’s gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally–e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the “state” and “control” in the model refer to its own (internal) and another subsystem’s (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model’s parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. This is particularly true for genes with

  12. DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks.

    Directory of Open Access Journals (Sweden)

    Daifeng Wang

    2016-10-01

    Full Text Available Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs, cellular growth factors and microRNAs. A subsystem's gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally-e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the "state" and "control" in the model refer to its own (internal and another subsystem's (external gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model's parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs, seeing the degree to which these can be accounted for by orthologous (internal versus species-specific (external TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. This is particularly true for genes with

  13. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Meyer Patrick

    2007-01-01

    Full Text Available The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR, an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

  14. Information-Theoretic Inference of Large Transcriptional Regulatory Networks

    Directory of Open Access Journals (Sweden)

    Patrick E. Meyer

    2007-06-01

    Full Text Available The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR, an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.

  15. An Intuitive Dashboard for Bayesian Network Inference

    International Nuclear Information System (INIS)

    Reddy, Vikas; Farr, Anna Charisse; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K D V

    2014-01-01

    Current Bayesian network software packages provide good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user-interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using QT and SMILE libraries in C++

  16. An Intuitive Dashboard for Bayesian Network Inference

    Science.gov (United States)

    Reddy, Vikas; Charisse Farr, Anna; Wu, Paul; Mengersen, Kerrie; Yarlagadda, Prasad K. D. V.

    2014-03-01

    Current Bayesian network software packages provide good graphical interface for users who design and develop Bayesian networks for various applications. However, the intended end-users of these networks may not necessarily find such an interface appealing and at times it could be overwhelming, particularly when the number of nodes in the network is large. To circumvent this problem, this paper presents an intuitive dashboard, which provides an additional layer of abstraction, enabling the end-users to easily perform inferences over the Bayesian networks. Unlike most software packages, which display the nodes and arcs of the network, the developed tool organises the nodes based on the cause-and-effect relationship, making the user-interaction more intuitive and friendly. In addition to performing various types of inferences, the users can conveniently use the tool to verify the behaviour of the developed Bayesian network. The tool has been developed using QT and SMILE libraries in C++.

  17. Functional networks inference from rule-based machine learning models.

    Science.gov (United States)

    Lazzarini, Nicola; Widera, Paweł; Williamson, Stuart; Heer, Rakesh; Krasnogor, Natalio; Bacardit, Jaume

    2016-01-01

    Functional networks play an important role in the analysis of biological processes and systems. The inference of these networks from high-throughput (-omics) data is an area of intense research. So far, the similarity-based inference paradigm (e.g. gene co-expression) has been the most popular approach. It assumes a functional relationship between genes which are expressed at similar levels across different samples. An alternative to this paradigm is the inference of relationships from the structure of machine learning models. These models are able to capture complex relationships between variables, that often are different/complementary to the similarity-based methods. We propose a protocol to infer functional networks from machine learning models, called FuNeL. It assumes, that genes used together within a rule-based machine learning model to classify the samples, might also be functionally related at a biological level. The protocol is first tested on synthetic datasets and then evaluated on a test suite of 8 real-world datasets related to human cancer. The networks inferred from the real-world data are compared against gene co-expression networks of equal size, generated with 3 different methods. The comparison is performed from two different points of view. We analyse the enriched biological terms in the set of network nodes and the relationships between known disease-associated genes in a context of the network topology. The comparison confirms both the biological relevance and the complementary character of the knowledge captured by the FuNeL networks in relation to similarity-based methods and demonstrates its potential to identify known disease associations as core elements of the network. Finally, using a prostate cancer dataset as a case study, we confirm that the biological knowledge captured by our method is relevant to the disease and consistent with the specialised literature and with an independent dataset not used in the inference process. The

  18. Temporal profiling of cytokine-induced genes in pancreatic β-cells by meta-analysis and network inference

    DEFF Research Database (Denmark)

    Lopes, Miguel; Kutlu, Burak; Miani, Michela

    2014-01-01

    Type 1 Diabetes (T1D) is an autoimmune disease where local release of cytokines such as IL-1β and IFN-γ contributes to β-cell apoptosis. To identify relevant genes regulating this process we performed a meta-analysis of 8 datasets of β-cell gene expression after exposure to IL-1β and IFN-γ. Two...

  19. Assessment of network inference methods: how to cope with an underdetermined problem.

    Directory of Open Access Journals (Sweden)

    Caroline Siegenthaler

    Full Text Available The inference of biological networks is an active research area in the field of systems biology. The number of network inference algorithms has grown tremendously in the last decade, underlining the importance of a fair assessment and comparison among these methods. Current assessments of the performance of an inference method typically involve the application of the algorithm to benchmark datasets and the comparison of the network predictions against the gold standard or reference networks. While the network inference problem is often deemed underdetermined, implying that the inference problem does not have a (unique solution, the consequences of such an attribute have not been rigorously taken into consideration. Here, we propose a new procedure for assessing the performance of gene regulatory network (GRN inference methods. The procedure takes into account the underdetermined nature of the inference problem, in which gene regulatory interactions that are inferable or non-inferable are determined based on causal inference. The assessment relies on a new definition of the confusion matrix, which excludes errors associated with non-inferable gene regulations. For demonstration purposes, the proposed assessment procedure is applied to the DREAM 4 In Silico Network Challenge. The results show a marked change in the ranking of participating methods when taking network inferability into account.

  20. Causal inference in biology networks with integrated belief propagation.

    Science.gov (United States)

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in real metabolic network with v-structure and feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop only from steady-state data which conventional method cannot.

  1. Rice Gene Network Inferred from Expression Profiling of Plants Overexpressing OsWRKY13,a Positive Regulator of Disease Resistance

    Institute of Scientific and Technical Information of China (English)

    Deyun Qiu; Jun Xiao; Weibo Xie; Hongbo Liu; Xianghua Li; Lizhong Xiong; Shiping Wang

    2008-01-01

    Accumulating information indicates that plant disease resistance signaling pathways frequently interact with other pathways regulating developmental processes or abiotic stress responses. However, the molecular mechanisms of these types of crosstalk remain poorly understood in most cases. Here we report that OsWRKY13, an activator of rice resistance to both bacterial and fungal pathogens, appears to function as a convergent point for crosstalk among the pathogen-induced salicylate-dependent defense pathway and five other physiologic pathways. Genome-wide analysis of the expression profiles of OsWRKY13-overexpressing lines suggests that OsWRKY13 directly or indirectly regulates the expression of more than 500 genes that are potentially involved in different physiologic processes according to the classification of the Gene Ontology database. By comparing the expression patterns of genes functioning in known pathways or cellular processes of pathogen infection and the phenotypes between OsWRKY13-overexpressing and wildtype plants, our data suggest that OsWRKY13 is also a regulator of other physiologic processes during pathogen infection. The OsWRKY13-associated disease resistance pathway synergistically interacts via OsWRKY13 with the glutathione/glutaredoxin system and flavonoid biosynthesis pathway to monitor redox homeostasis and to putatively enhance the biosynthesis of antimicrobial flavonoid phytoalexins, respectively, in OsWRKY13-overexpressing lines. Meanwhile, the OsWRKY13-associated disease resistance pathway appears to interact antagonistically with the SNAC1-mediated abiotic stress defense pathway, jasmonic acid signaling pathway, and terpenoid metabolism pathway via OsWRKY13 to suppress salt and cold defense responses as well as to putatively retard rice growth and development.

  2. A Bayesian Network Schema for Lessening Database Inference

    National Research Council Canada - National Science Library

    Chang, LiWu; Moskowitz, Ira S

    2001-01-01

    .... The authors introduce a formal schema for database inference analysis, based upon a Bayesian network structure, which identifies critical parameters involved in the inference problem and represents...

  3. Impact of noise on molecular network inference.

    Directory of Open Access Journals (Sweden)

    Radhakrishnan Nagarajan

    Full Text Available Molecular entities work in concert as a system and mediate phenotypic outcomes and disease states. There has been recent interest in modelling the associations between molecular entities from their observed expression profiles as networks using a battery of algorithms. These networks have proven to be useful abstractions of the underlying pathways and signalling mechanisms. Noise is ubiquitous in molecular data and can have a pronounced effect on the inferred network. Noise can be an outcome of several factors including: inherent stochastic mechanisms at the molecular level, variation in the abundance of molecules, heterogeneity, sensitivity of the biological assay or measurement artefacts prevalent especially in high-throughput settings. The present study investigates the impact of discrepancies in noise variance on pair-wise dependencies, conditional dependencies and constraint-based Bayesian network structure learning algorithms that incorporate conditional independence tests as a part of the learning process. Popular network motifs and fundamental connections, namely: (a common-effect, (b three-chain, and (c coherent type-I feed-forward loop (FFL are investigated. The choice of these elementary networks can be attributed to their prevalence across more complex networks. Analytical expressions elucidating the impact of discrepancies in noise variance on pairwise dependencies and conditional dependencies for special cases of these motifs are presented. Subsequently, the impact of noise on two popular constraint-based Bayesian network structure learning algorithms such as Grow-Shrink (GS and Incremental Association Markov Blanket (IAMB that implicitly incorporate tests for conditional independence is investigated. Finally, the impact of noise on networks inferred from publicly available single cell molecular expression profiles is investigated. While discrepancies in noise variance are overlooked in routine molecular network inference, the

  4. An integrative approach to inferring biologically meaningful gene modules

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-07-01

    Full Text Available Abstract Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.

  5. Gene expression inference with deep learning.

    Science.gov (United States)

    Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

    2016-06-15

    Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationship between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX CONTACT: xhx@ics.uci.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Inferring epidemic network topology from surveillance data.

    Directory of Open Access Journals (Sweden)

    Xiang Wan

    Full Text Available The transmission of infectious diseases can be affected by many or even hidden factors, making it difficult to accurately predict when and where outbreaks may emerge. One approach at the moment is to develop and deploy surveillance systems in an effort to detect outbreaks as timely as possible. This enables policy makers to modify and implement strategies for the control of the transmission. The accumulated surveillance data including temporal, spatial, clinical, and demographic information, can provide valuable information with which to infer the underlying epidemic networks. Such networks can be quite informative and insightful as they characterize how infectious diseases transmit from one location to another. The aim of this work is to develop a computational model that allows inferences to be made regarding epidemic network topology in heterogeneous populations. We apply our model on the surveillance data from the 2009 H1N1 pandemic in Hong Kong. The inferred epidemic network displays significant effect on the propagation of infectious diseases.

  7. Quantum Enhanced Inference in Markov Logic Networks.

    Science.gov (United States)

    Wittek, Peter; Gogolin, Christian

    2017-04-19

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  8. Facility Activity Inference Using Radiation Networks

    Energy Technology Data Exchange (ETDEWEB)

    Rao, Nageswara S. [ORNL; Ramirez Aviles, Camila A. [ORNL

    2017-11-01

    We consider the problem of inferring the operational status of a reactor facility using measurements from a radiation sensor network deployed around the facility’s ventilation off-gas stack. The intensity of stack emissions decays with distance, and the sensor counts or measurements are inherently random with parameters determined by the intensity at the sensor’s location. We utilize the measurements to estimate the intensity at the stack, and use it in a one-sided Sequential Probability Ratio Test (SPRT) to infer on/off status of the reactor. We demonstrate the superior performance of this method over conventional majority fusers and individual sensors using (i) test measurements from a network of 21 NaI detectors, and (ii) effluence measurements collected at the stack of a reactor facility. We also analytically establish the superior detection performance of the network over individual sensors with fixed and adaptive thresholds by utilizing the Poisson distribution of the counts. We quantify the performance improvements of the network detection over individual sensors using the packing number of the intensity space.

  9. Quantum Enhanced Inference in Markov Logic Networks

    Science.gov (United States)

    Wittek, Peter; Gogolin, Christian

    2017-04-01

    Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.

  10. Network inference via adaptive optimal design

    Directory of Open Access Journals (Sweden)

    Stigter Johannes D

    2012-09-01

    Full Text Available Abstract Background Current research in network reverse engineering for genetic or metabolic networks very often does not include a proper experimental and/or input design. In this paper we address this issue in more detail and suggest a method that includes an iterative design of experiments based, on the most recent data that become available. The presented approach allows a reliable reconstruction of the network and addresses an important issue, i.e., the analysis and the propagation of uncertainties as they exist in both the data and in our own knowledge. These two types of uncertainties have their immediate ramifications for the uncertainties in the parameter estimates and, hence, are taken into account from the very beginning of our experimental design. Findings The method is demonstrated for two small networks that include a genetic network for mRNA synthesis and degradation and an oscillatory network describing a molecular network underlying adenosine 3’-5’ cyclic monophosphate (cAMP as observed in populations of Dyctyostelium cells. In both cases a substantial reduction in parameter uncertainty was observed. Extension to larger scale networks is possible but needs a more rigorous parameter estimation algorithm that includes sparsity as a constraint in the optimization procedure. Conclusion We conclude that a careful experiment design very often (but not always pays off in terms of reliability in the inferred network topology. For large scale networks a better parameter estimation algorithm is required that includes sparsity as an additional constraint. These algorithms are available in the literature and can also be used in an adaptive optimal design setting as demonstrated in this paper.

  11. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.; Voss, U.; Wilson, M.; Bennett, M.; Byrne, H.; De Smet, I.; Hodgman, C.; King, J.

    2013-01-01

    thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based

  12. A Network Inference Workflow Applied to Virulence-Related Processes in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Weller, Jennifer B.; Khoshnevis, Saeed; Shi, Liang; McDermott, Jason E.

    2009-04-20

    Inference of the structure of mRNA transcriptional regulatory networks, protein regulatory or interaction networks, and protein activation/inactivation-based signal transduction networks are critical tasks in systems biology. In this article we discuss a workflow for the reconstruction of parts of the transcriptional regulatory network of the pathogenic bacterium Salmonella typhimurium based on the information contained in sets of microarray gene expression data now available for that organism, and describe our results obtained by following this workflow. The primary tool is one of the network inference algorithms deployed in the Software Environment for BIological Network Inference (SEBINI). Specifically, we selected the algorithm called Context Likelihood of Relatedness (CLR), which uses the mutual information contained in the gene expression data to infer regulatory connections. The associated analysis pipeline automatically stores the inferred edges from the CLR runs within SEBINI and, upon request, transfers the inferred edges into either Cytoscape or the plug-in Collective Analysis of Biological of Biological Interaction Networks (CABIN) tool for further post-analysis of the inferred regulatory edges. The following article presents the outcome of this workflow, as well as the protocols followed for microarray data collection, data cleansing, and network inference. Our analysis revealed several interesting interactions, functional groups, metabolic pathways, and regulons in S. typhimurium.

  13. Inferring general relations between network characteristics from specific network ensembles.

    Science.gov (United States)

    Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan

    2012-01-01

    Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

  14. Probabilistic Inference of Biological Networks via Data Integration

    Directory of Open Access Journals (Sweden)

    Mark F. Rogers

    2015-01-01

    Full Text Available There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning that can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel appears to work best, different kernels may contribute complimentary information about interactions: experiments in S. cerevisiae (yeast reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%.

  15. Connectivity in the yeast cell cycle transcription network: inferences from neural networks.

    Directory of Open Access Journals (Sweden)

    Christopher E Hart

    2006-12-01

    Full Text Available A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico "mutation" to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested an analogous property might exist in the biological network. This postulated that "network-local discrimination" occurs when regulatory connections (here between MBF and target genes are explicitly disfavored in one network module (G2, relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of

  16. Mathematical inference and control of molecular networks from perturbation experiments

    Science.gov (United States)

    Mohammed-Rasheed, Mohammed

    One of the main challenges facing biologists and mathematicians in the post genomic era is to understand the behavior of molecular networks and harness this understanding into an educated intervention of the cell. The cell maintains its function via an elaborate network of interconnecting positive and negative feedback loops of genes, RNA and proteins that send different signals to a large number of pathways and molecules. These structures are referred to as genetic regulatory networks (GRNs) or molecular networks. GRNs can be viewed as dynamical systems with inherent properties and mechanisms, such as steady-state equilibriums and stability, that determine the behavior of the cell. The biological relevance of the mathematical concepts are important as they may predict the differentiation of a stem cell, the maintenance of a normal cell, the development of cancer and its aberrant behavior, and the design of drugs and response to therapy. Uncovering the underlying GRN structure from gene/protein expression data, e.g., microarrays or perturbation experiments, is called inference or reverse engineering of the molecular network. Because of the high cost and time consuming nature of biological experiments, the number of available measurements or experiments is very small compared to the number of molecules (genes, RNA and proteins). In addition, the observations are noisy, where the noise is due to the measurements imperfections as well as the inherent stochasticity of genetic expression levels. Intra-cellular activities and extra-cellular environmental attributes are also another source of variability. Thus, the inference of GRNs is, in general, an under-determined problem with a highly noisy set of observations. The ultimate goal of GRN inference and analysis is to be able to intervene within the network, in order to force it away from undesirable cellular states and into desirable ones. However, it remains a major challenge to design optimal intervention strategies

  17. Probabilistic logic networks a comprehensive framework for uncertain inference

    CERN Document Server

    Goertzel, Ben; Goertzel, Izabela Freire; Heljakka, Ari

    2008-01-01

    This comprehensive book describes Probabilistic Logic Networks (PLN), a novel conceptual, mathematical and computational approach to uncertain inference. A broad scope of reasoning types are considered.

  18. fastBMA: scalable network inference and transitive reduction.

    Science.gov (United States)

    Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee

    2017-10-01

    Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the 1 based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.

  19. Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Ji Wei

    2010-10-01

    Full Text Available Abstract Background Microarray data discretization is a basic preprocess for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary and no systematic comparison of different discretization has been conducted, in the context of gene regulatory network inference from time series gene expression data. Results In this study, we propose a new discretization method "bikmeans", and compare its performance with four other widely-used discretization methods using different datasets, modeling algorithms and number of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. Bikmeans method always gave high total accuracies. Conclusions Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significant better total accuracies than other methods.

  20. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    Science.gov (United States)

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  1. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  2. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks.

    Science.gov (United States)

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A; Kellis, Manolis

    2012-07-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.

  3. Inference of Low and High-Grade Glioma Gene Regulatory Networks Delineates the Role of Rnd3 in Establishing Multiple Hallmarks of Cancer.

    Directory of Open Access Journals (Sweden)

    Kim Clarke

    2015-07-01

    Full Text Available Gliomas are a highly heterogeneous group of brain tumours that are refractory to treatment, highly invasive and pro-angiogenic. Glioblastoma patients have an average survival time of less than 15 months. Understanding the molecular basis of different grades of glioma, from well differentiated, low-grade tumours to high-grade tumours, is a key step in defining new therapeutic targets. Here we use a data-driven approach to learn the structure of gene regulatory networks from observational data and use the resulting models to formulate hypothesis on the molecular determinants of glioma stage. Remarkably, integration of available knowledge with functional genomics datasets representing clinical and pre-clinical studies reveals important properties within the regulatory circuits controlling low and high-grade glioma. Our analyses first show that low and high-grade gliomas are characterised by a switch in activity of two subsets of Rho GTPases. The first one is involved in maintaining normal glial cell function, while the second is linked to the establishment of multiple hallmarks of cancer. Next, the development and application of a novel data integration methodology reveals novel functions of RND3 in controlling glioma cell migration, invasion, proliferation, angiogenesis and clinical outcome.

  4. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling.

    Directory of Open Access Journals (Sweden)

    Junha Shin

    Full Text Available Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life-Archaea, Bacteria, and Eukaryota-suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co

  5. Inference of Transcriptional Network for Pluripotency in Mouse Embryonic Stem Cells

    International Nuclear Information System (INIS)

    Aburatani, S

    2015-01-01

    In embryonic stem cells, various transcription factors (TFs) maintain pluripotency. To gain insights into the regulatory system controlling pluripotency, I inferred the regulatory relationships between the TFs expressed in ES cells. In this study, I applied a method based on structural equation modeling (SEM), combined with factor analysis, to 649 expression profiles of 19 TF genes measured in mouse Embryonic Stem Cells (ESCs). The factor analysis identified 19 TF genes that were regulated by several unmeasured factors. Since the known cell reprogramming TF genes (Pou5f1, Sox2 and Nanog) are regulated by different factors, each estimated factor is considered to be an input for signal transduction to control pluripotency in mouse ESCs. In the inferred network model, TF proteins were also arranged as unmeasured factors that control other TFs. The interpretation of the inferred network model revealed the regulatory mechanism for controlling pluripotency in ES cells

  6. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    Science.gov (United States)

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011

  7. Bottlenecks and Hubs in Inferred Networks Are Important for Virulence in Salmonella typhimurium

    Energy Technology Data Exchange (ETDEWEB)

    McDermott, Jason E.; Taylor, Ronald C.; Yoon, Hyunjin; Heffron, Fred

    2009-02-01

    Recent advances in experimental methods have provided sufficient data to consider systems as large networks of interconnected components. High-throughput determination of protein-protein interaction networks has led to the observation that topological bottlenecks, that is proteins defined by high centrality in the network, are enriched in proteins with systems-level phenotypes such as essentiality. Global transcriptional profiling by microarray analysis has been used extensively to characterize systems, for example, cellular response to environmental conditions and genetic mutations. These transcriptomic datasets have been used to infer regulatory and functional relationship networks based on co-regulation. We use the context likelihood of relatedness (CLR) method to infer networks from two datasets gathered from the pathogen Salmonella typhimurium; one under a range of environmental culture conditions and the other from deletions of 15 regulators found to be essential in virulence. Bottleneck nodes were identified from these inferred networks and we show that these nodes are significantly more likely to be essential for virulence than their non-bottleneck counterparts. A network generated using Pearson correlation did not display this behavior. Overall this study demonstrates that topology of networks inferred from global transcriptional profiles provides information about the systems-level roles of bottleneck genes. Analysis of the differences between the two CLR-derived networks suggests that the bottleneck nodes are either mediators of transitions between system states or sentinels that reflect the dynamics of these transitions.

  8. Congested Link Inference Algorithms in Dynamic Routing IP Network

    Directory of Open Access Journals (Sweden)

    Yu Chen

    2017-01-01

    Full Text Available The performance descending of current congested link inference algorithms is obviously in dynamic routing IP network, such as the most classical algorithm CLINK. To overcome this problem, based on the assumptions of Markov property and time homogeneity, we build a kind of Variable Structure Discrete Dynamic Bayesian (VSDDB network simplified model of dynamic routing IP network. Under the simplified VSDDB model, based on the Bayesian Maximum A Posteriori (BMAP and Rest Bayesian Network Model (RBNM, we proposed an Improved CLINK (ICLINK algorithm. Considering the concurrent phenomenon of multiple link congestion usually happens, we also proposed algorithm CLILRS (Congested Link Inference algorithm based on Lagrangian Relaxation Subgradient to infer the set of congested links. We validated our results by the experiments of analogy, simulation, and actual Internet.

  9. Bayesian Inference and Online Learning in Poisson Neuronal Networks.

    Science.gov (United States)

    Huang, Yanping; Rao, Rajesh P N

    2016-08-01

    Motivated by the growing evidence for Bayesian computation in the brain, we show how a two-layer recurrent network of Poisson neurons can perform both approximate Bayesian inference and learning for any hidden Markov model. The lower-layer sensory neurons receive noisy measurements of hidden world states. The higher-layer neurons infer a posterior distribution over world states via Bayesian inference from inputs generated by sensory neurons. We demonstrate how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in a higher-layer neuron represents a sample of a particular hidden world state. The spiking activity across the neural population approximates the posterior distribution over hidden states. In this model, variability in spiking is regarded not as a nuisance but as an integral feature that provides the variability necessary for sampling during inference. We demonstrate how the network can learn the likelihood model, as well as the transition probabilities underlying the dynamics, using a Hebbian learning rule. We present results illustrating the ability of the network to perform inference and learning for arbitrary hidden Markov models.

  10. Inference of Transcription Regulatory Network in Low Phytic Acid Soybean Seeds

    Directory of Open Access Journals (Sweden)

    Neelam Redekar

    2017-11-01

    Full Text Available A dominant loss of function mutation in myo-inositol phosphate synthase (MIPS gene and recessive loss of function mutations in two multidrug resistant protein type-ABC transporter genes not only reduce the seed phytic acid levels in soybean, but also affect the pathways associated with seed development, ultimately resulting in low emergence. To understand the regulatory mechanisms and identify key genes that intervene in the seed development process in low phytic acid crops, we performed computational inference of gene regulatory networks in low and normal phytic acid soybeans using a time course transcriptomic data and multiple network inference algorithms. We identified a set of putative candidate transcription factors and their regulatory interactions with genes that have functions in myo-inositol biosynthesis, auxin-ABA signaling, and seed dormancy. We evaluated the performance of our unsupervised network inference method by comparing the predicted regulatory network with published regulatory interactions in Arabidopsis. Some contrasting regulatory interactions were observed in low phytic acid mutants compared to non-mutant lines. These findings provide important hypotheses on expression regulation of myo-inositol metabolism and phytohormone signaling in developing low phytic acid soybeans. The computational pipeline used for unsupervised network learning in this study is provided as open source software and is freely available at https://lilabatvt.github.io/LPANetwork/.

  11. Correlated measurement error hampers association network inference

    NARCIS (Netherlands)

    Kaduk, M.; Hoefsloot, H.C.J.; Vis, D.J.; Reijmers, T.; Greef, J. van der; Smilde, A.K.; Hendriks, M.M.W.B.

    2014-01-01

    Modern chromatography-based metabolomics measurements generate large amounts of data in the form of abundances of metabolites. An increasingly popular way of representing and analyzing such data is by means of association networks. Ideally, such a network can be interpreted in terms of the

  12. MPE inference in conditional linear gaussian networks

    DEFF Research Database (Denmark)

    Salmerón, Antonio; Rumí, Rafael; Langseth, Helge

    2015-01-01

    Given evidence on a set of variables in a Bayesian network, the most probable explanation (MPE) is the problem of finding a configuration of the remaining variables with maximum posterior probability. This problem has previously been addressed for discrete Bayesian networks and can be solved using...

  13. An analysis pipeline for the inference of protein-protein interaction networks

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.

    2009-12-01

    An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov /SEBINI/ Contact: ronald.taylor@pnl.gov. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: ds.daly@pnl.gov. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.

  14. Inferring regulatory networks from expression data using tree-based methods.

    Directory of Open Access Journals (Sweden)

    Vân Anh Huynh-Thu

    2010-09-01

    Full Text Available One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene is predicted from the expression patterns of all the other genes (input genes, using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.

  15. Automatic physical inference with information maximizing neural networks

    Science.gov (United States)

    Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.

    2018-04-01

    Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.

  16. Causal Inference and Explaining Away in a Spiking Network

    Science.gov (United States)

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-01-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426

  17. Methodology for the inference of gene function from phenotype data.

    Science.gov (United States)

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and

  18. IPV6 Network Infrastructure and Stability Inference

    Science.gov (United States)

    2014-09-01

    the Too Big Trick ( TBT ) to induce the remote targets to return fragmented responses. By evaluating the responses, the uptime for approximately 35% of...Internet. Approximately 50,000 IPv6 addresses were probed continuously from March to June 2014, using the Too Big Trick ( TBT ) to induce the remote targets...SNMP Simple Network Management Protocol TSval Timestamp Value TCP Transmission Control Protocol TBT Too Big Trick USG U.S. government xiv

  19. Network Completion for Static Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Natsu Nakajima

    2014-01-01

    Full Text Available We tackle the problem of completing and inferring genetic networks under stationary conditions from static data, where network completion is to make the minimum amount of modifications to an initial network so that the completed network is most consistent with the expression data in which addition of edges and deletion of edges are basic modification operations. For this problem, we present a new method for network completion using dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that our proposed method can distinguish the differences between two types of genetic networks under stationary conditions from lung cancer and normal gene expression data.

  20. Inferring influenza global transmission networks without complete phylogenetic information.

    Science.gov (United States)

    Aris-Brosou, Stéphane

    2014-03-01

    Influenza is one of the most severe respiratory infections affecting humans throughout the world, yet the dynamics of its global transmission network are still contentious. Here, I describe a novel combination of phylogenetics, time series, and graph theory to analyze 14.25 years of data stratified in space and in time, focusing on the main target of the human immune response, the hemagglutinin gene. While bypassing the complete phylogenetic inference of huge data sets, the method still extracts information suggesting that waves of genetic or of nucleotide diversity circulate continuously around the globe for subtypes that undergo sustained transmission over several seasons, such as H3N2 and pandemic H1N1/09, while diversity of prepandemic H1N1 viruses had until 2009 a noncontinuous transmission pattern consistent with a source/sink model. Irrespective of the shift in the structure of H1N1 diversity circulation with the emergence of the pandemic H1N1/09 strain, US prevalence peaks during the winter months when genetic diversity is at its lowest. This suggests that a dominant strain is generally responsible for epidemics and that monitoring genetic and/or nucleotide diversity in real time could provide public health agencies with an indirect estimate of prevalence.

  1. Inferring topologies of complex networks with hidden variables.

    Science.gov (United States)

    Wu, Xiaoqun; Wang, Weihan; Zheng, Wei Xing

    2012-10-01

    Network topology plays a crucial role in determining a network's intrinsic dynamics and function, thus understanding and modeling the topology of a complex network will lead to greater knowledge of its evolutionary mechanisms and to a better understanding of its behaviors. In the past few years, topology identification of complex networks has received increasing interest and wide attention. Many approaches have been developed for this purpose, including synchronization-based identification, information-theoretic methods, and intelligent optimization algorithms. However, inferring interaction patterns from observed dynamical time series is still challenging, especially in the absence of knowledge of nodal dynamics and in the presence of system noise. The purpose of this work is to present a simple and efficient approach to inferring the topologies of such complex networks. The proposed approach is called "piecewise partial Granger causality." It measures the cause-effect connections of nonlinear time series influenced by hidden variables. One commonly used testing network, two regular networks with a few additional links, and small-world networks are used to evaluate the performance and illustrate the influence of network parameters on the proposed approach. Application to experimental data further demonstrates the validity and robustness of our method.

  2. Effective network inference through multivariate information transfer estimation

    Science.gov (United States)

    Dahlqvist, Carl-Henrik; Gnabo, Jean-Yves

    2018-06-01

    Network representation has steadily gained in popularity over the past decades. In many disciplines such as finance, genetics, neuroscience or human travel to cite a few, the network may not directly be observable and needs to be inferred from time-series data, leading to the issue of separating direct interactions between two entities forming the network from indirect interactions coming through its remaining part. Drawing on recent contributions proposing strategies to deal with this problem such as the so-called "global silencing" approach of Barzel and Barabasi or "network deconvolution" of Feizi et al. (2013), we propose a novel methodology to infer an effective network structure from multivariate conditional information transfers. Its core principal is to test the information transfer between two nodes through a step-wise approach by conditioning the transfer for each pair on a specific set of relevant nodes as identified by our algorithm from the rest of the network. The methodology is model free and can be applied to high-dimensional networks with both inter-lag and intra-lag relationships. It outperforms state-of-the-art approaches for eliminating the redundancies and more generally retrieving simulated artificial networks in our Monte-Carlo experiments. We apply the method to stock market data at different frequencies (15 min, 1 h, 1 day) to retrieve the network of US largest financial institutions and then document how bank's centrality measurements relate to bank's systemic vulnerability.

  3. A Local Poisson Graphical Model for inferring networks from sequencing data.

    Science.gov (United States)

    Allen, Genevera I; Liu, Zhandong

    2013-09-01

    Gaussian graphical models, a class of undirected graphs or Markov Networks, are often used to infer gene networks based on microarray expression data. Many scientists, however, have begun using high-throughput sequencing technologies such as RNA-sequencing or next generation sequencing to measure gene expression. As the resulting data consists of counts of sequencing reads for each gene, Gaussian graphical models are not optimal for this discrete data. In this paper, we propose a novel method for inferring gene networks from sequencing data: the Local Poisson Graphical Model. Our model assumes a Local Markov property where each variable conditional on all other variables is Poisson distributed. We develop a neighborhood selection algorithm to fit our model locally by performing a series of l1 penalized Poisson, or log-linear, regressions. This yields a fast parallel algorithm for estimating networks from next generation sequencing data. In simulations, we illustrate the effectiveness of our methods for recovering network structure from count data. A case study on breast cancer microRNAs (miRNAs), a novel application of graphical models, finds known regulators of breast cancer genes and discovers novel miRNA clusters and hubs that are targets for future research.

  4. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    Science.gov (United States)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the immanent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10(exp 15)) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.

  5. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    Science.gov (United States)

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  6. Inference of the Genetic Network Regulating Lateral Root Initiation in Arabidopsis thaliana

    KAUST Repository

    Muraro, D.

    2013-01-01

    Regulation of gene expression is crucial for organism growth, and it is one of the challenges in systems biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyze two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, and assess causality of their regulatory interactions by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation. © 2004-2012 IEEE.

  7. Inference of financial networks using the normalised mutual information rate

    Science.gov (United States)

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics. PMID:29420644

  8. Inference of financial networks using the normalised mutual information rate.

    Science.gov (United States)

    Goh, Yong Kheng; Hasim, Haslifah M; Antonopoulos, Chris G

    2018-01-01

    In this paper, we study data from financial markets, using the normalised Mutual Information Rate. We show how to use it to infer the underlying network structure of interrelations in the foreign currency exchange rates and stock indices of 15 currency areas. We first present the mathematical method and discuss its computational aspects, and apply it to artificial data from chaotic dynamics and to correlated normal-variates data. We then apply the method to infer the structure of the financial system from the time-series of currency exchange rates and stock indices. In particular, we study and reveal the interrelations among the various foreign currency exchange rates and stock indices in two separate networks, of which we also study their structural properties. Our results show that both inferred networks are small-world networks, sharing similar properties and having differences in terms of assortativity. Importantly, our work shows that global economies tend to connect with other economies world-wide, rather than creating small groups of local economies. Finally, the consistent interrelations depicted among the 15 currency areas are further supported by a discussion from the viewpoint of economics.

  9. Functional Module Analysis for Gene Coexpression Networks with Network Integration.

    Science.gov (United States)

    Zhang, Shuqin; Zhao, Hongyu; Ng, Michael K

    2015-01-01

    Network has been a general tool for studying the complex interactions between different genes, proteins, and other small molecules. Module as a fundamental property of many biological networks has been widely studied and many computational methods have been proposed to identify the modules in an individual network. However, in many cases, a single network is insufficient for module analysis due to the noise in the data or the tuning of parameters when building the biological network. The availability of a large amount of biological networks makes network integration study possible. By integrating such networks, more informative modules for some specific disease can be derived from the networks constructed from different tissues, and consistent factors for different diseases can be inferred. In this paper, we have developed an effective method for module identification from multiple networks under different conditions. The problem is formulated as an optimization model, which combines the module identification in each individual network and alignment of the modules from different networks together. An approximation algorithm based on eigenvector computation is proposed. Our method outperforms the existing methods, especially when the underlying modules in multiple networks are different in simulation studies. We also applied our method to two groups of gene coexpression networks for humans, which include one for three different cancers, and one for three tissues from the morbidly obese patients. We identified 13 modules with three complete subgraphs, and 11 modules with two complete subgraphs, respectively. The modules were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also showed that the main functions of most modules for the corresponding disease have been addressed by other researchers, which may provide the theoretical basis for further studying the modules experimentally.

  10. Statistical inference to advance network models in epidemiology.

    Science.gov (United States)

    Welch, David; Bansal, Shweta; Hunter, David R

    2011-03-01

    Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. Inferring personal economic status from social network location

    Science.gov (United States)

    Luo, Shaojun; Morone, Flaviano; Sarraute, Carlos; Travizano, Matías; Makse, Hernán A.

    2017-05-01

    It is commonly believed that patterns of social ties affect individuals' economic status. Here we translate this concept into an operational definition at the network level, which allows us to infer the economic well-being of individuals through a measure of their location and influence in the social network. We analyse two large-scale sources: telecommunications and financial data of a whole country's population. Our results show that an individual's location, measured as the optimal collective influence to the structural integrity of the social network, is highly correlated with personal economic status. The observed social network patterns of influence mimic the patterns of economic inequality. For pragmatic use and validation, we carry out a marketing campaign that shows a threefold increase in response rate by targeting individuals identified by our social network metrics as compared to random targeting. Our strategy can also be useful in maximizing the effects of large-scale economic stimulus policies.

  12. Sensorimotor Network Crucial for Inferring Amusement from Smiles.

    Science.gov (United States)

    Paracampo, Riccardo; Tidoni, Emmanuele; Borgomaneri, Sara; di Pellegrino, Giuseppe; Avenanti, Alessio

    2017-11-01

    Understanding whether another's smile reflects authentic amusement is a key challenge in social life, yet, the neural bases of this ability have been largely unexplored. Here, we combined transcranial magnetic stimulation (TMS) with a novel empathic accuracy (EA) task to test whether sensorimotor and mentalizing networks are critical for understanding another's amusement. Participants were presented with dynamic displays of smiles and explicitly requested to infer whether the smiling individual was feeling authentic amusement or not. TMS over sensorimotor regions representing the face (i.e., in the inferior frontal gyrus (IFG) and ventral primary somatosensory cortex (SI)), disrupted the ability to infer amusement authenticity from observed smiles. The same stimulation did not affect performance on a nonsocial task requiring participants to track the smiling expression but not to infer amusement. Neither TMS over prefrontal and temporo-parietal areas supporting mentalizing, nor peripheral control stimulations, affected performance on either task. Thus, motor and somatosensory circuits for controlling and sensing facial movements are causally essential for inferring amusement from another's smile. These findings highlight the functional relevance of IFG and SI to amusement understanding and suggest that EA abilities may be grounded in sensorimotor networks for moving and feeling the body. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. A Regulatory Network Analysis of Orphan Genes in Arabidopsis Thaliana

    Science.gov (United States)

    Singh, Pramesh; Chen, Tianlong; Arendsee, Zebulun; Wurtele, Eve S.; Bassler, Kevin E.

    Orphan genes, which are genes unique to each particular species, have recently drawn significant attention for their potential usefulness for organismal robustness. Their origin and regulatory interaction patterns remain largely undiscovered. Recently, methods that use the context likelihood of relatedness to infer a network followed by modularity maximizing community detection algorithms on the inferred network to find the functional structure of regulatory networks were shown to be effective. We apply improved versions of these methods to gene expression data from Arabidopsis thaliana, identify groups (clusters) of interacting genes with related patterns of expression and analyze the structure within those groups. Focusing on clusters that contain orphan genes, we compare the identified clusters to gene ontology (GO) terms, regulons, and pathway designations and analyze their hierarchical structure. We predict new regulatory interactions and unravel the structure of the regulatory interaction patterns of orphan genes. Work supported by the NSF through Grants DMR-1507371 and IOS-1546858.

  14. Sign Inference for Dynamic Signed Networks via Dictionary Learning

    Directory of Open Access Journals (Sweden)

    Yi Cen

    2013-01-01

    Full Text Available Mobile online social network (mOSN is a burgeoning research area. However, most existing works referring to mOSNs deal with static network structures and simply encode whether relationships among entities exist or not. In contrast, relationships in signed mOSNs can be positive or negative and may be changed with time and locations. Applying certain global characteristics of social balance, in this paper, we aim to infer the unknown relationships in dynamic signed mOSNs and formulate this sign inference problem as a low-rank matrix estimation problem. Specifically, motivated by the Singular Value Thresholding (SVT algorithm, a compact dictionary is selected from the observed dataset. Based on this compact dictionary, the relationships in the dynamic signed mOSNs are estimated via solving the formulated problem. Furthermore, the estimation accuracy is improved by employing a dictionary self-updating mechanism.

  15. An Algebraic Approach to Inference in Complex Networked Structures

    Science.gov (United States)

    2015-07-09

    44], [45],[46] where the shift is the elementary non-trivial filter that generates, under an appropriate notion of shift invariance, all linear ... elementary filter, and its output is a graph signal with the value at vertex n of the graph given approximately by a weighted linear combination of...AFRL-AFOSR-VA-TR-2015-0265 An Algebraic Approach to Inference in Complex Networked Structures Jose Moura CARNEGIE MELLON UNIVERSITY Final Report 07

  16. Inference of Transmission Network Structure from HIV Phylogenetic Trees.

    Science.gov (United States)

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; Britton, Tom; Leitner, Thomas

    2017-01-01

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.

  17. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    Science.gov (United States)

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Constructing an integrated gene similarity network for the identification of disease genes.

    Science.gov (United States)

    Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin

    2017-09-20

    Discovering novel genes that are involved human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and only cover less half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. We proposed a novel method, named RWRB, to infer causal genes of interested diseases. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on similarity network fusion (SNF) method. Finally, we employee the random walk with restart algorithm on the phenotype-gene bilayer network, which combines phenotype similarity network, IGSN as well as phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB is benefited from IGSN which has a wider coverage and higher reliability comparing with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that supported by literature. RWRB is an effective and reliable algorithm in prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .

  19. Inferring causal genomic alterations in breast cancer using gene expression data

    Science.gov (United States)

    2011-01-01

    Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression based CNV regions in breast cancer. The framework where the wavelet analysis of copy number alteration based on expression coupled with the gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811

  20. Inferring the role of transcription factors in regulatory networks

    Directory of Open Access Journals (Sweden)

    Le Borgne Michel

    2008-05-01

    Full Text Available Abstract Background Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. Results We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges, and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to S. cerevisiae transcriptional network (2419 nodes and 4344 interactions, by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions. In addition, we report predictions for 14.5% of all interactions. Conclusion Our approach does not require accurate expression levels nor times series. Nevertheless, we show on both data, real and artificial, that a relatively small number of perturbation experiments are enough to determine

  1. Human synthetic lethal inference as potential anti-cancer target gene detection

    Directory of Open Access Journals (Sweden)

    Solé Ricard V

    2009-12-01

    Full Text Available Abstract Background Two genes are called synthetic lethal (SL if mutation of either alone is not lethal, but mutation of both leads to death or a significant decrease in organism's fitness. The detection of SL gene pairs constitutes a promising alternative for anti-cancer therapy. As cancer cells exhibit a large number of mutations, the identification of these mutated genes' SL partners may provide specific anti-cancer drug candidates, with minor perturbations to the healthy cells. Since existent SL data is mainly restricted to yeast screenings, the road towards human SL candidates is limited to inference methods. Results In the present work, we use phylogenetic analysis and database manipulation (BioGRID for interactions, Ensembl and NCBI for homology, Gene Ontology for GO attributes in order to reconstruct the phylogenetically-inferred SL gene network for human. In addition, available data on cancer mutated genes (COSMIC and Cancer Gene Census databases as well as on existent approved drugs (DrugBank database supports our selection of cancer-therapy candidates. Conclusions Our work provides a complementary alternative to the current methods for drug discovering and gene target identification in anti-cancer research. Novel SL screening analysis and the use of highly curated databases would contribute to improve the results of this methodology.

  2. Introduction: Cancer Gene Networks.

    Science.gov (United States)

    Clarke, Robert

    2017-01-01

    Constructing, evaluating, and interpreting gene networks generally sits within the broader field of systems biology, which continues to emerge rapidly, particular with respect to its application to understanding the complexity of signaling in the context of cancer biology. For the purposes of this volume, we take a broad definition of systems biology. Considering an organism or disease within an organism as a system, systems biology is the study of the integrated and coordinated interactions of the network(s) of genes, their variants both natural and mutated (e.g., polymorphisms, rearrangements, alternate splicing, mutations), their proteins and isoforms, and the organic and inorganic molecules with which they interact, to execute the biochemical reactions (e.g., as enzymes, substrates, products) that reflect the function of that system. Central to systems biology, and perhaps the only approach that can effectively manage the complexity of such systems, is the building of quantitative multiscale predictive models. The predictions of the models can vary substantially depending on the nature of the model and its inputoutput relationships. For example, a model may predict the outcome of a specific molecular reaction(s), a cellular phenotype (e.g., alive, dead, growth arrest, proliferation, and motility), a change in the respective prevalence of cell or subpopulations, a patient or patient subgroup outcome(s). Such models necessarily require computers. Computational modeling can be thought of as using machine learning and related tools to integrate the very high dimensional data generated from modern, high throughput omics technologies including genomics (next generation sequencing), transcriptomics (gene expression microarrays; RNAseq), metabolomics and proteomics (ultra high performance liquid chromatography, mass spectrometry), and "subomic" technologies to study the kinome, methylome, and others. Mathematical modeling can be thought of as the use of ordinary

  3. Inferring hidden causal relations between pathway members using reduced Google matrix of directed biological networks

    Science.gov (United States)

    2018-01-01

    Signaling pathways represent parts of the global biological molecular network which connects them into a seamless whole through complex direct and indirect (hidden) crosstalk whose structure can change during development or in pathological conditions. We suggest a novel methodology, called Googlomics, for the structural analysis of directed biological networks using spectral analysis of their Google matrices, using parallels with quantum scattering theory, developed for nuclear and mesoscopic physics and quantum chaos. We introduce analytical “reduced Google matrix” method for the analysis of biological network structure. The method allows inferring hidden causal relations between the members of a signaling pathway or a functionally related group of genes. We investigate how the structure of hidden causal relations can be reprogrammed as a result of changes in the transcriptional network layer during cancerogenesis. The suggested Googlomics approach rigorously characterizes complex systemic changes in the wiring of large causal biological networks in a computationally efficient way. PMID:29370181

  4. Mutational robustness of gene regulatory networks.

    Directory of Open Access Journals (Sweden)

    Aalt D J van Dijk

    Full Text Available Mutational robustness of gene regulatory networks refers to their ability to generate constant biological output upon mutations that change network structure. Such networks contain regulatory interactions (transcription factor-target gene interactions but often also protein-protein interactions between transcription factors. Using computational modeling, we study factors that influence robustness and we infer several network properties governing it. These include the type of mutation, i.e. whether a regulatory interaction or a protein-protein interaction is mutated, and in the case of mutation of a regulatory interaction, the sign of the interaction (activating vs. repressive. In addition, we analyze the effect of combinations of mutations and we compare networks containing monomeric with those containing dimeric transcription factors. Our results are consistent with available data on biological networks, for example based on evolutionary conservation of network features. As a novel and remarkable property, we predict that networks are more robust against mutations in monomer than in dimer transcription factors, a prediction for which analysis of conservation of DNA binding residues in monomeric vs. dimeric transcription factors provides indirect evidence.

  5. Convergent evolution of gene networks by single-gene duplications in higher eukaryotes

    OpenAIRE

    Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich

    2004-01-01

    By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix–loop–helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks e...

  6. Pathogenomic inference of virulence-associated genes in Leptospira interrogans.

    Directory of Open Access Journals (Sweden)

    Jason S Lehmann

    Full Text Available Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.

  7. Pathogenomic inference of virulence-associated genes in Leptospira interrogans.

    Science.gov (United States)

    Lehmann, Jason S; Fouts, Derrick E; Haft, Daniel H; Cannella, Anthony P; Ricaldi, Jessica N; Brinkac, Lauren; Harkins, Derek; Durkin, Scott; Sanka, Ravi; Sutton, Granger; Moreno, Angelo; Vinetz, Joseph M; Matthias, Michael A

    2013-01-01

    Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.

  8. Qualitative reasoning for biological network inference from systematic perturbation experiments.

    Science.gov (United States)

    Badaloni, Silvana; Di Camillo, Barbara; Sambo, Francesco

    2012-01-01

    The systematic perturbation of the components of a biological system has been proven among the most informative experimental setups for the identification of causal relations between the components. In this paper, we present Systematic Perturbation-Qualitative Reasoning (SPQR), a novel Qualitative Reasoning approach to automate the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, measured values of the observed variables are modeled as lower, equal or higher than the measurements in the wild type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables, analyzing the patterns of propagation of the perturbation signals through the biological network, and is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR indeed exhibits a significantly higher precision than the state of the art.

  9. Efficient Reverse-Engineering of a Developmental Gene Regulatory Network

    Science.gov (United States)

    Cicin-Sain, Damjan; Ashyraliyev, Maksat; Jaeger, Johannes

    2012-01-01

    Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to

  10. Network inference from functional experimental data (Conference Presentation)

    Science.gov (United States)

    Desrosiers, Patrick; Labrecque, Simon; Tremblay, Maxime; Bélanger, Mathieu; De Dorlodot, Bertrand; Côté, Daniel C.

    2016-03-01

    Functional connectivity maps of neuronal networks are critical tools to understand how neurons form circuits, how information is encoded and processed by neurons, how memory is shaped, and how these basic processes are altered under pathological conditions. Current light microscopy allows to observe calcium or electrical activity of thousands of neurons simultaneously, yet assessing comprehensive connectivity maps directly from such data remains a non-trivial analytical task. There exist simple statistical methods, such as cross-correlation and Granger causality, but they only detect linear interactions between neurons. Other more involved inference methods inspired by information theory, such as mutual information and transfer entropy, identify more accurately connections between neurons but also require more computational resources. We carried out a comparative study of common connectivity inference methods. The relative accuracy and computational cost of each method was determined via simulated fluorescence traces generated with realistic computational models of interacting neurons in networks of different topologies (clustered or non-clustered) and sizes (10-1000 neurons). To bridge the computational and experimental works, we observed the intracellular calcium activity of live hippocampal neuronal cultures infected with the fluorescent calcium marker GCaMP6f. The spontaneous activity of the networks, consisting of 50-100 neurons per field of view, was recorded from 20 to 50 Hz on a microscope controlled by a homemade software. We implemented all connectivity inference methods in the software, which rapidly loads calcium fluorescence movies, segments the images, extracts the fluorescence traces, and assesses the functional connections (with strengths and directions) between each pair of neurons. We used this software to assess, in real time, the functional connectivity from real calcium imaging data in basal conditions, under plasticity protocols, and epileptic

  11. Phylogenetic relationships of Hemiptera inferred from mitochondrial and nuclear genes.

    Science.gov (United States)

    Song, Nan; Li, Hu; Cai, Wanzhi; Yan, Fengming; Wang, Jianyun; Song, Fan

    2016-11-01

    Here, we reconstructed the Hemiptera phylogeny based on the expanded mitochondrial protein-coding genes and the nuclear 18S rRNA gene, separately. The differential rates of change across lineages may associate with long-branch attraction (LBA) effect and result in conflicting estimates of phylogeny from different types of data. To reduce the potential effects of systematic biases on inferences of topology, various data coding schemes, site removal method, and different algorithms were utilized in phylogenetic reconstruction. We show that the outgroups Phthiraptera, Thysanoptera, and the ingroup Sternorrhyncha share similar base composition, and exhibit "long branches" relative to other hemipterans. Thus, the long-branch attraction between these groups is suspected to cause the failure of recovering Hemiptera under the homogeneous model. In contrast, a monophyletic Hemiptera is supported when heterogeneous model is utilized in the analysis. Although higher level phylogenetic relationships within Hemiptera remain to be answered, consensus between analyses is beginning to converge on a stable phylogeny.

  12. The Network Completion Problem: Inferring Missing Nodes and Edges in Networks

    Energy Technology Data Exchange (ETDEWEB)

    Kim, M; Leskovec, J

    2011-11-14

    Network structures, such as social networks, web graphs and networks from systems biology, play important roles in many areas of science and our everyday lives. In order to study the networks one needs to first collect reliable large scale network data. While the social and information networks have become ubiquitous, the challenge of collecting complete network data still persists. Many times the collected network data is incomplete with nodes and edges missing. Commonly, only a part of the network can be observed and we would like to infer the unobserved part of the network. We address this issue by studying the Network Completion Problem: Given a network with missing nodes and edges, can we complete the missing part? We cast the problem in the Expectation Maximization (EM) framework where we use the observed part of the network to fit a model of network structure, and then we estimate the missing part of the network using the model, re-estimate the parameters and so on. We combine the EM with the Kronecker graphs model and design a scalable Metropolized Gibbs sampling approach that allows for the estimation of the model parameters as well as the inference about missing nodes and edges of the network. Experiments on synthetic and several real-world networks show that our approach can effectively recover the network even when about half of the nodes in the network are missing. Our algorithm outperforms not only classical link-prediction approaches but also the state of the art Stochastic block modeling approach. Furthermore, our algorithm easily scales to networks with tens of thousands of nodes.

  13. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    Science.gov (United States)

    Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-duster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.

  14. Inference of the oxidative stress network in Anopheles stephensi upon Plasmodium infection.

    Science.gov (United States)

    Shrinet, Jatin; Nandal, Umesh Kumar; Adak, Tridibes; Bhatnagar, Raj K; Sunil, Sujatha

    2014-01-01

    Ookinete invasion of Anopheles midgut is a critical step for malaria transmission; the parasite numbers drop drastically and practically reach a minimum during the parasite's whole life cycle. At this stage, the parasite as well as the vector undergoes immense oxidative stress. Thereafter, the vector undergoes oxidative stress at different time points as the parasite invades its tissues during the parasite development. The present study was undertaken to reconstruct the network of differentially expressed genes involved in oxidative stress in Anopheles stephensi during Plasmodium development and maturation in the midgut. Using high throughput next generation sequencing methods, we generated the transcriptome of the An. stephensi midgut during Plasmodium vinckei petteri oocyst invasion of the midgut epithelium. Further, we utilized large datasets available on public domain on Anopheles during Plasmodium ookinete invasion and Drosophila datasets and arrived upon clusters of genes that may play a role in oxidative stress. Finally, we used support vector machines for the functional prediction of the un-annotated genes of An. stephensi. Integrating the results from all the different data analyses, we identified a total of 516 genes that were involved in oxidative stress in An. stephensi during Plasmodium development. The significantly regulated genes were further extracted from this gene cluster and used to infer an oxidative stress network of An. stephensi. Using system biology approaches, we have been able to ascertain the role of several putative genes in An. stephensi with respect to oxidative stress. Further experimental validations of these genes are underway.

  15. Convergent evolution of gene networks by single-gene duplications in higher eukaryotes.

    Science.gov (United States)

    Amoutzias, Gregory D; Robertson, David L; Oliver, Stephen G; Bornberg-Bauer, Erich

    2004-03-01

    By combining phylogenetic, proteomic and structural information, we have elucidated the evolutionary driving forces for the gene-regulatory interaction networks of basic helix-loop-helix transcription factors. We infer that recurrent events of single-gene duplication and domain rearrangement repeatedly gave rise to distinct networks with almost identical hub-based topologies, and multiple activators and repressors. We thus provide the first empirical evidence for scale-free protein networks emerging through single-gene duplications, the dominant importance of molecular modularity in the bottom-up construction of complex biological entities, and the convergent evolution of networks.

  16. Learning gene regulatory networks from gene expression data using weighted consensus

    KAUST Repository

    Fujii, Chisato; Kuwahara, Hiroyuki; Yu, Ge; Guo, Lili; Gao, Xin

    2016-01-01

    An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.

  17. Learning gene regulatory networks from gene expression data using weighted consensus

    KAUST Repository

    Fujii, Chisato

    2016-08-25

    An accurate determination of the network structure of gene regulatory systems from high-throughput gene expression data is an essential yet challenging step in studying how the expression of endogenous genes is controlled through a complex interaction of gene products and DNA. While numerous methods have been proposed to infer the structure of gene regulatory networks, none of them seem to work consistently over different data sets with high accuracy. A recent study to compare gene network inference methods showed that an average-ranking-based consensus method consistently performs well under various settings. Here, we propose a linear programming-based consensus method for the inference of gene regulatory networks. Unlike the average-ranking-based one, which treats the contribution of each individual method equally, our new consensus method assigns a weight to each method based on its credibility. As a case study, we applied the proposed consensus method on synthetic and real microarray data sets, and compared its performance to that of the average-ranking-based consensus and individual inference methods. Our results show that our weighted consensus method achieves superior performance over the unweighted one, suggesting that assigning weights to different individual methods rather than giving them equal weights improves the accuracy. © 2016 Elsevier B.V.

  18. Supervised dictionary learning for inferring concurrent brain networks.

    Science.gov (United States)

    Zhao, Shijie; Han, Junwei; Lv, Jinglei; Jiang, Xi; Hu, Xintao; Zhao, Yu; Ge, Bao; Guo, Lei; Liu, Tianming

    2015-10-01

    Task-based fMRI (tfMRI) has been widely used to explore functional brain networks via predefined stimulus paradigm in the fMRI scan. Traditionally, the general linear model (GLM) has been a dominant approach to detect task-evoked networks. However, GLM focuses on task-evoked or event-evoked brain responses and possibly ignores the intrinsic brain functions. In comparison, dictionary learning and sparse coding methods have attracted much attention recently, and these methods have shown the promise of automatically and systematically decomposing fMRI signals into meaningful task-evoked and intrinsic concurrent networks. Nevertheless, two notable limitations of current data-driven dictionary learning method are that the prior knowledge of task paradigm is not sufficiently utilized and that the establishment of correspondences among dictionary atoms in different brains have been challenging. In this paper, we propose a novel supervised dictionary learning and sparse coding method for inferring functional networks from tfMRI data, which takes both of the advantages of model-driven method and data-driven method. The basic idea is to fix the task stimulus curves as predefined model-driven dictionary atoms and only optimize the other portion of data-driven dictionary atoms. Application of this novel methodology on the publicly available human connectome project (HCP) tfMRI datasets has achieved promising results.

  19. Perturbation Biology: Inferring Signaling Networks in Cellular Systems

    Science.gov (United States)

    Miller, Martin L.; Gauthier, Nicholas P.; Jing, Xiaohong; Kaushik, Poorvi; He, Qin; Mills, Gordon; Solit, David B.; Pratilas, Christine A.; Weigt, Martin; Braunstein, Alfredo; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2013-01-01

    We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, Belief Propagation (BP), which is three orders of magnitude faster than standard Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in SKMEL-133 melanoma cell lines, which are resistant to the therapeutically important inhibitor of RAF kinase. The resulting network models reproduce and extend known pathway biology. They empower potential discoveries of new molecular interactions and predict efficacious novel drug perturbations, such as the inhibition of PLK1, which is verified experimentally. This technology is suitable for application to larger systems in diverse areas of molecular biology. PMID:24367245

  20. Griffin: A Tool for Symbolic Inference of Synchronous Boolean Molecular Networks

    Directory of Open Access Journals (Sweden)

    Stalin Muñoz

    2018-03-01

    Full Text Available Boolean networks are important models of biochemical systems, located at the high end of the abstraction spectrum. A number of Boolean gene networks have been inferred following essentially the same method. Such a method first considers experimental data for a typically underdetermined “regulation” graph. Next, Boolean networks are inferred by using biological constraints to narrow the search space, such as a desired set of (fixed-point or cyclic attractors. We describe Griffin, a computer tool enhancing this method. Griffin incorporates a number of well-established algorithms, such as Dubrova and Teslenko's algorithm for finding attractors in synchronous Boolean networks. In addition, a formal definition of regulation allows Griffin to employ “symbolic” techniques, able to represent both large sets of network states and Boolean constraints. We observe that when the set of attractors is required to be an exact set, prohibiting additional attractors, a naive Boolean coding of this constraint may be unfeasible. Such cases may be intractable even with symbolic methods, as the number of Boolean constraints may be astronomically large. To overcome this problem, we employ an Artificial Intelligence technique known as “clause learning” considerably increasing Griffin's scalability. Without clause learning only toy examples prohibiting additional attractors are solvable: only one out of seven queries reported here is answered. With clause learning, by contrast, all seven queries are answered. We illustrate Griffin with three case studies drawn from the Arabidopsis thaliana literature. Griffin is available at: http://turing.iimas.unam.mx/griffin.

  1. Optimal structural inference of signaling pathways from unordered and overlapping gene sets.

    Science.gov (United States)

    Acharya, Lipi R; Judeh, Thair; Wang, Guangdi; Zhu, Dongxiao

    2012-02-15

    A plethora of bioinformatics analysis has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a 'search and score' network learning strategy and output a directed network, they can be compared in terms of performance and computational time. In the second study, we compared SA and

  2. ERC analysis: web-based inference of gene function via evolutionary rate covariation.

    Science.gov (United States)

    Wolfe, Nicholas W; Clark, Nathan L

    2015-12-01

    The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. Analyses and data at http://csb.pitt.edu/erc_analysis/ nclark@pitt.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Inference on network statistics by restricting to the network space: applications to sexual history data.

    Science.gov (United States)

    Goyal, Ravi; De Gruttola, Victor

    2018-01-30

    Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  4. Approximation Methods for Inference and Learning in Belief Networks: Progress and Future Directions

    National Research Council Canada - National Science Library

    Pazzan, Michael

    1997-01-01

    .... In this research project, we have investigated methods and implemented algorithms for efficiently making certain classes of inference in belief networks, and for automatically learning certain...

  5. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    Science.gov (United States)

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of

  6. CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data.

    Science.gov (United States)

    McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T

    2014-06-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.

  7. LASSIM-A network inference toolbox for genome-wide mechanistic modeling.

    Directory of Open Access Journals (Sweden)

    Rasmus Magnusson

    2017-06-01

    Full Text Available Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM, which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE for gene regulatory networks (GRNs. LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models

  8. Inference and interrogation of a coregulatory network in the context of lipid accumulation in Yarrowia lipolytica.

    Science.gov (United States)

    Trébulle, Pauline; Nicaud, Jean-Marc; Leplat, Christophe; Elati, Mohamed

    2017-01-01

    Complex phenotypes, such as lipid accumulation, result from cooperativity between regulators and the integration of multiscale information. However, the elucidation of such regulatory programs by experimental approaches may be challenging, particularly in context-specific conditions. In particular, we know very little about the regulators of lipid accumulation in the oleaginous yeast of industrial interest Yarrowia lipolytica . This lack of knowledge limits the development of this yeast as an industrial platform, due to the time-consuming and costly laboratory efforts required to design strains with the desired phenotypes. In this study, we aimed to identify context-specific regulators and mechanisms, to guide explorations of the regulation of lipid accumulation in Y. lipolytica . Using gene regulatory network inference, and considering the expression of 6539 genes over 26 time points from GSE35447 for biolipid production and a list of 151 transcription factors, we reconstructed a gene regulatory network comprising 111 transcription factors, 4451 target genes and 17048 regulatory interactions (YL-GRN-1) supported by evidence of protein-protein interactions. This study, based on network interrogation and wet laboratory validation (a) highlights the relevance of our proposed measure, the transcription factors influence, for identifying phases corresponding to changes in physiological state without prior knowledge (b) suggests new potential regulators and drivers of lipid accumulation and (c) experimentally validates the impact of six of the nine regulators identified on lipid accumulation, with variations in lipid content from +43.2% to -31.2% on glucose or glycerol.

  9. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    2010-01-01

    Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with markov chain Monte Carlo...... (MCMC) techniques. Due to space limitations the focus is on spatial point processes....

  10. cDREM: inferring dynamic combinatorial gene regulation.

    Science.gov (United States)

    Wise, Aaron; Bar-Joseph, Ziv

    2015-04-01

    Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinational regulation, these have often ignored the combinational logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

  11. Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks.

    Science.gov (United States)

    Haibe-Kains, Benjamin; Olsen, Catharina; Djebbari, Amira; Bontempi, Gianluca; Correll, Mick; Bouton, Christopher; Quackenbush, John

    2012-01-01

    Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.

  12. Inference

    DEFF Research Database (Denmark)

    Møller, Jesper

    (This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by ClarendonPress, Oxford, and planned to appear as Section 4.1 with the ......(This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by ClarendonPress, Oxford, and planned to appear as Section 4.......1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...

  13. Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

    Directory of Open Access Journals (Sweden)

    Mingxin Gan

    2014-01-01

    Full Text Available Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.

  14. Inference of neuronal network spike dynamics and topology from calcium imaging data

    Directory of Open Access Journals (Sweden)

    Henry eLütcke

    2013-12-01

    Full Text Available Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP occurrence ('spike trains' from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.

  15. Recurrent neural network-based modeling of gene regulatory network using elephant swarm water search algorithm.

    Science.gov (United States)

    Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar

    2017-08-01

    Correct inference of genetic regulations inside a cell from the biological database like time series microarray data is one of the greatest challenges in post genomic era for biologists and researchers. Recurrent Neural Network (RNN) is one of the most popular and simple approach to model the dynamics as well as to infer correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic namely Elephant Swarm Water Search Algorithm (ESWSA) to infer Gene Regulatory Network (GRN). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, utilizing the different types of communication techniques. Initially, the algorithm is tested against benchmark small and medium scale artificial genetic networks without and with presence of different noise levels and the efficiency was observed in term of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm is tested against the real time gene expression data of Escherichia Coli SOS Network and results were also compared with others state of the art optimization methods. The experimental results suggest that ESWSA is very efficient for GRN inference problem and performs better than other methods in many ways.

  16. Network-assisted crop systems genetics: network inference and integrative analysis.

    Science.gov (United States)

    Lee, Tak; Kim, Hyojin; Lee, Insuk

    2015-04-01

    Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of a genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network then can be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will rapidly grow in the field of crop science in the coming years. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.

    Science.gov (United States)

    Fan, Jianqing; Tong, Xin; Zeng, Yao

    When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning , to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.

  18. Relaxation rates of gene expression kinetics reveal the feedback signs of autoregulatory gene networks

    Science.gov (United States)

    Jia, Chen; Qian, Hong; Chen, Min; Zhang, Michael Q.

    2018-03-01

    The transient response to a stimulus and subsequent recovery to a steady state are the fundamental characteristics of a living organism. Here we study the relaxation kinetics of autoregulatory gene networks based on the chemical master equation model of single-cell stochastic gene expression with nonlinear feedback regulation. We report a novel relation between the rate of relaxation, characterized by the spectral gap of the Markov model, and the feedback sign of the underlying gene circuit. When a network has no feedback, the relaxation rate is exactly the decaying rate of the protein. We further show that positive feedback always slows down the relaxation kinetics while negative feedback always speeds it up. Numerical simulations demonstrate that this relation provides a possible method to infer the feedback topology of autoregulatory gene networks by using time-series data of gene expression.

  19. Inference method using bayesian network for diagnosis of pulmonary nodules

    International Nuclear Information System (INIS)

    Kawagishi, Masami; Iizuka, Yoshio; Yamamoto, Hiroyuki; Yakami, Masahiro; Kubo, Takeshi; Fujimoto, Koji; Togashi, Kaori

    2010-01-01

    This report describes the improvements of a naive Bayes model that infers the diagnosis of pulmonary nodules in chest CT images based on the findings obtained when a radiologist interprets the CT images. We have previously introduced an inference model using a naive Bayes classifier and have reported its clinical value based on evaluation using clinical data. In the present report, we introduce the following improvements to the original inference model: the selection of findings based on correlations and the generation of a model using only these findings, and the introduction of classifiers that integrate several simple classifiers each of which is specialized for specific diagnosis. These improvements were found to increase the inference accuracy by 10.4% (p<.01) as compared to the original model in 100 cases (222 nodules) based on leave-one-out evaluation. (author)

  20. Efficient design and inference in distributed Bayesian networks: an overview

    NARCIS (Netherlands)

    de Oude, P.; Groen, F.C.A.; Pavlin, G.; Bezhanishvili, N.; Löbner, S.; Schwabe, K.; Spada, L.

    2011-01-01

    This paper discusses an approach to distributed Bayesian modeling and inference, which is relevant for an important class of contemporary real world situation assessment applications. By explicitly considering the locality of causal relations, the presented approach (i) supports coherent distributed

  1. Inferring Trust Relationships in Web-Based Social Networks

    National Research Council Canada - National Science Library

    Golbeck, Jennifer; Hendler, James

    2006-01-01

    The growth of web-based social networking and the properties of those networks have created great potential for producing intelligent software that integrates a user's social network and preferences...

  2. Mocapy++ - a toolkit for inference and learning in dynamic Bayesian networks

    DEFF Research Database (Denmark)

    Paluszewski, Martin; Hamelryck, Thomas Wim

    2010-01-01

    Background Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations...

  3. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem

  4. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

    Science.gov (United States)

    Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

    2011-12-01

    An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.

  5. Inferring causal molecular networks: empirical assessment through a community-based effort.

    Science.gov (United States)

    Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K; Carlin, Daniel E; Zhang, Yang; Sokolov, Artem; Paull, Evan O; Wong, Chris K; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V; Favorov, Alexander V; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W; Long, Byron L; Noren, David P; Bisberg, Alexander J; Mills, Gordon B; Gray, Joe W; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A; Fertig, Elana J; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M; Spellman, Paul T; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach

    2016-04-01

    It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

  6. Learning gene regulatory networks from only positive and unlabeled data

    Directory of Open Access Journals (Sweden)

    Elkan Charles

    2010-05-01

    Full Text Available Abstract Background Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact. Results A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement. Conclusions Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.

  7. Efficient inference of overlapping communities in complex networks

    DEFF Research Database (Denmark)

    Fruergaard, Bjarne Ørum; Herlau, Tue

    2014-01-01

    We discuss two views on extending existing methods for complex network modeling which we dub the communities first and the networks first view, respectively. Inspired by the networks first view that we attribute to White, Boorman, and Breiger (1976)[1], we formulate the multiple-networks stochastic...

  8. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback.

    Science.gov (United States)

    Orhan, A Emin; Ma, Wei Ji

    2017-07-26

    Animals perform near-optimal probabilistic inference in a wide range of psychophysical tasks. Probabilistic inference requires trial-to-trial representation of the uncertainties associated with task variables and subsequent use of this representation. Previous work has implemented such computations using neural networks with hand-crafted and task-dependent operations. We show that generic neural networks trained with a simple error-based learning rule perform near-optimal probabilistic inference in nine common psychophysical tasks. In a probabilistic categorization task, error-based learning in a generic network simultaneously explains a monkey's learning curve and the evolution of qualitative aspects of its choice behavior. In all tasks, the number of neurons required for a given level of performance grows sublinearly with the input population size, a substantial improvement on previous implementations of probabilistic inference. The trained networks develop a novel sparsity-based probabilistic population code. Our results suggest that probabilistic inference emerges naturally in generic neural networks trained with error-based learning rules.Behavioural tasks often require probability distributions to be inferred about task specific variables. Here, the authors demonstrate that generic neural networks can be trained using a simple error-based learning rule to perform such probabilistic computations efficiently without any need for task specific operations.

  9. Software defined network inference with evolutionary optimal observation matrices

    OpenAIRE

    Malboubi, M; Gong, Y; Yang, Z; Wang, X; Chuah, CN; Sharma, P

    2017-01-01

    © 2017 Elsevier B.V. A key requirement for network management is the accurate and reliable monitoring of relevant network characteristics. In today's large-scale networks, this is a challenging task due to the scarcity of network measurement resources and the hard constraints that this imposes. This paper proposes a new framework, called SNIPER, which leverages the flexibility provided by Software-Defined Networking (SDN) to design the optimal observation or measurement matrix that can lead t...

  10. Inferring metabolic states in uncharacterized environments using gene-expression measurements.

    Directory of Open Access Journals (Sweden)

    Sergio Rossell

    Full Text Available The large size of metabolic networks entails an overwhelming multiplicity in the possible steady-state flux distributions that are compatible with stoichiometric constraints. This space of possibilities is largest in the frequent situation where the nutrients available to the cells are unknown. These two factors: network size and lack of knowledge of nutrient availability, challenge the identification of the actual metabolic state of living cells among the myriad possibilities. Here we address this challenge by developing a method that integrates gene-expression measurements with genome-scale models of metabolism as a means of inferring metabolic states. Our method explores the space of alternative flux distributions that maximize the agreement between gene expression and metabolic fluxes, and thereby identifies reactions that are likely to be active in the culture from which the gene-expression measurements were taken. These active reactions are used to build environment-specific metabolic models and to predict actual metabolic states. We applied our method to model the metabolic states of Saccharomyces cerevisiae growing in rich media supplemented with either glucose or ethanol as the main energy source. The resulting models comprise about 50% of the reactions in the original model, and predict environment-specific essential genes with high sensitivity. By minimizing the sum of fluxes while forcing our predicted active reactions to carry flux, we predicted the metabolic states of these yeast cultures that are in large agreement with what is known about yeast physiology. Most notably, our method predicts the Crabtree effect in yeast cells growing in excess glucose, a long-known phenomenon that could not have been predicted by traditional constraint-based modeling approaches. Our method is of immediate practical relevance for medical and industrial applications, such as the identification of novel drug targets, and the development of

  11. Inference of gene-phenotype associations via protein-protein interaction and orthology.

    Directory of Open Access Journals (Sweden)

    Panwen Wang

    Full Text Available One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.

  12. Inferring gene expression dynamics via functional regression analysis

    Directory of Open Access Journals (Sweden)

    Leng Xiaoyan

    2008-01-01

    Full Text Available Abstract Background Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene expression associated with different developmental stages to each other to study patterns of long-term developmental gene regulation. We use tools from functional data analysis to study dynamic changes by relating temporal gene expression profiles of different developmental stages to each other. Results We demonstrate that functional regression methodology can pinpoint relationships that exist between temporary gene expression profiles for different life cycle phases and incorporates dimension reduction as needed for these high-dimensional data. By applying these tools, gene expression profiles for pupa and adult phases are found to be strongly related to the profiles of the same genes obtained during the embryo phase. Moreover, one can distinguish between gene groups that exhibit relationships with positive and others with negative associations between later life and embryonal expression profiles. Specifically, we find a positive relationship in expression for muscle development related genes, and a negative relationship for strictly maternal genes for Drosophila, using temporal gene expression profiles. Conclusion Our findings point to specific reactivation patterns of gene expression during the Drosophila life cycle which differ in characteristic ways between various gene groups. Functional regression emerges as a useful tool for relating gene expression patterns from different developmental stages, and avoids the problems with large numbers of parameters and multiple testing that affect alternative approaches.

  13. Diurnal Transcriptome and Gene Network Represented through Sparse Modeling in Brachypodium distachyon

    Directory of Open Access Journals (Sweden)

    Satoru Koda

    2017-11-01

    Full Text Available We report the comprehensive identification of periodic genes and their network inference, based on a gene co-expression analysis and an Auto-Regressive eXogenous (ARX model with a group smoothly clipped absolute deviation (SCAD method using a time-series transcriptome dataset in a model grass, Brachypodium distachyon. To reveal the diurnal changes in the transcriptome in B. distachyon, we performed RNA-seq analysis of its leaves sampled through a diurnal cycle of over 48 h at 4 h intervals using three biological replications, and identified 3,621 periodic genes through our wavelet analysis. The expression data are feasible to infer network sparsity based on ARX models. We found that genes involved in biological processes such as transcriptional regulation, protein degradation, and post-transcriptional modification and photosynthesis are significantly enriched in the periodic genes, suggesting that these processes might be regulated by circadian rhythm in B. distachyon. On the basis of the time-series expression patterns of the periodic genes, we constructed a chronological gene co-expression network and identified putative transcription factors encoding genes that might be involved in the time-specific regulatory transcriptional network. Moreover, we inferred a transcriptional network composed of the periodic genes in B. distachyon, aiming to identify genes associated with other genes through variable selection by grouping time points for each gene. Based on the ARX model with the group SCAD regularization using our time-series expression datasets of the periodic genes, we constructed gene networks and found that the networks represent typical scale-free structure. Our findings demonstrate that the diurnal changes in the transcriptome in B. distachyon leaves have a sparse network structure, demonstrating the spatiotemporal gene regulatory network over the cyclic phase transitions in B. distachyon diurnal growth.

  14. Current approaches to gene regulatory network modelling

    Directory of Open Access Journals (Sweden)

    Brazma Alvis

    2007-09-01

    Full Text Available Abstract Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we will describe some examples for each of these categories. We will study the topology of gene regulatory networks in yeast in more detail, comparing a direct network derived from transcription factor binding data and an indirect network derived from genome-wide expression data in mutants. Regarding the network dynamics we briefly describe discrete and continuous approaches to network modelling, then describe a hybrid model called Finite State Linear Model and demonstrate that some simple network dynamics can be simulated in this model.

  15. Inferring Structure and Forecasting Dynamics on Evolving Networks

    Science.gov (United States)

    2016-01-05

    Graphs ........................................................................................................................ 23 7. Sacred Values...5) Team Formation; (6) Games of Graphs; (7) Sacred Values and Legitimacy in Network Interactions; (8) Network processes in Geo-Social Context. 1...Authority, Cooperation and Competition in Religious Networks Key Papers: McBride 2015a [72] and McBride 2015b [73] McBride (2015a) examines

  16. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

    Science.gov (United States)

    Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

    2015-01-01

    Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as times series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. They are used for modeling each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the subjacent relations among genes. The results obtained on artificial and real datasets confirm the method effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.

  17. Ontology Mapping Neural Network: An Approach to Learning and Inferring Correspondences among Ontologies

    Science.gov (United States)

    Peng, Yefei

    2010-01-01

    An ontology mapping neural network (OMNN) is proposed in order to learn and infer correspondences among ontologies. It extends the Identical Elements Neural Network (IENN)'s ability to represent and map complex relationships. The learning dynamics of simultaneous (interlaced) training of similar tasks interact at the shared connections of the…

  18. Learning Gene Regulatory Networks Computationally from Gene Expression Data Using Weighted Consensus

    KAUST Repository

    Fujii, Chisato

    2015-04-16

    Gene regulatory networks analyze the relationships between genes allowing us to un- derstand the gene regulatory interactions in systems biology. Gene expression data from the microarray experiments is used to obtain the gene regulatory networks. How- ever, the microarray data is discrete, noisy and non-linear which makes learning the networks a challenging problem and existing gene network inference methods do not give consistent results. Current state-of-the-art study uses the average-ranking-based consensus method to combine and average the ranked predictions from individual methods. However each individual method has an equal contribution to the consen- sus prediction. We have developed a linear programming-based consensus approach which uses learned weights from linear programming among individual methods such that the methods have di↵erent weights depending on their performance. Our result reveals that assigning di↵erent weights to individual methods rather than giving them equal weights improves the performance of the consensus. The linear programming- based consensus method is evaluated and it had the best performance on in silico and Saccharomyces cerevisiae networks, and the second best on the Escherichia coli network outperformed by Inferelator Pipeline method which gives inconsistent results across a wide range of microarray data sets.

  19. Feature network models for proximity data : statistical inference, model selection, network representations and links with related models

    NARCIS (Netherlands)

    Frank, Laurence Emmanuelle

    2006-01-01

    Feature Network Models (FNM) are graphical structures that represent proximity data in a discrete space with the use of features. A statistical inference theory is introduced, based on the additivity properties of networks and the linear regression framework. Considering features as predictor

  20. The Reconstruction and Analysis of Gene Regulatory Networks.

    Science.gov (United States)

    Zheng, Guangyong; Huang, Tao

    2018-01-01

    In post-genomic era, an important task is to explore the function of individual biological molecules (i.e., gene, noncoding RNA, protein, metabolite) and their organization in living cells. For this end, gene regulatory networks (GRNs) are constructed to show relationship between biological molecules, in which the vertices of network denote biological molecules and the edges of network present connection between nodes (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). Biologists can understand not only the function of biological molecules but also the organization of components of living cells through interpreting the GRNs, since a gene regulatory network is a comprehensively physiological map of living cells and reflects influence of genetic and epigenetic factors (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). In this paper, we will review the inference methods of GRN reconstruction and analysis approaches of network structure. As a powerful tool for studying complex diseases and biological processes, the applications of the network method in pathway analysis and disease gene identification will be introduced.

  1. MINER: exploratory analysis of gene interaction networks by machine learning from expression data

    Directory of Open Access Journals (Sweden)

    Sivieng Jane

    2009-12-01

    Full Text Available Abstract Background The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. Results We have developed MINER (Microarray Interactive Network Exploration and Representation, an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Conclusion Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.

  2. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  3. Functional inference of complex anatomical tendinous networks at a macroscopic scale via sparse experimentation.

    Science.gov (United States)

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16(th) century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines.

  4. Scaffold filling, contig fusion and comparative gene order inference

    Directory of Open Access Journals (Sweden)

    Rounsley Steve

    2010-06-01

    Full Text Available Abstract Background There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes? Results Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera. Conclusions The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

  5. Scaffold filling, contig fusion and comparative gene order inference.

    Science.gov (United States)

    Muñoz, Adriana; Zheng, Chunfang; Zhu, Qian; Albert, Victor A; Rounsley, Steve; Sankoff, David

    2010-06-04

    There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes? Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera. The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.

  6. Inferring topologies via driving-based generalized synchronization of two-layer networks

    Science.gov (United States)

    Wang, Yingfei; Wu, Xiaoqun; Feng, Hui; Lu, Jun-an; Xu, Yuhua

    2016-05-01

    The interaction topology among the constituents of a complex network plays a crucial role in the network’s evolutionary mechanisms and functional behaviors. However, some network topologies are usually unknown or uncertain. Meanwhile, coupling delays are ubiquitous in various man-made and natural networks. Hence, it is necessary to gain knowledge of the whole or partial topology of a complex dynamical network by taking into consideration communication delay. In this paper, topology identification of complex dynamical networks is investigated via generalized synchronization of a two-layer network. Particularly, based on the LaSalle-type invariance principle of stochastic differential delay equations, an adaptive control technique is proposed by constructing an auxiliary layer and designing proper control input and updating laws so that the unknown topology can be recovered upon successful generalized synchronization. Numerical simulations are provided to illustrate the effectiveness of the proposed method. The technique provides a certain theoretical basis for topology inference of complex networks. In particular, when the considered network is composed of systems with high-dimension or complicated dynamics, a simpler response layer can be constructed, which is conducive to circuit design. Moreover, it is practical to take into consideration perturbations caused by control input. Finally, the method is applicable to infer topology of a subnetwork embedded within a complex system and locate hidden sources. We hope the results can provide basic insight into further research endeavors on understanding practical and economical topology inference of networks.

  7. Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    Full Text Available Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/ manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. This inference set should aid researchers, bioinformaticists, and

  8. MATRIX-VECTOR ALGORITHMS OF LOCAL POSTERIORI INFERENCE IN ALGEBRAIC BAYESIAN NETWORKS ON QUANTA PROPOSITIONS

    Directory of Open Access Journals (Sweden)

    A. A. Zolotin

    2015-07-01

    Full Text Available Posteriori inference is one of the three kinds of probabilistic-logic inferences in the probabilistic graphical models theory and the base for processing of knowledge patterns with probabilistic uncertainty using Bayesian networks. The paper deals with a task of local posteriori inference description in algebraic Bayesian networks that represent a class of probabilistic graphical models by means of matrix-vector equations. The latter are essentially based on the use of tensor product of matrices, Kronecker degree and Hadamard product. Matrix equations for calculating posteriori probabilities vectors within posteriori inference in knowledge patterns with quanta propositions are obtained. Similar equations of the same type have already been discussed within the confines of the theory of algebraic Bayesian networks, but they were built only for the case of posteriori inference in the knowledge patterns on the ideals of conjuncts. During synthesis and development of matrix-vector equations on quanta propositions probability vectors, a number of earlier results concerning normalizing factors in posteriori inference and assignment of linear projective operator with a selector vector was adapted. We consider all three types of incoming evidences - deterministic, stochastic and inaccurate - combined with scalar and interval estimation of probability truth of propositional formulas in the knowledge patterns. Linear programming problems are formed. Their solution gives the desired interval values of posterior probabilities in the case of inaccurate evidence or interval estimates in a knowledge pattern. That sort of description of a posteriori inference gives the possibility to extend the set of knowledge pattern types that we can use in the local and global posteriori inference, as well as simplify complex software implementation by use of existing third-party libraries, effectively supporting submission and processing of matrices and vectors when

  9. Infering and Calibrating Triadic Closure in a Dynamic Network

    Science.gov (United States)

    Mantzaris, Alexander V.; Higham, Desmond J.

    In the social sciences, the hypothesis of triadic closure contends that new links in a social contact network arise preferentially between those who currently share neighbours. Here, in a proof-of-principle study, we show how to calibrate a recently proposed evolving network model to time-dependent connectivity data. The probabilistic edge birth rate in the model contains a triadic closure term, so we are also able to assess statistically the evidence for this effect. The approach is shown to work on data generated synthetically from the model. We then apply this methodology to some real, large-scale data that records the build up of connections in a business-related social networking site, and find evidence for triadic closure.

  10. Inferring kangaroo phylogeny from incongruent nuclear and mitochondrial genes.

    Directory of Open Access Journals (Sweden)

    Matthew J Phillips

    Full Text Available The marsupial genus Macropus includes three subgenera, the familiar large grazing kangaroos and wallaroos of M. (Macropus and M. (Osphranter, as well as the smaller mixed grazing/browsing wallabies of M. (Notamacropus. A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby into Macropus. To further examine this proposal we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although place M. irma (black-gloved wallaby within M. (Osphranter rather than as expected, with M. (Notamacropus. Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar such coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression.

  11. The causal inference of cortical neural networks during music improvisations.

    Science.gov (United States)

    Wan, Xiaogeng; Crüts, Björn; Jensen, Henrik Jeldtoft

    2014-01-01

    We present an EEG study of two music improvisation experiments. Professional musicians with high level of improvisation skills were asked to perform music either according to notes (composed music) or in improvisation. Each piece of music was performed in two different modes: strict mode and "let-go" mode. Synchronized EEG data was measured from both musicians and listeners. We used one of the most reliable causality measures: conditional Mutual Information from Mixed Embedding (MIME), to analyze directed correlations between different EEG channels, which was combined with network theory to construct both intra-brain and cross-brain networks. Differences were identified in intra-brain neural networks between composed music and improvisation and between strict mode and "let-go" mode. Particular brain regions such as frontal, parietal and temporal regions were found to play a key role in differentiating the brain activities between different playing conditions. By comparing the level of degree centralities in intra-brain neural networks, we found a difference between the response of musicians and the listeners when comparing the different playing conditions.

  12. The Causal Inference of Cortical Neural Networks during Music Improvisations

    Science.gov (United States)

    Wan, Xiaogeng; Crüts, Björn; Jensen, Henrik Jeldtoft

    2014-01-01

    We present an EEG study of two music improvisation experiments. Professional musicians with high level of improvisation skills were asked to perform music either according to notes (composed music) or in improvisation. Each piece of music was performed in two different modes: strict mode and “let-go” mode. Synchronized EEG data was measured from both musicians and listeners. We used one of the most reliable causality measures: conditional Mutual Information from Mixed Embedding (MIME), to analyze directed correlations between different EEG channels, which was combined with network theory to construct both intra-brain and cross-brain networks. Differences were identified in intra-brain neural networks between composed music and improvisation and between strict mode and “let-go” mode. Particular brain regions such as frontal, parietal and temporal regions were found to play a key role in differentiating the brain activities between different playing conditions. By comparing the level of degree centralities in intra-brain neural networks, we found a difference between the response of musicians and the listeners when comparing the different playing conditions. PMID:25489852

  13. Network Inference and Maximum Entropy Estimation on Information Diagrams

    Czech Academy of Sciences Publication Activity Database

    Martin, E.A.; Hlinka, J.; Meinke, A.; Děchtěrenko, Filip; Tintěra, J.; Oliver, I.; Davidsen, J.

    2017-01-01

    Roč. 7, č. 1 (2017), s. 1-15, č. článku 7062. ISSN 2045-2322 R&D Projects: GA ČR GA13-23940S Institutional support: RVO:68081740 Keywords : complex networks * mutual information * entropy maximization * fMRI Subject RIV: AN - Psychology OBOR OECD: Cognitive sciences Impact factor: 4.259, year: 2016

  14. Unifying Inference of Meso-Scale Structures in Networks.

    Science.gov (United States)

    Tunç, Birkan; Verma, Ragini

    2015-01-01

    Networks are among the most prevalent formal representations in scientific studies, employed to depict interactions between objects such as molecules, neuronal clusters, or social groups. Studies performed at meso-scale that involve grouping of objects based on their distinctive interaction patterns form one of the main lines of investigation in network science. In a social network, for instance, meso-scale structures can correspond to isolated social groupings or groups of individuals that serve as a communication core. Currently, the research on different meso-scale structures such as community and core-periphery structures has been conducted via independent approaches, which precludes the possibility of an algorithmic design that can handle multiple meso-scale structures and deciding which structure explains the observed data better. In this study, we propose a unified formulation for the algorithmic detection and analysis of different meso-scale structures. This facilitates the investigation of hybrid structures that capture the interplay between multiple meso-scale structures and statistical comparison of competing structures, all of which have been hitherto unavailable. We demonstrate the applicability of the methodology in analyzing the human brain network, by determining the dominant organizational structure (communities) of the brain, as well as its auxiliary characteristics (core-periphery).

  15. Unifying Inference of Meso-Scale Structures in Networks.

    Directory of Open Access Journals (Sweden)

    Birkan Tunç

    Full Text Available Networks are among the most prevalent formal representations in scientific studies, employed to depict interactions between objects such as molecules, neuronal clusters, or social groups. Studies performed at meso-scale that involve grouping of objects based on their distinctive interaction patterns form one of the main lines of investigation in network science. In a social network, for instance, meso-scale structures can correspond to isolated social groupings or groups of individuals that serve as a communication core. Currently, the research on different meso-scale structures such as community and core-periphery structures has been conducted via independent approaches, which precludes the possibility of an algorithmic design that can handle multiple meso-scale structures and deciding which structure explains the observed data better. In this study, we propose a unified formulation for the algorithmic detection and analysis of different meso-scale structures. This facilitates the investigation of hybrid structures that capture the interplay between multiple meso-scale structures and statistical comparison of competing structures, all of which have been hitherto unavailable. We demonstrate the applicability of the methodology in analyzing the human brain network, by determining the dominant organizational structure (communities of the brain, as well as its auxiliary characteristics (core-periphery.

  16. The causal inference of cortical neural networks during music improvisations.

    Directory of Open Access Journals (Sweden)

    Xiaogeng Wan

    Full Text Available We present an EEG study of two music improvisation experiments. Professional musicians with high level of improvisation skills were asked to perform music either according to notes (composed music or in improvisation. Each piece of music was performed in two different modes: strict mode and "let-go" mode. Synchronized EEG data was measured from both musicians and listeners. We used one of the most reliable causality measures: conditional Mutual Information from Mixed Embedding (MIME, to analyze directed correlations between different EEG channels, which was combined with network theory to construct both intra-brain and cross-brain networks. Differences were identified in intra-brain neural networks between composed music and improvisation and between strict mode and "let-go" mode. Particular brain regions such as frontal, parietal and temporal regions were found to play a key role in differentiating the brain activities between different playing conditions. By comparing the level of degree centralities in intra-brain neural networks, we found a difference between the response of musicians and the listeners when comparing the different playing conditions.

  17. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Full Text Available Abstract Background Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results We have developed Genetic Interaction Motif Finding (GIMF, an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast. Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  18. Fine-granularity inference and estimations to network traffic for SDN.

    Directory of Open Access Journals (Sweden)

    Dingde Jiang

    Full Text Available An end-to-end network traffic matrix is significantly helpful for network management and for Software Defined Networks (SDN. However, the end-to-end network traffic matrix's inferences and estimations are a challenging problem. Moreover, attaining the traffic matrix in high-speed networks for SDN is a prohibitive challenge. This paper investigates how to estimate and recover the end-to-end network traffic matrix in fine time granularity from the sampled traffic traces, which is a hard inverse problem. Different from previous methods, the fractal interpolation is used to reconstruct the finer-granularity network traffic. Then, the cubic spline interpolation method is used to obtain the smooth reconstruction values. To attain an accurate the end-to-end network traffic in fine time granularity, we perform a weighted-geometric-average process for two interpolation results that are obtained. The simulation results show that our approaches are feasible and effective.

  19. Fine-granularity inference and estimations to network traffic for SDN.

    Science.gov (United States)

    Jiang, Dingde; Huo, Liuwei; Li, Ya

    2018-01-01

    An end-to-end network traffic matrix is significantly helpful for network management and for Software Defined Networks (SDN). However, the end-to-end network traffic matrix's inferences and estimations are a challenging problem. Moreover, attaining the traffic matrix in high-speed networks for SDN is a prohibitive challenge. This paper investigates how to estimate and recover the end-to-end network traffic matrix in fine time granularity from the sampled traffic traces, which is a hard inverse problem. Different from previous methods, the fractal interpolation is used to reconstruct the finer-granularity network traffic. Then, the cubic spline interpolation method is used to obtain the smooth reconstruction values. To attain an accurate the end-to-end network traffic in fine time granularity, we perform a weighted-geometric-average process for two interpolation results that are obtained. The simulation results show that our approaches are feasible and effective.

  20. Probabilistic reasoning in intelligent systems networks of plausible inference

    CERN Document Server

    Pearl, Judea

    1988-01-01

    Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty--and offers techniques, based on belief networks, that provid

  1. Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene-based networks.

    Directory of Open Access Journals (Sweden)

    Nolan Priedigkeit

    2015-02-01

    Full Text Available Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC, is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases posses an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.

  2. RegnANN: Reverse Engineering Gene Networks using Artificial Neural Networks.

    Directory of Open Access Journals (Sweden)

    Marco Grimaldi

    Full Text Available RegnANN is a novel method for reverse engineering gene networks based on an ensemble of multilayer perceptrons. The algorithm builds a regressor for each gene in the network, estimating its neighborhood independently. The overall network is obtained by joining all the neighborhoods. RegnANN makes no assumptions about the nature of the relationships between the variables, potentially capturing high-order and non linear dependencies between expression patterns. The evaluation focuses on synthetic data mimicking plausible submodules of larger networks and on biological data consisting of submodules of Escherichia coli. We consider Barabasi and Erdös-Rényi topologies together with two methods for data generation. We verify the effect of factors such as network size and amount of data to the accuracy of the inference algorithm. The accuracy scores obtained with RegnANN is methodically compared with the performance of three reference algorithms: ARACNE, CLR and KELLER. Our evaluation indicates that RegnANN compares favorably with the inference methods tested. The robustness of RegnANN, its ability to discover second order correlations and the agreement between results obtained with this new methods on both synthetic and biological data are promising and they stimulate its application to a wider range of problems.

  3. Network Model-Assisted Inference from Respondent-Driven Sampling Data.

    Science.gov (United States)

    Gile, Krista J; Handcock, Mark S

    2015-06-01

    Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population.

  4. Network Model-Assisted Inference from Respondent-Driven Sampling Data

    Science.gov (United States)

    Gile, Krista J.; Handcock, Mark S.

    2015-01-01

    Summary Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328

  5. Information retrieval system with ability of analogical inference using semantic network and function of fuzzification

    Energy Technology Data Exchange (ETDEWEB)

    Nakamura, K; Iwai, S

    1982-01-01

    In information retrieval system, it is necessary to grasp user's subject of interest in order to present appropriate documents to the user. In this paper, the authors propose a model of human ability of analogical inference based on association between key words and, using it, construct an information retrieval system in which the computer with the ability learns its user's subject of interest through question-answering with the user. In this system, the association between key words is represented by a semantic network, and a function of fuzzification of input information is introduced in the semantic network to implement the ability of analogical inference based on the association. Finally, the effect of analogical inference on the learning efficiency of the system is investigated. 5 references.

  6. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

    Science.gov (United States)

    2013-01-01

    Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as

  7. The gene regulatory network for breast cancer: Integrated regulatory landscape of cancer hallmarks

    Directory of Open Access Journals (Sweden)

    Frank eEmmert-Streib

    2014-02-01

    Full Text Available In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of $351$ patient samples. In order to elucidate the functional relevance of the inferred network, we are performing a Gene Ontology (GO analysis for its structural components. Our analysis reveals that most significant GO-terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the beast cancer network. We find that chromosome $21$ is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.

  8. Network Inference and Maximum Entropy Estimation on Information Diagrams

    Czech Academy of Sciences Publication Activity Database

    Martin, E.A.; Hlinka, Jaroslav; Meinke, A.; Děchtěrenko, Filip; Tintěra, J.; Oliver, I.; Davidsen, J.

    2017-01-01

    Roč. 7, č. 1 (2017), č. článku 7062. ISSN 2045-2322 R&D Projects: GA ČR GA13-23940S; GA MZd(CZ) NV15-29835A Grant - others:GA MŠk(CZ) LO1611 Institutional support: RVO:67985807 Keywords : complex networks * mutual information * entropy maximization * fMRI Subject RIV: BD - Theory of Information OBOR OECD: Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8) Impact factor: 4.259, year: 2016

  9. Protein Inference from the Integration of Tandem MS Data and Interactome Networks.

    Science.gov (United States)

    Zhong, Jiancheng; Wang, Jianxing; Ding, Xiaojun; Zhang, Zhen; Li, Min; Wu, Fang-Xiang; Pan, Yi

    2017-01-01

    Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Different from the methods which first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should have a high probability in the same sample, too. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by incorporating the bipartite graph with interactome network information, where the bipartite graph is built with the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated by using the Bayes' law based on the neighbor protein support scores of each protein which are mapped by the shared peptides. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that AUC scores yielded by TMSIN are 0.742 and 0.874 in yeast dataset and human dataset, respectively, and TMSIN yields the maximum number of true positives when q-value less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.

  10. Probabilistic Inference in General Graphical Models through Sampling in Stochastic Networks of Spiking Neurons

    Science.gov (United States)

    Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

    2011-01-01

    An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows (“explaining away”) and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons. PMID:22219717

  11. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

    Directory of Open Access Journals (Sweden)

    Dejan Pecevski

    2011-12-01

    Full Text Available An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away" and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.

  12. Inferring cultural regions from correlation networks of given baby names

    Science.gov (United States)

    Pomorski, Mateusz; Krawczyk, Małgorzata J.; Kułakowski, Krzysztof; Kwapień, Jarosław; Ausloos, Marcel

    2016-03-01

    We report investigations on the statistical characteristics of the baby names given between 1910 and 2010 in the United States of America. For each year, the 100 most frequent names in the USA are sorted out. For these names, the correlations between the names profiles are calculated for all pairs of states (minus Hawaii and Alaska). The correlations are used to form a weighted network which is found to vary mildly in time. In fact, the structure of communities in the network remains quite stable till about 1980. The goal is that the calculated structure approximately reproduces the usually accepted geopolitical regions: the Northeast, the South, and the "Midwest + West" as the third one. Furthermore, the dataset reveals that the name distribution satisfies the Zipf law, separately for each state and each year, i.e. the name frequency f ∝r-α, where r is the name rank. Between 1920 and 1980, the exponent α is the largest one for the set of states classified as 'the South', but the smallest one for the set of states classified as "Midwest + West". Our interpretation is that the pool of selected names was quite narrow in the Southern states. The data is compared with some related statistics of names in Belgium, a country also with different regions, but having quite a different scale than the USA. There, the Zipf exponent is low for young people and for the Brussels citizens.

  13. Cascading Failures in Networks: Inference, Intervention and Robustness to WMDs

    Science.gov (United States)

    2016-08-01

    has  had   impact  in  the  community:   4     “Clustering  sparse  graphs,”  a  paper  from  the  first  year  of  this  project,  has  already...This  project  has  had  a  huge  positive   impact  on  the  development  of  talent  in  the  mathematics  of   networks.  Specifically...connectivity  of   cellphone  networks  under  attack  etc.                                               7   What  do  you

  14. Inferring Network Controls from Topology Using the Chomp Database

    Science.gov (United States)

    2015-12-03

    the-art methods in gene expres- sion analysis. We call this new method SW1PerS, which stands for Sliding Windows and 1-dimensional Persistence Scoring...protein compressibility. Jpn. J. Ind. Appl. Math . 32, 1 (2015), 1–17. [5] Goullet, A., Harker, S., Kalies, W., Kasti, D., and Mischaikow, K. Ef- ficient...computing homology of complexes and maps. Found. Comput. Math . 14, 1 (2014), 151–184. [9] Kalies, W. D., Mischaikow, K., and Vandervorst, R. C. Lattice

  15. Genes2FANs: connecting genes through functional association networks

    Science.gov (United States)

    2012-01-01

    Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting diseases genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in

  16. A bayesian inference-based detection mechanism to defend medical smartphone networks against insider attacks

    DEFF Research Database (Denmark)

    Meng, Weizhi; Li, Wenjuan; Xiang, Yang

    2017-01-01

    and experience for both patients and healthcare workers, and the underlying network architecture to support such devices is also referred to as medical smartphone networks (MSNs). MSNs, similar to other networks, are subject to a wide range of attacks (e.g. leakage of sensitive patient information by a malicious...... insider). In this work, we focus on MSNs and present a compact but efficient trust-based approach using Bayesian inference to identify malicious nodes in such an environment. We then demonstrate the effectiveness of our approach in detecting malicious nodes by evaluating the deployment of our proposed...

  17. Evolutionary history of the third chromosome gene arrangements of Drosophila pseudoobscura inferred from inversion breakpoints.

    Science.gov (United States)

    Wallace, Andre G; Detweiler, Don; Schaeffer, Stephen W

    2011-08-01

    The third chromosome of Drosophila pseudoobscura is polymorphic for numerous gene arrangements that form classical clines in North America. The polytene salivary chromosomes isolated from natural populations revealed changes in gene order that allowed the different gene arrangements to be linked together by paracentric inversions representing one of the first cases where genetic data were used to construct a phylogeny. Although the inversion phylogeny can be used to determine the relationships among the gene arrangements, the cytogenetic data are unable to infer the ancestral arrangement or the age of the different chromosome types. These are both important properties if one is to infer the evolutionary forces responsible for the spread and maintenance of the chromosomes. Here, we employ the nucleotide sequences of 18 regions distributed across the third chromosome in 80-100 D. pseudoobscura strains to test whether five gene arrangements are of unique or multiple origin, what the ancestral arrangement was, and what are the ages of the different arrangements. Each strain carried one of six commonly found gene arrangements and the sequences were used to infer their evolutionary relationships. Breakpoint regions in the center of the chromosome supported monophyly of the gene arrangements, whereas regions at the ends of the chromosome gave phylogenies that provided less support for monophyly of the chromosomes either because the individual markers did not have enough phylogenetically informative sites or genetic exchange scrambled information among the gene arrangements. A data set where the genetic markers were concatenated strongly supported a unique origin of the different gene arrangements. The inversion polymorphism of D. pseudoobscura is estimated to be about a million years old. We have also shown that the generated phylogeny is consistent with the cytological phylogeny of this species. In addition, the data presented here support hypothetical as the ancestral

  18. Sparsity in Model Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Zagorski, M.

    2011-01-01

    We propose a gene regulatory network model which incorporates the microscopic interactions between genes and transcription factors. In particular the gene's expression level is determined by deterministic synchronous dynamics with contribution from excitatory interactions. We study the structure of networks that have a particular '' function '' and are subject to the natural selection pressure. The question of network robustness against point mutations is addressed, and we conclude that only a small part of connections defined as '' essential '' for cell's existence is fragile. Additionally, the obtained networks are sparse with narrow in-degree and broad out-degree, properties well known from experimental study of biological regulatory networks. Furthermore, during sampling procedure we observe that significantly different genotypes can emerge under mutation-selection balance. All the preceding features hold for the model parameters which lay in the experimentally relevant range. (author)

  19. Resistance Genes in Global Crop Breeding Networks.

    Science.gov (United States)

    Garrett, K A; Andersen, K F; Asche, F; Bowden, R L; Forbes, G A; Kulakow, P A; Zhou, B

    2017-10-01

    Resistance genes are a major tool for managing crop diseases. The networks of crop breeders who exchange resistance genes and deploy them in varieties help to determine the global landscape of resistance and epidemics, an important system for maintaining food security. These networks function as a complex adaptive system, with associated strengths and vulnerabilities, and implications for policies to support resistance gene deployment strategies. Extensions of epidemic network analysis can be used to evaluate the multilayer agricultural networks that support and influence crop breeding networks. Here, we evaluate the general structure of crop breeding networks for cassava, potato, rice, and wheat. All four are clustered due to phytosanitary and intellectual property regulations, and linked through CGIAR hubs. Cassava networks primarily include public breeding groups, whereas others are more mixed. These systems must adapt to global change in climate and land use, the emergence of new diseases, and disruptive breeding technologies. Research priorities to support policy include how best to maintain both diversity and redundancy in the roles played by individual crop breeding groups (public versus private and global versus local), and how best to manage connectivity to optimize resistance gene deployment while avoiding risks to the useful life of resistance genes. [Formula: see text] Copyright © 2017 The Author(s). This is an open access article distributed under the CC BY 4.0 International license .

  20. Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention.

    Science.gov (United States)

    Johnston, Iain G; Williams, Ben P

    2016-02-24

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Crowdsourcing the nodulation gene network discovery environment.

    Science.gov (United States)

    Li, Yupeng; Jackson, Scott A

    2016-05-26

    The Legumes (Fabaceae) are an economically and ecologically important group of plant species with the conspicuous capacity for symbiotic nitrogen fixation in root nodules, specialized plant organs containing symbiotic microbes. With the aim of understanding the underlying molecular mechanisms leading to nodulation, many efforts are underway to identify nodulation-related genes and determine how these genes interact with each other. In order to accurately and efficiently reconstruct nodulation gene network, a crowdsourcing platform, CrowdNodNet, was created. The platform implements the jQuery and vis.js JavaScript libraries, so that users are able to interactively visualize and edit the gene network, and easily access the information about the network, e.g. gene lists, gene interactions and gene functional annotations. In addition, all the gene information is written on MediaWiki pages, enabling users to edit and contribute to the network curation. Utilizing the continuously updated, collaboratively written, and community-reviewed Wikipedia model, the platform could, in a short time, become a comprehensive knowledge base of nodulation-related pathways. The platform could also be used for other biological processes, and thus has great potential for integrating and advancing our understanding of the functional genomics and systems biology of any process for any species. The platform is available at http://crowd.bioops.info/ , and the source code can be openly accessed at https://github.com/bioops/crowdnodnet under MIT License.

  2. Combinatorial explosion in model gene networks

    Science.gov (United States)

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such

  3. Learning gene networks under SNP perturbations using eQTL datasets.

    Directory of Open Access Journals (Sweden)

    Lingxue Zhang

    2014-02-01

    Full Text Available The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network

  4. Deconstructing the pluripotency gene regulatory network

    KAUST Repository

    Li, Mo

    2018-04-04

    Pluripotent stem cells can be isolated from embryos or derived by reprogramming. Pluripotency is stabilized by an interconnected network of pluripotency genes that cooperatively regulate gene expression. Here we describe the molecular principles of pluripotency gene function and highlight post-transcriptional controls, particularly those induced by RNA-binding proteins and alternative splicing, as an important regulatory layer of pluripotency. We also discuss heterogeneity in pluripotency regulation, alternative pluripotency states and future directions of pluripotent stem cell research.

  5. Deconstructing the pluripotency gene regulatory network

    KAUST Repository

    Li, Mo; Belmonte, Juan Carlos Izpisua

    2018-01-01

    Pluripotent stem cells can be isolated from embryos or derived by reprogramming. Pluripotency is stabilized by an interconnected network of pluripotency genes that cooperatively regulate gene expression. Here we describe the molecular principles of pluripotency gene function and highlight post-transcriptional controls, particularly those induced by RNA-binding proteins and alternative splicing, as an important regulatory layer of pluripotency. We also discuss heterogeneity in pluripotency regulation, alternative pluripotency states and future directions of pluripotent stem cell research.

  6. Statistical inference approach to structural reconstruction of complex networks from binary time series

    Science.gov (United States)

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  7. An empirical Bayesian approach for model-based inference of cellular signaling networks

    Directory of Open Access Journals (Sweden)

    Klinke David J

    2009-11-01

    Full Text Available Abstract Background A common challenge in systems biology is to infer mechanistic descriptions of biological process given limited observations of a biological system. Mathematical models are frequently used to represent a belief about the causal relationships among proteins within a signaling network. Bayesian methods provide an attractive framework for inferring the validity of those beliefs in the context of the available data. However, efficient sampling of high-dimensional parameter space and appropriate convergence criteria provide barriers for implementing an empirical Bayesian approach. The objective of this study was to apply an Adaptive Markov chain Monte Carlo technique to a typical study of cellular signaling pathways. Results As an illustrative example, a kinetic model for the early signaling events associated with the epidermal growth factor (EGF signaling network was calibrated against dynamic measurements observed in primary rat hepatocytes. A convergence criterion, based upon the Gelman-Rubin potential scale reduction factor, was applied to the model predictions. The posterior distributions of the parameters exhibited complicated structure, including significant covariance between specific parameters and a broad range of variance among the parameters. The model predictions, in contrast, were narrowly distributed and were used to identify areas of agreement among a collection of experimental studies. Conclusion In summary, an empirical Bayesian approach was developed for inferring the confidence that one can place in a particular model that describes signal transduction mechanisms and for inferring inconsistencies in experimental measurements.

  8. Model-free inference of direct network interactions from nonlinear collective dynamics.

    Science.gov (United States)

    Casadiego, Jose; Nitzan, Mor; Hallerberg, Sarah; Timme, Marc

    2017-12-19

    The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.

  9. Characterization of differentially expressed genes using high-dimensional co-expression networks

    DEFF Research Database (Denmark)

    Coelho Goncalves de Abreu, Gabriel; Labouriau, Rodrigo S.

    2010-01-01

    We present a technique to characterize differentially expressed genes in terms of their position in a high-dimensional co-expression network. The set-up of Gaussian graphical models is used to construct representations of the co-expression network in such a way that redundancy and the propagation...... that allow to make effective inference in problems with high degree of complexity (e.g. several thousands of genes) and small number of observations (e.g. 10-100) as typically occurs in high throughput gene expression studies. Taking advantage of the internal structure of decomposable graphical models, we...... construct a compact representation of the co-expression network that allows to identify the regions with high concentration of differentially expressed genes. It is argued that differentially expressed genes located in highly interconnected regions of the co-expression network are less informative than...

  10. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Full Text Available Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  11. Evolutionary inference across eukaryotes identifies specific pressures favoring mitochondrial gene retention

    OpenAIRE

    Williams, Ben; Johnston, Iain

    2016-01-01

    Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modelling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondri...

  12. Identifying noncoding risk variants using disease-relevant gene regulatory networks.

    Science.gov (United States)

    Gao, Long; Uzun, Yasin; Gao, Peng; He, Bing; Ma, Xiaoke; Wang, Jiahui; Han, Shizhong; Tan, Kai

    2018-02-16

    Identifying noncoding risk variants remains a challenging task. Because noncoding variants exert their effects in the context of a gene regulatory network (GRN), we hypothesize that explicit use of disease-relevant GRNs can significantly improve the inference accuracy of noncoding risk variants. We describe Annotation of Regulatory Variants using Integrated Networks (ARVIN), a general computational framework for predicting causal noncoding variants. It employs a set of novel regulatory network-based features, combined with sequence-based features to infer noncoding risk variants. Using known causal variants in gene promoters and enhancers in a number of diseases, we show ARVIN outperforms state-of-the-art methods that use sequence-based features alone. Additional experimental validation using reporter assay further demonstrates the accuracy of ARVIN. Application of ARVIN to seven autoimmune diseases provides a holistic view of the gene subnetwork perturbed by the combinatorial action of the entire set of risk noncoding mutations.

  13. Cross-Dependency Inference in Multi-Layered Networks: A Collaborative Filtering Perspective.

    Science.gov (United States)

    Chen, Chen; Tong, Hanghang; Xie, Lei; Ying, Lei; He, Qing

    2017-08-01

    The increasingly connected world has catalyzed the fusion of networks from different domains, which facilitates the emergence of a new network model-multi-layered networks. Examples of such kind of network systems include critical infrastructure networks, biological systems, organization-level collaborations, cross-platform e-commerce, and so forth. One crucial structure that distances multi-layered network from other network models is its cross-layer dependency, which describes the associations between the nodes from different layers. Needless to say, the cross-layer dependency in the network plays an essential role in many data mining applications like system robustness analysis and complex network control. However, it remains a daunting task to know the exact dependency relationships due to noise, limited accessibility, and so forth. In this article, we tackle the cross-layer dependency inference problem by modeling it as a collective collaborative filtering problem. Based on this idea, we propose an effective algorithm Fascinate that can reveal unobserved dependencies with linear complexity. Moreover, we derive Fascinate-ZERO, an online variant of Fascinate that can respond to a newly added node timely by checking its neighborhood dependencies. We perform extensive evaluations on real datasets to substantiate the superiority of our proposed approaches.

  14. Statistical inference, the bootstrap, and neural-network modeling with application to foreign exchange rates.

    Science.gov (United States)

    White, H; Racine, J

    2001-01-01

    We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.

  15. Mutated Genes in Schizophrenia Map to Brain Networks

    Science.gov (United States)

    ... Matters NIH Research Matters August 12, 2013 Mutated Genes in Schizophrenia Map to Brain Networks Schizophrenia networks ... have a high number of spontaneous mutations in genes that form a network in the front region ...

  16. Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.

    Science.gov (United States)

    Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús

    2008-10-01

    Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. http://cytoprophet.cse.nd.edu.

  17. Methods for Inferring Health-Related Social Networks among Coworkers from Online Communication Patterns

    Science.gov (United States)

    Matthews, Luke J.; DeWan, Peter; Rula, Elizabeth Y.

    2013-01-01

    Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a social network. PMID

  18. Methods for inferring health-related social networks among coworkers from online communication patterns.

    Science.gov (United States)

    Matthews, Luke J; DeWan, Peter; Rula, Elizabeth Y

    2013-01-01

    Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1) the absolute number of emails exchanged, (2) logistic regression probability of an offline relationship, and (3) the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI) across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a social network.

  19. Methods for inferring health-related social networks among coworkers from online communication patterns.

    Directory of Open Access Journals (Sweden)

    Luke J Matthews

    Full Text Available Studies of social networks, mapped using self-reported contacts, have demonstrated the strong influence of social connections on the propensity for individuals to adopt or maintain healthy behaviors and on their likelihood to adopt health risks such as obesity. Social network analysis may prove useful for businesses and organizations that wish to improve the health of their populations by identifying key network positions. Health traits have been shown to correlate across friendship ties, but evaluating network effects in large coworker populations presents the challenge of obtaining sufficiently comprehensive network data. The purpose of this study was to evaluate methods for using online communication data to generate comprehensive network maps that reproduce the health-associated properties of an offline social network. In this study, we examined three techniques for inferring social relationships from email traffic data in an employee population using thresholds based on: (1 the absolute number of emails exchanged, (2 logistic regression probability of an offline relationship, and (3 the highest ranked email exchange partners. As a model of the offline social network in the same population, a network map was created using social ties reported in a survey instrument. The email networks were evaluated based on the proportion of survey ties captured, comparisons of common network metrics, and autocorrelation of body mass index (BMI across social ties. Results demonstrated that logistic regression predicted the greatest proportion of offline social ties, thresholding on number of emails exchanged produced the best match to offline network metrics, and ranked email partners demonstrated the strongest autocorrelation of BMI. Since each method had unique strengths, researchers should choose a method based on the aspects of offline behavior of interest. Ranked email partners may be particularly useful for purposes related to health traits in a

  20. Using neutral network to infer the hydrodynamic yield of aspherical sources

    International Nuclear Information System (INIS)

    Moran, B.; Glenn, L.A.

    1993-07-01

    We distinguish two kinds of difficulties with yield determination from aspherical sources. The first kind, the spoofing difficulty, occurs when a fraction of the energy of the explosion is channeled in such a way that it is not detected by the CORRTEX cable. In this case, neither neural networks nor any expert system can be expected to accurately estimate the yield without detailed information about device emplacement within the canister. Numerical simulations however, can provide an upper bound on the undetected fraction of the explosive energy. In the second instance, the interpretation difficulty, the data appear abnormal when analyzed using similar-explosion-scaling and the assumption of a spherical front. The inferred yield varies with time and the confidence in the yield estimate decreases. It is this kind of problem we address in this paper and for which neural networks can make a contribution. We used a back propagation neural network to infer the hydrodynamic yield of simulated aspherical sources. We trained the network using a subset of simulations from 3 different aspherical sources, with 3 different yields, and 3 satellite offset separations. The trained network was able to predict the yield within 15% in all cases and to identify the correct type of aspherical source in most cases. The predictive capability of the network increased with a larger training set. The neural network approach can easily incorporate information from new calculations or experiments and is therefore flexible and easy to maintain. We describe the potential capabilities and limitations in using such networks for yield estimations

  1. Using neural networks to infer the hydrodynamic yield of aspherical sources

    International Nuclear Information System (INIS)

    Moran, B.; Glenn, L.

    1993-01-01

    We distinguish two kinds of difficulties with yield determination from aspherical sources. The first kind, the spoofing difficulty, occurs when a fraction of the energy of the explosion is channeled in such a way that it is not detected by the CORRTEX cable. In this case, neither neural networks nor any expert system can be expected to accurately estimate the yield without detailed information about device emplacement within the canister. Numerical simulations however, can provide an upper bound on the undetected fraction of the explosive energy. In the second instance, the interpretation difficulty, the data appear abnormal when analyzed using similar-explosion-scaling and the assumption of a spherical front. The inferred yield varies with time and the confidence in the yield estimate decreases. It is this kind of problem we address in this paper and for which neural networks can make a contribution. We used a back propagation neural network to infer the hydrodynamic yield of simulated aspherical sources. We trained the network using a subset of simulations from 3 different aspherical sources, with 3 different yield, and 3 satellite offset separations. The trained network was able to predict the yield within 15% in all cases and to identify the correct type of aspherical source in most cases. The predictive capability of the network increased with a larger training set. The neural network approach can easily incorporate information from new calculations or experiments and is therefore flexible and easy to maintain. We describe the potential capabilities and limitations in using such networks for yield estimations

  2. Evaluating the Limits of Network Topology Inference Via Virtualized Network Emulation

    Science.gov (United States)

    2015-06-01

    virtualized environment. First, we automatically build topological ground truth according to various network generation models and create emulated Cisco ...to various network generation models and create emulated Cisco router networks by leveraging and modifying existing emulation software. We then au... markets , to verifying compliance with policy, as in recent “network neutrality” rules established in the United States. The Internet is a network of

  3. Gene regulation is governed by a core network in hepatocellular carcinoma.

    Science.gov (United States)

    Gu, Zuguang; Zhang, Chenyu; Wang, Jin

    2012-05-01

    Hepatocellular carcinoma (HCC) is one of the most lethal cancers worldwide, and the mechanisms that lead to the disease are still relatively unclear. However, with the development of high-throughput technologies it is possible to gain a systematic view of biological systems to enhance the understanding of the roles of genes associated with HCC. Thus, analysis of the mechanism of molecule interactions in the context of gene regulatory networks can reveal specific sub-networks that lead to the development of HCC. In this study, we aimed to identify the most important gene regulations that are dysfunctional in HCC generation. Our method for constructing gene regulatory network is based on predicted target interactions, experimentally-supported interactions, and co-expression model. Regulators in the network included both transcription factors and microRNAs to provide a complete view of gene regulation. Analysis of gene regulatory network revealed that gene regulation in HCC is highly modular, in which different sets of regulators take charge of specific biological processes. We found that microRNAs mainly control biological functions related to mitochondria and oxidative reduction, while transcription factors control immune responses, extracellular activity and the cell cycle. On the higher level of gene regulation, there exists a core network that organizes regulations between different modules and maintains the robustness of the whole network. There is direct experimental evidence for most of the regulators in the core gene regulatory network relating to HCC. We infer it is the central controller of gene regulation. Finally, we explored the influence of the core gene regulatory network on biological pathways. Our analysis provides insights into the mechanism of transcriptional and post-transcriptional control in HCC. In particular, we highlight the importance of the core gene regulatory network; we propose that it is highly related to HCC and we believe further

  4. Inferring a Drive-Response Network from Time Series of Topological Measures in Complex Networks with Transfer Entropy

    Directory of Open Access Journals (Sweden)

    Xinbo Ai

    2014-11-01

    Full Text Available Topological measures are crucial to describe, classify and understand complex networks. Lots of measures are proposed to characterize specific features of specific networks, but the relationships among these measures remain unclear. Taking into account that pulling networks from different domains together for statistical analysis might provide incorrect conclusions, we conduct our investigation with data observed from the same network in the form of simultaneously measured time series. We synthesize a transfer entropy-based framework to quantify the relationships among topological measures, and then to provide a holistic scenario of these measures by inferring a drive-response network. Techniques from Symbolic Transfer Entropy, Effective Transfer Entropy, and Partial Transfer Entropy are synthesized to deal with challenges such as time series being non-stationary, finite sample effects and indirect effects. We resort to kernel density estimation to assess significance of the results based on surrogate data. The framework is applied to study 20 measures across 2779 records in the Technology Exchange Network, and the results are consistent with some existing knowledge. With the drive-response network, we evaluate the influence of each measure by calculating its strength, and cluster them into three classes, i.e., driving measures, responding measures and standalone measures, according to the network communities.

  5. Limitations of a metabolic network-based reverse ecology method for inferring host-pathogen interactions.

    Science.gov (United States)

    Takemoto, Kazuhiro; Aie, Kazuki

    2017-05-25

    Host-pathogen interactions are important in a wide range of research fields. Given the importance of metabolic crosstalk between hosts and pathogens, a metabolic network-based reverse ecology method was proposed to infer these interactions. However, the validity of this method remains unclear because of the various explanations presented and the influence of potentially confounding factors that have thus far been neglected. We re-evaluated the importance of the reverse ecology method for evaluating host-pathogen interactions while statistically controlling for confounding effects using oxygen requirement, genome, metabolic network, and phylogeny data. Our data analyses showed that host-pathogen interactions were more strongly influenced by genome size, primary network parameters (e.g., number of edges), oxygen requirement, and phylogeny than the reserve ecology-based measures. These results indicate the limitations of the reverse ecology method; however, they do not discount the importance of adopting reverse ecology approaches altogether. Rather, we highlight the need for developing more suitable methods for inferring host-pathogen interactions and conducting more careful examinations of the relationships between metabolic networks and host-pathogen interactions.

  6. Transcriptional delay stabilizes bistable gene networks.

    Science.gov (United States)

    Gupta, Chinmaya; López, José Manuel; Ott, William; Josić, Krešimir; Bennett, Matthew R

    2013-08-02

    Transcriptional delay can significantly impact the dynamics of gene networks. Here we examine how such delay affects bistable systems. We investigate several stochastic models of bistable gene networks and find that increasing delay dramatically increases the mean residence times near stable states. To explain this, we introduce a non-Markovian, analytically tractable reduced model. The model shows that stabilization is the consequence of an increased number of failed transitions between stable states. Each of the bistable systems that we simulate behaves in this manner.

  7. Inferring the functional effect of gene expression changes in signaling pathways

    Science.gov (United States)

    Sebastián-León, Patricia; Carbonell, José; Salavert, Francisco; Sanchez, Rubén; Medina, Ignacio; Dopazo, Joaquín

    2013-01-01

    Signaling pathways constitute a valuable source of information that allows interpreting the way in which alterations in gene activities affect to particular cell functionalities. There are web tools available that allow viewing and editing pathways, as well as representing experimental data on them. However, few methods aimed to identify the signaling circuits, within a pathway, associated to the biological problem studied exist and none of them provide a convenient graphical web interface. We present PATHiWAYS, a web-based signaling pathway visualization system that infers changes in signaling that affect cell functionality from the measurements of gene expression values in typical expression microarray case–control experiments. A simple probabilistic model of the pathway is used to estimate the probabilities for signal transmission from any receptor to any final effector molecule (taking into account the pathway topology) using for this the individual probabilities of gene product presence/absence inferred from gene expression values. Significant changes in these probabilities allow linking different cell functionalities triggered by the pathway to the biological problem studied. PATHiWAYS is available at: http://pathiways.babelomics.org/. PMID:23748960

  8. Opinion Dynamics on Networks with Inference of Unobservable States of Others

    Science.gov (United States)

    Fujie, Ryo

    In most opinion formation models which have been proposed, the agents decide their states (i.e. opinions) by referring to the states of others. However, the referred states of others are not necessarily observable and may be inferred. To investigate the effect of an inference of the states of others on opinion dynamics, I propose an extended voter model on networks where observable and referable node sets are different. These sets for a node defined as the nearest to the mo-th neighbors for observable nodes and the nearest to the mr-th neighbors for referable nodes. The state of referable but unobservable node which is the m-th neighbor (mo pagerank'' is conserved. This conserved quantity coincides with the fixation probability. On the other hand, in the case of mo =mr = 1 , the model comes down to the standard voter model on networks and the conserved quantity is a degree-weighted superposition of the states. Thus, the introduction of the inference changes the important opinion spreaders from the high-degree nodes to the high-betweenness pagerank nodes. This work is supported by the Collaboration Research Program of IDEAS, Chubu University IDEAS2016233.

  9. Uncovering co-expression gene network modules regulating fruit acidity in diverse apples.

    Science.gov (United States)

    Bai, Yang; Dougherty, Laura; Cheng, Lailiang; Zhong, Gan-Yuan; Xu, Kenong

    2015-08-16

    Acidity is a major contributor to fruit quality. Several organic acids are present in apple fruit, but malic acid is predominant and determines fruit acidity. The trait is largely controlled by the Malic acid (Ma) locus, underpinning which Ma1 that putatively encodes a vacuolar aluminum-activated malate transporter1 (ALMT1)-like protein is a strong candidate gene. We hypothesize that fruit acidity is governed by a gene network in which Ma1 is key member. The goal of this study is to identify the gene network and the potential mechanisms through which the network operates. Guided by Ma1, we analyzed the transcriptomes of mature fruit of contrasting acidity from six apple accessions of genotype Ma_ (MaMa or Mama) and four of mama using RNA-seq and identified 1301 fruit acidity associated genes, among which 18 were most significant acidity genes (MSAGs). Network inferring using weighted gene co-expression network analysis (WGCNA) revealed five co-expression gene network modules of significant (P acidity. Overall, this study provides important insight into the Ma1-mediated gene network controlling acidity in mature apple fruit of diverse genetic background.

  10. Inferring regulatory networks from experimental morphological phenotypes: a computational method reverse-engineers planarian regeneration.

    Directory of Open Access Journals (Sweden)

    Daniel Lobo

    2015-06-01

    Full Text Available Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature; By analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method

  11. Supervisory Adaptive Network-Based Fuzzy Inference System (SANFIS Design for Empirical Test of Mobile Robot

    Directory of Open Access Journals (Sweden)

    Yi-Jen Mon

    2012-10-01

    Full Text Available A supervisory Adaptive Network-based Fuzzy Inference System (SANFIS is proposed for the empirical control of a mobile robot. This controller includes an ANFIS controller and a supervisory controller. The ANFIS controller is off-line tuned by an adaptive fuzzy inference system, the supervisory controller is designed to compensate for the approximation error between the ANFIS controller and the ideal controller, and drive the trajectory of the system onto a specified surface (called the sliding surface or switching surface while maintaining the trajectory onto this switching surface continuously to guarantee the system stability. This SANFIS controller can achieve favourable empirical control performance of the mobile robot in the empirical tests of driving the mobile robot with a square path. Practical experimental results demonstrate that the proposed SANFIS can achieve better control performance than that achieved using an ANFIS controller for empirical control of the mobile robot.

  12. Harnessing diversity towards the reconstructing of large scale gene regulatory networks.

    Directory of Open Access Journals (Sweden)

    Takeshi Hase

    Full Text Available Elucidating gene regulatory network (GRN from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii TopkNet integrating only high-performance algorithms provide significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstruct an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression-data associated with known regulatory network is similar to that with unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for known dataset perform well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks.

  13. Sequence-based model of gap gene regulatory network.

    Science.gov (United States)

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researches studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used four-fold cross validation test and fitting to random dataset to validate the model and proof its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3

  14. Stochastic Boolean networks: An efficient approach to modeling gene regulatory networks

    Directory of Open Access Journals (Sweden)

    Liang Jinghang

    2012-08-01

    network inferred from a T cell immune response dataset. An SBN can also implement the function of an asynchronous PBN and is potentially useful in a hybrid approach in combination with a continuous or single-molecule level stochastic model. Conclusions Stochastic Boolean networks (SBNs are proposed as an efficient approach to modelling gene regulatory networks (GRNs. The SBN approach is able to recover biologically-proven regulatory behaviours, such as the oscillatory dynamics of the p53-Mdm2 network and the dynamic attractors in a T cell immune response network. The proposed approach can further predict the network dynamics when the genes are under perturbation, thus providing biologically meaningful insights for a better understanding of the dynamics of GRNs. The algorithms and methods described in this paper have been implemented in Matlab packages, which are attached as Additional files.

  15. Interactive visualization of gene regulatory networks with associated gene expression time series data

    NARCIS (Netherlands)

    Westenberg, M.A.; Hijum, van S.A.F.T.; Lulko, A.T.; Kuipers, O.P.; Roerdink, J.B.T.M.; Linsen, L.; Hagen, H.; Hamann, B.

    2008-01-01

    We present GENeVis, an application to visualize gene expression time series data in a gene regulatory network context. This is a network of regulator proteins that regulate the expression of their respective target genes. The networks are represented as graphs, in which the nodes represent genes,

  16. Clinical Outcome Prediction in Aneurysmal Subarachnoid Hemorrhage Using Bayesian Neural Networks with Fuzzy Logic Inferences

    Directory of Open Access Journals (Sweden)

    Benjamin W. Y. Lo

    2013-01-01

    Full Text Available Objective. The novel clinical prediction approach of Bayesian neural networks with fuzzy logic inferences is created and applied to derive prognostic decision rules in cerebral aneurysmal subarachnoid hemorrhage (aSAH. Methods. The approach of Bayesian neural networks with fuzzy logic inferences was applied to data from five trials of Tirilazad for aneurysmal subarachnoid hemorrhage (3551 patients. Results. Bayesian meta-analyses of observational studies on aSAH prognostic factors gave generalizable posterior distributions of population mean log odd ratios (ORs. Similar trends were noted in Bayesian and linear regression ORs. Significant outcome predictors include normal motor response, cerebral infarction, history of myocardial infarction, cerebral edema, history of diabetes mellitus, fever on day 8, prior subarachnoid hemorrhage, admission angiographic vasospasm, neurological grade, intraventricular hemorrhage, ruptured aneurysm size, history of hypertension, vasospasm day, age and mean arterial pressure. Heteroscedasticity was present in the nontransformed dataset. Artificial neural networks found nonlinear relationships with 11 hidden variables in 1 layer, using the multilayer perceptron model. Fuzzy logic decision rules (centroid defuzzification technique denoted cut-off points for poor prognosis at greater than 2.5 clusters. Discussion. This aSAH prognostic system makes use of existing knowledge, recognizes unknown areas, incorporates one's clinical reasoning, and compensates for uncertainty in prognostication.

  17. Using Regularization to Infer Cell Line Specificity in Logical Network Models of Signaling Pathways

    Directory of Open Access Journals (Sweden)

    Sébastien De Landtsheer

    2018-05-01

    Full Text Available Understanding the functional properties of cells of different origins is a long-standing challenge of personalized medicine. Especially in cancer, the high heterogeneity observed in patients slows down the development of effective cures. The molecular differences between cell types or between healthy and diseased cellular states are usually determined by the wiring of regulatory networks. Understanding these molecular and cellular differences at the systems level would improve patient stratification and facilitate the design of rational intervention strategies. Models of cellular regulatory networks frequently make weak assumptions about the distribution of model parameters across cell types or patients. These assumptions are usually expressed in the form of regularization of the objective function of the optimization problem. We propose a new method of regularization for network models of signaling pathways based on the local density of the inferred parameter values within the parameter space. Our method reduces the complexity of models by creating groups of cell line-specific parameters which can then be optimized together. We demonstrate the use of our method by recovering the correct topology and inferring accurate values of the parameters of a small synthetic model. To show the value of our method in a realistic setting, we re-analyze a recently published phosphoproteomic dataset from a panel of 14 colon cancer cell lines. We conclude that our method efficiently reduces model complexity and helps recovering context-specific regulatory information.

  18. Fast Inference of Deep Neural Networks in FPGAs for Particle Physics

    Energy Technology Data Exchange (ETDEWEB)

    Duarte, Javier [Fermilab; Han, Song [MIT; Harris, Philip [MIT; Jindariani, Sergo [Fermilab; Kreinar, Edward [EIS Intl., Herndon; Kreis, Benjamin [Fermilab; Ngadiuba, Jennifer [CERN; Pierini, Maurizio [CERN; Rivera, Ryan [Fermilab; Tran, Nhan [Fermilab; Wu, Zhenbin [Illinois U., Chicago

    2018-04-16

    Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.

  19. Intraspecific relationship within the genus convolvulus l. inferred by rbcl gene using different phylogenetic approaches

    International Nuclear Information System (INIS)

    Kausar, S.; Qamarunnisa, S.

    2016-01-01

    A molecular systematics analysis was conducted using sequence data of chloroplast rbcL gene for the genus Convolvulus L., by distance and character based phylogenetic methods. Fifteen representative members from genus Convolvulus L., were included as in group whereas two members from a sister family Solanaceae were taken as out group to root the tree. Intraspecific relationships within Convolvulus were inferred by distance matrix, maximum parsimony and bayesian analysis. Transition/transversion ratio was also calculated and it was revealed that in the investigated Convolvulus species, transitional changes were more prevalent in rbcL gene. The nature of rbcL gene in the present study was observed to be conserved, as it does not show major variations between examined species. Distance matrix represented the minimal genetic variations between some species (C. glomeratus and C. pyrrhotrichus), thus exhibiting them as close relatives. The result of parsimonious and bayesian analysis revealed almost similar clades however maximum parsimony based tree was unable to establish relationship between some Convolvulus species. The bayesian inference method was found to be the method of choice for establishing intraspecific associations between Convolvulus species using rbcL data as it clearly defined the connections supported by posterior probability values. (author)

  20. Diffany: an ontology-driven framework to infer, visualise and analyse differential molecular networks.

    Science.gov (United States)

    Van Landeghem, Sofie; Van Parys, Thomas; Dubois, Marieke; Inzé, Dirk; Van de Peer, Yves

    2016-01-05

    Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator. Diffany is freely available as open-source java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/.

  1. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian

    2016-01-06

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and it is intended to provide a starting point for the second phase which is the Monte Carlo EM Algorithm.

  2. A Multiobjective Fuzzy Inference System based Deployment Strategy for a Distributed Mobile Sensor Network

    Directory of Open Access Journals (Sweden)

    Amol P. Bhondekar

    2010-03-01

    Full Text Available Sensor deployment scheme highly governs the effectiveness of distributed wireless sensor network. Issues such as energy conservation and clustering make the deployment problem much more complex. A multiobjective Fuzzy Inference System based strategy for mobile sensor deployment is presented in this paper. This strategy gives a synergistic combination of energy capacity, clustering and peer-to-peer deployment. Performance of our strategy is evaluated in terms of coverage, uniformity, speed and clustering. Our algorithm is compared against a modified distributed self-spreading algorithm to exhibit better performance.

  3. Integration Strategy Is a Key Step in Network-Based Analysis and Dramatically Affects Network Topological Properties and Inferring Outcomes

    Science.gov (United States)

    Jin, Nana; Wu, Deng; Gong, Yonghui; Bi, Xiaoman; Jiang, Hong; Li, Kongning; Wang, Qianghu

    2014-01-01

    An increasing number of experiments have been designed to detect intracellular and intercellular molecular interactions. Based on these molecular interactions (especially protein interactions), molecular networks have been built for using in several typical applications, such as the discovery of new disease genes and the identification of drug targets and molecular complexes. Because the data are incomplete and a considerable number of false-positive interactions exist, protein interactions from different sources are commonly integrated in network analyses to build a stable molecular network. Although various types of integration strategies are being applied in current studies, the topological properties of the networks from these different integration strategies, especially typical applications based on these network integration strategies, have not been rigorously evaluated. In this paper, systematic analyses were performed to evaluate 11 frequently used methods using two types of integration strategies: empirical and machine learning methods. The topological properties of the networks of these different integration strategies were found to significantly differ. Moreover, these networks were found to dramatically affect the outcomes of typical applications, such as disease gene predictions, drug target detections, and molecular complex identifications. The analysis presented in this paper could provide an important basis for future network-based biological researches. PMID:25243127

  4. In silico model-based inference: a contemporary approach for hypothesis testing in network biology.

    Science.gov (United States)

    Klinke, David J

    2014-01-01

    Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.

  5. Inferring animal social networks and leadership: applications for passive monitoring arrays.

    Science.gov (United States)

    Jacoby, David M P; Papastamatiou, Yannis P; Freeman, Robin

    2016-11-01

    Analyses of animal social networks have frequently benefited from techniques derived from other disciplines. Recently, machine learning algorithms have been adopted to infer social associations from time-series data gathered using remote, telemetry systems situated at provisioning sites. We adapt and modify existing inference methods to reveal the underlying social structure of wide-ranging marine predators moving through spatial arrays of passive acoustic receivers. From six months of tracking data for grey reef sharks (Carcharhinus amblyrhynchos) at Palmyra atoll in the Pacific Ocean, we demonstrate that some individuals emerge as leaders within the population and that this behavioural coordination is predicted by both sex and the duration of co-occurrences between conspecifics. In doing so, we provide the first evidence of long-term, spatially extensive social processes in wild sharks. To achieve these results, we interrogate simulated and real tracking data with the explicit purpose of drawing attention to the key considerations in the use and interpretation of inference methods and their impact on resultant social structure. We provide a modified translation of the GMMEvents method for R, including new analyses quantifying the directionality and duration of social events with the aim of encouraging the careful use of these methods more widely in less tractable social animal systems but where passive telemetry is already widespread. © 2016 The Authors.

  6. Generic Properties of Random Gene Regulatory Networks.

    Science.gov (United States)

    Li, Zhiyuan; Bianco, Simone; Zhang, Zhaoyang; Tang, Chao

    2013-12-01

    Modeling gene regulatory networks (GRNs) is an important topic in systems biology. Although there has been much work focusing on various specific systems, the generic behavior of GRNs with continuous variables is still elusive. In particular, it is not clear typically how attractors partition among the three types of orbits: steady state, periodic and chaotic, and how the dynamical properties change with network's topological characteristics. In this work, we first investigated these questions in random GRNs with different network sizes, connectivity, fraction of inhibitory links and transcription regulation rules. Then we searched for the core motifs that govern the dynamic behavior of large GRNs. We show that the stability of a random GRN is typically governed by a few embedding motifs of small sizes, and therefore can in general be understood in the context of these short motifs. Our results provide insights for the study and design of genetic networks.

  7. Network-Guided Key Gene Discovery for a Given Cellular Process

    DEFF Research Database (Denmark)

    He, Feng Q; Ollert, Markus

    2018-01-01

    Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis...... have already made enormous contributions to identifying key genes associated with a given disease or phenotype, the success of which is however very much dependent on a huge number of samples. Recent advances in network biology, especially network inference directly from genome-scale data...

  8. Inferring network structure in non-normal and mixed discrete-continuous genomic data.

    Science.gov (United States)

    Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

    2018-03-01

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.

  9. Investigations on Incipient Fault Diagnosis of Power Transformer Using Neural Networks and Adaptive Neurofuzzy Inference System

    Directory of Open Access Journals (Sweden)

    Nandkumar Wagh

    2014-01-01

    Full Text Available Continuity of power supply is of utmost importance to the consumers and is only possible by coordination and reliable operation of power system components. Power transformer is such a prime equipment of the transmission and distribution system and needs to be continuously monitored for its well-being. Since ratio methods cannot provide correct diagnosis due to the borderline problems and the probability of existence of multiple faults, artificial intelligence could be the best approach. Dissolved gas analysis (DGA interpretation may provide an insight into the developing incipient faults and is adopted as the preliminary diagnosis tool. In the proposed work, a comparison of the diagnosis ability of backpropagation (BP, radial basis function (RBF neural network, and adaptive neurofuzzy inference system (ANFIS has been investigated and the diagnosis results in terms of error measure, accuracy, network training time, and number of iterations are presented.

  10. Using adaptive network based fuzzy inference system to forecast regional electricity loads

    International Nuclear Information System (INIS)

    Ying, L.-C.; Pan, M.-C.

    2008-01-01

    Since accurate regional load forecasting is very important for improvement of the management performance of the electric industry, various regional load forecasting methods have been developed. The purpose of this study is to apply the adaptive network based fuzzy inference system (ANFIS) model to forecast the regional electricity loads in Taiwan and demonstrate the forecasting performance of this model. Based on the mean absolute percentage errors and statistical results, we can see that the ANFIS model has better forecasting performance than the regression model, artificial neural network (ANN) model, support vector machines with genetic algorithms (SVMG) model, recurrent support vector machines with genetic algorithms (RSVMG) model and hybrid ellipsoidal fuzzy systems for time series forecasting (HEFST) model. Thus, the ANFIS model is a promising alternative for forecasting regional electricity loads

  11. Using adaptive network based fuzzy inference system to forecast regional electricity loads

    Energy Technology Data Exchange (ETDEWEB)

    Ying, Li-Chih [Department of Marketing Management, Central Taiwan University of Science and Technology, 11, Pu-tzu Lane, Peitun, Taichung City 406 (China); Pan, Mei-Chiu [Graduate Institute of Management Sciences, Nanhua University, 32, Chung Keng Li, Dalin, Chiayi 622 (China)

    2008-02-15

    Since accurate regional load forecasting is very important for improvement of the management performance of the electric industry, various regional load forecasting methods have been developed. The purpose of this study is to apply the adaptive network based fuzzy inference system (ANFIS) model to forecast the regional electricity loads in Taiwan and demonstrate the forecasting performance of this model. Based on the mean absolute percentage errors and statistical results, we can see that the ANFIS model has better forecasting performance than the regression model, artificial neural network (ANN) model, support vector machines with genetic algorithms (SVMG) model, recurrent support vector machines with genetic algorithms (RSVMG) model and hybrid ellipsoidal fuzzy systems for time series forecasting (HEFST) model. Thus, the ANFIS model is a promising alternative for forecasting regional electricity loads. (author)

  12. Effects of Sample Size and Dimensionality on the Performance of Four Algorithms for Inference of Association Networks in Metabonomics

    NARCIS (Netherlands)

    Suarez Diez, M.; Saccenti, E.

    2015-01-01

    We investigated the effect of sample size and dimensionality on the performance of four algorithms (ARACNE, CLR, CORR, and PCLRC) when they are used for the inference of metabolite association networks. We report that as many as 100-400 samples may be necessary to obtain stable network estimations,

  13. Phylogenetic Relationships of Pseudorasbora, Pseudopungtungia, and Pungtungia (Teleostei; Cypriniformes; Gobioninae Inferred from Multiple Nuclear Gene Sequences

    Directory of Open Access Journals (Sweden)

    Keun-Yong Kim

    2013-01-01

    Full Text Available Gobionine species belonging to the genera Pseudorasbora, Pseudopungtungia, and Pungtungia (Teleostei; Cypriniformes; Cyprinidae have been heavily studied because of problems on taxonomy, threats of extinction, invasion, and human health. Nucleotide sequences of three nuclear genes, that is, recombination activating protein gene 1 (rag1, recombination activating gene 2 (rag2, and early growth response 1 gene (egr1, from Pseudorasbora, Pseudopungtungia, and Pungtungia species residing in China, Japan, and Korea, were analyzed to elucidate their intergeneric and interspecific phylogenetic relationships. In the phylogenetic tree inferred from their multiple gene sequences, Pseudorasbora, Pseudopungtungia and Pungtungia species ramified into three phylogenetically distinct clades; the “tenuicorpa” clade composed of Pseudopungtungia tenuicorpa, the “parva” clade composed of all Pseudorasbora species/subspecies, and the “herzi” clade composed of Pseudopungtungia nigra, and Pungtungia herzi. The genus Pseudorasbora was recovered as monophyletic, while the genus Pseudopungtungia was recovered as polyphyletic. Our phylogenetic result implies the unstable taxonomic status of the genus Pseudopungtungia.

  14. Hybrid stochastic simplifications for multiscale gene networks

    Directory of Open Access Journals (Sweden)

    Debussche Arnaud

    2009-09-01

    Full Text Available Abstract Background Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. Results We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene networks dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion 123 which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Conclusion Hybrid simplifications can be used for onion-like (multi-layered approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach.

  15. A graphical user interface for a method to infer kinetics and network architecture (MIKANA).

    Science.gov (United States)

    Mourão, Márcio A; Srividhya, Jeyaraman; McSharry, Patrick E; Crampin, Edmund J; Schnell, Santiago

    2011-01-01

    One of the main challenges in the biomedical sciences is the determination of reaction mechanisms that constitute a biochemical pathway. During the last decades, advances have been made in building complex diagrams showing the static interactions of proteins. The challenge for systems biologists is to build realistic models of the dynamical behavior of reactants, intermediates and products. For this purpose, several methods have been recently proposed to deduce the reaction mechanisms or to estimate the kinetic parameters of the elementary reactions that constitute the pathway. One such method is MIKANA: Method to Infer Kinetics And Network Architecture. MIKANA is a computational method to infer both reaction mechanisms and estimate the kinetic parameters of biochemical pathways from time course data. To make it available to the scientific community, we developed a Graphical User Interface (GUI) for MIKANA. Among other features, the GUI validates and processes an input time course data, displays the inferred reactions, generates the differential equations for the chemical species in the pathway and plots the prediction curves on top of the input time course data. We also added a new feature to MIKANA that allows the user to exclude a priori known reactions from the inferred mechanism. This addition improves the performance of the method. In this article, we illustrate the GUI for MIKANA with three examples: an irreversible Michaelis-Menten reaction mechanism; the interaction map of chemical species of the muscle glycolytic pathway; and the glycolytic pathway of Lactococcus lactis. We also describe the code and methods in sufficient detail to allow researchers to further develop the code or reproduce the experiments described. The code for MIKANA is open source, free for academic and non-academic use and is available for download (Information S1).

  16. Simulation and Statistical Inference of Stochastic Reaction Networks with Applications to Epidemic Models

    KAUST Repository

    Moraes, Alvaro

    2015-01-01

    Epidemics have shaped, sometimes more than wars and natural disasters, demo- graphic aspects of human populations around the world, their health habits and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards at planetary scale. During the spread of an epidemic disease, there are phenomena, like the sudden extinction of the epidemic, that can not be captured by deterministic models. As a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting could be the approximation of the expected number of infected individuals found in one month from now. On the other hand, a typical inverse problem could be, given a discretely observed set of epidemiological data, infer the transmission rate of the epidemic or its basic reproduction number. Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), that are intended to describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been mainly developed to model biochemical reactions but they also have applications in neural networks, virus kinetics, and dynamics of social networks, among others. 4 This PhD thesis is focused on novel fast simulation algorithms and statistical inference methods for SRNs. Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide accurate estimates of expected values of a given observable of SRNs at a prescribed final time. They are designed to control the global approximation error up to a user-selected accuracy and up to a certain confidence level, and with near optimal computational work. We also present novel dual-weighted residual expansions for fast estimation of weak and strong errors arising from the MLMC methodology. Regarding the statistical inference

  17. Platform dependence of inference on gene-wise and gene-set involvement in human lung development

    Directory of Open Access Journals (Sweden)

    Kho Alvin T

    2009-06-01

    Full Text Available Abstract Background With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age. Results We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis. Conclusion We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.

  18. Inference of topology and the nature of synapses, and the flow of information in neuronal networks

    Science.gov (United States)

    Borges, F. S.; Lameu, E. L.; Iarosz, K. C.; Protachevicz, P. R.; Caldas, I. L.; Viana, R. L.; Macau, E. E. N.; Batista, A. M.; Baptista, M. S.

    2018-02-01

    The characterization of neuronal connectivity is one of the most important matters in neuroscience. In this work, we show that a recently proposed informational quantity, the causal mutual information, employed with an appropriate methodology, can be used not only to correctly infer the direction of the underlying physical synapses, but also to identify their excitatory or inhibitory nature, considering easy to handle and measure bivariate time series. The success of our approach relies on a surprising property found in neuronal networks by which nonadjacent neurons do "understand" each other (positive mutual information), however, this exchange of information is not capable of causing effect (zero transfer entropy). Remarkably, inhibitory connections, responsible for enhancing synchronization, transfer more information than excitatory connections, known to enhance entropy in the network. We also demonstrate that our methodology can be used to correctly infer directionality of synapses even in the presence of dynamic and observational Gaussian noise, and is also successful in providing the effective directionality of intermodular connectivity, when only mean fields can be measured.

  19. Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Benjamin A Logsdon

    Full Text Available Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL, which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1, and genes involved in endocytosis (RCY1, the spindle checkpoint (BUB2, sulfonate catabolism (JLP1, and cell-cell communication (PRM7. Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data.

  20. Inferring phylogenetic networks by the maximum parsimony criterion: a case study.

    Science.gov (United States)

    Jin, Guohua; Nakhleh, Luay; Snir, Sagi; Tuller, Tamir

    2007-01-01

    Horizontal gene transfer (HGT) may result in genes whose evolutionary histories disagree with each other, as well as with the species tree. In this case, reconciling the species and gene trees results in a network of relationships, known as the "phylogenetic network" of the set of species. A phylogenetic network that incorporates HGT consists of an underlying species tree that captures vertical inheritance and a set of edges which model the "horizontal" transfer of genetic material. In a series of papers, Nakhleh and colleagues have recently formulated a maximum parsimony (MP) criterion for phylogenetic networks, provided an array of computationally efficient algorithms and heuristics for computing it, and demonstrated its plausibility on simulated data. In this article, we study the performance and robustness of this criterion on biological data. Our findings indicate that MP is very promising when its application is extended to the domain of phylogenetic network reconstruction and HGT detection. In all cases we investigated, the MP criterion detected the correct number of HGT events required to map the evolutionary history of a gene data set onto the species phylogeny. Furthermore, our results indicate that the criterion is robust with respect to both incomplete taxon sampling and the use of different site substitution matrices. Finally, our results show that the MP criterion is very promising in detecting HGT in chimeric genes, whose evolutionary histories are a mix of vertical and horizontal evolution. Besides the performance analysis of MP, our findings offer new insights into the evolution of 4 biological data sets and new possible explanations of HGT scenarios in their evolutionary history.

  1. An approach for reduction of false predictions in reverse engineering of gene regulatory networks.

    Science.gov (United States)

    Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar

    2018-05-14

    A gene regulatory network discloses the regulatory interactions amongst genes, at a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools offers a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many false predictions along with the correct predictions, which is unwanted. Investigations in the domain focus on the identification of as many correct regulations as possible in the reverse engineering of gene regulatory networks to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have implemented the same using a dataset ensemble approach (i.e. combining multiple datasets) also. We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IMRA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging enough as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives

  2. Gene regulatory networks elucidating huanglongbing disease mechanisms.

    Directory of Open Access Journals (Sweden)

    Federico Martinelli

    Full Text Available Next-generation sequencing was exploited to gain deeper insight into the response to infection by Candidatus liberibacter asiaticus (CaLas, especially the immune disregulation and metabolic dysfunction caused by source-sink disruption. Previous fruit transcriptome data were compared with additional RNA-Seq data in three tissues: immature fruit, and young and mature leaves. Four categories of orchard trees were studied: symptomatic, asymptomatic, apparently healthy, and healthy. Principal component analysis found distinct expression patterns between immature and mature fruits and leaf samples for all four categories of trees. A predicted protein - protein interaction network identified HLB-regulated genes for sugar transporters playing key roles in the overall plant responses. Gene set and pathway enrichment analyses highlight the role of sucrose and starch metabolism in disease symptom development in all tissues. HLB-regulated genes (glucose-phosphate-transporter, invertase, starch-related genes would likely determine the source-sink relationship disruption. In infected leaves, transcriptomic changes were observed for light reactions genes (downregulation, sucrose metabolism (upregulation, and starch biosynthesis (upregulation. In parallel, symptomatic fruits over-expressed genes involved in photosynthesis, sucrose and raffinose metabolism, and downregulated starch biosynthesis. We visualized gene networks between tissues inducing a source-sink shift. CaLas alters the hormone crosstalk, resulting in weak and ineffective tissue-specific plant immune responses necessary for bacterial clearance. Accordingly, expression of WRKYs (including WRKY70 was higher in fruits than in leaves. Systemic acquired responses were inadequately activated in young leaves, generally considered the sites where most new infections occur.

  3. Inferring the transcriptional landscape of bovine skeletal muscle by integrating co-expression networks.

    Directory of Open Access Journals (Sweden)

    Nicholas J Hudson

    Full Text Available BACKGROUND: Despite modern technologies and novel computational approaches, decoding causal transcriptional regulation remains challenging. This is particularly true for less well studied organisms and when only gene expression data is available. In muscle a small number of well characterised transcription factors are proposed to regulate development. Therefore, muscle appears to be a tractable system for proposing new computational approaches. METHODOLOGY/PRINCIPAL FINDINGS: Here we report a simple algorithm that asks "which transcriptional regulator has the highest average absolute co-expression correlation to the genes in a co-expression module?" It correctly infers a number of known causal regulators of fundamental biological processes, including cell cycle activity (E2F1, glycolysis (HLF, mitochondrial transcription (TFB2M, adipogenesis (PIAS1, neuronal development (TLX3, immune function (IRF1 and vasculogenesis (SOX17, within a skeletal muscle context. However, none of the canonical pro-myogenic transcription factors (MYOD1, MYOG, MYF5, MYF6 and MEF2C were linked to muscle structural gene expression modules. Co-expression values were computed using developing bovine muscle from 60 days post conception (early foetal to 30 months post natal (adulthood for two breeds of cattle, in addition to a nutritional comparison with a third breed. A number of transcriptional landscapes were constructed and integrated into an always correlated landscape. One notable feature was a 'metabolic axis' formed from glycolysis genes at one end, nuclear-encoded mitochondrial protein genes at the other, and centrally tethered by mitochondrially-encoded mitochondrial protein genes. CONCLUSIONS/SIGNIFICANCE: The new module-to-regulator algorithm complements our recently described Regulatory Impact Factor analysis. Together with a simple examination of a co-expression module's contents, these three gene expression approaches are starting to illuminate the in vivo

  4. Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles.

    Science.gov (United States)

    Yu, Yun; Warnow, Tandy; Nakhleh, Luay

    2011-11-01

    One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is Minimize Deep Coalescence (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. In this article, we propose new MDC formulations for the cases where the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary. Further, we prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. In addition, we devise MDC-based algorithms for cases when multiple alleles per species may be sampled. We study the performance of these methods in coalescent-based computer simulations.

  5. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    Science.gov (United States)

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Dinoflagellate phylogeny as inferred from heat shock protein 90 and ribosomal gene sequences.

    Directory of Open Access Journals (Sweden)

    Mona Hoppenrath

    2010-10-01

    Full Text Available Interrelationships among dinoflagellates in molecular phylogenies are largely unresolved, especially in the deepest branches. Ribosomal DNA (rDNA sequences provide phylogenetic signals only at the tips of the dinoflagellate tree. Two reasons for the poor resolution of deep dinoflagellate relationships using rDNA sequences are (1 most sites are relatively conserved and (2 there are different evolutionary rates among sites in different lineages. Therefore, alternative molecular markers are required to address the deeper phylogenetic relationships among dinoflagellates. Preliminary evidence indicates that the heat shock protein 90 gene (Hsp90 will provide an informative marker, mainly because this gene is relatively long and appears to have relatively uniform rates of evolution in different lineages.We more than doubled the previous dataset of Hsp90 sequences from dinoflagellates by generating additional sequences from 17 different species, representing seven different orders. In order to concatenate the Hsp90 data with rDNA sequences, we supplemented the Hsp90 sequences with three new SSU rDNA sequences and five new LSU rDNA sequences. The new Hsp90 sequences were generated, in part, from four additional heterotrophic dinoflagellates and the type species for six different genera. Molecular phylogenetic analyses resulted in a paraphyletic assemblage near the base of the dinoflagellate tree consisting of only athecate species. However, Noctiluca was never part of this assemblage and branched in a position that was nested within other lineages of dinokaryotes. The phylogenetic trees inferred from Hsp90 sequences were consistent with trees inferred from rDNA sequences in that the backbone of the dinoflagellate clade was largely unresolved.The sequence conservation in both Hsp90 and rDNA sequences and the poor resolution of the deepest nodes suggests that dinoflagellates reflect an explosive radiation in morphological diversity in their recent

  7. Phylogeny of Celastraceae tribe Euonymeae inferred from morphological characters and nuclear and plastid genes.

    Science.gov (United States)

    Simmons, Mark P; McKenna, Miles J; Bacon, Christine D; Yakobson, Kendra; Cappa, Jennifer J; Archer, Robert H; Ford, Andrew J

    2012-01-01

    The phylogeny of Celastraceae tribe Euonymeae (≈ 230 species in eight genera in both the Old and New Worlds) was inferred using morphological characters together with plastid (matK, trnL-F) and nuclear (ITS and 26S rDNA) genes. Tribe Euonymeae has been defined as those genera of Celastraceae with generally opposite leaves, isomerous carpels, loculicidally dehiscent capsules, and arillate seeds (except Microtropis). Euonymus is the most diverse (129 species) and widely cultivated genus in the tribe. We infer that tribe Euonymeae consists of at least six separate lineages within Celastraceae and that a revised natural classification of the family is needed. Microtropis and Quetzalia are inferred to be distinct sister groups that together are sister to Zinowiewia. The endangered Monimopetalum chinense is an isolated and early derived lineage of Celastraceae that represents an important component of phylogenetic diversity within the family. Hedraianthera is sister to Brassiantha, and we describe a second species (Brassiantha hedraiantheroides A.J. Ford) that represents the first reported occurrence of this genus in Australia. Euonymus globularis, from eastern Australia, is sister to Menepetalum, which is endemic to New Caledonia, and we erect a new genus (Dinghoua R.H. Archer) for it. The Madagascan species of Euonymus are sister to Pleurostylia and recognized as a distinct genus (Astrocassine ined.). Glyptopetalum, Torralbasia, and Xylonymus are all closely related to Euonymus sensu stricto and are questionably distinct from it. Current intrageneric classifications of Euonymus are not completely natural and require revision. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. Phylogeny of the Celastraceae inferred from phytochrome B gene sequence and morphology.

    Science.gov (United States)

    Simmons, M P; Clevinger, C C; Savolainen, V; Archer, R H; Mathews, S; Doyle, J J

    2001-02-01

    Phylogenetic relationships within Celastraceae were inferred using a simultaneous analysis of 61 morphological characters and 1123 base pairs of phytochrome B exon 1 from the nuclear genome. No gaps were inferred, and the gene tree topology suggests that the primers were specific to a single locus that did not duplicate among the lineages sampled. This region of phytochrome B was most useful for examining relationships among closely related genera. Fifty-one species from 38 genera of Celastraceae were sampled. The Celastraceae sensu lato (including Hippocrateaceae) were resolved as a monophyletic group. Loesener's subfamilies and tribes of Celastraceae were not supported. The Hippocrateaceae were resolved as a monophyletic group nested within a paraphyletic Celastraceae sensu stricto. Goupia was resolved as more closely related to Euphorbiaceae, Corynocarpaceae, and Linaceae than to Celastraceae. Plagiopteron (Flacourtiaceae) was resolved as the sister group of Hippocrateoideae. Brexia (Brexiaceae) was resolved as closely related to Elaeodendron and Pleurostylia. Canotia was resolved as the sister group of Acanthothamnus within Celastraceae. Perrottetia and Mortonia were resolved as the sister group of the rest of the Celastraceae. Siphonodon was resolved as a derived member of Celastraceae. Maytenus was resolved as three disparate groups, suggesting that this large genus needs to be recircumscribed.

  9. Inference of directed climate networks: role of instability of causality estimation methods

    Science.gov (United States)

    Hlinka, Jaroslav; Hartman, David; Vejmelka, Martin; Paluš, Milan

    2013-04-01

    Climate data are increasingly analyzed by complex network analysis methods, including graph-theoretical approaches [1]. For such analysis, links between localized nodes of climate network are typically quantified by some statistical measures of dependence (connectivity) between measured variables of interest. To obtain information on the directionality of the interactions in the networks, a wide range of methods exists. These can be broadly divided into linear and nonlinear methods, with some of the latter having the theoretical advantage of being model-free, and principally a generalization of the former [2]. However, as a trade-off, this generality comes together with lower accuracy - in particular if the system was close to linear. In an overall stationary system, this may potentially lead to higher variability in the nonlinear network estimates. Therefore, with the same control of false alarms, this may lead to lower sensitivity for detection of real changes in the network structure. These problems are discussed on the example of daily SAT and SLP data from the NCEP/NCAR reanalysis dataset. We first reduce the dimensionality of data using PCA with VARIMAX rotation to detect several dozens of components that together explain most of the data variability. We further construct directed climate networks applying a selection of most widely used methods - variants of linear Granger causality and conditional mutual information. Finally, we assess the stability of the detected directed climate networks by computing them in sliding time windows. To understand the origin of the observed instabilities and their range, we also apply the same procedure to two types of surrogate data: either with non-stationarity in network structure removed, or imposed in a controlled way. In general, the linear methods show stable results in terms of overall similarity of directed climate networks inferred. For instance, for different decades of SAT data, the Spearman correlation of edge

  10. Challenges to inferring causality from viral information dispersion in dynamic social networks

    Science.gov (United States)

    Ternovski, John

    2014-06-01

    Understanding the mechanism behind large-scale information dispersion through complex networks has important implications for a variety of industries ranging from cyber-security to public health. With the unprecedented availability of public data from online social networks (OSNs) and the low cost nature of most OSN outreach, randomized controlled experiments, the "gold standard" of causal inference methodologies, have been used with increasing regularity to study viral information dispersion. And while these studies have dramatically furthered our understanding of how information disseminates through social networks by isolating causal mechanisms, there are still major methodological concerns that need to be addressed in future research. This paper delineates why modern OSNs are markedly different from traditional sociological social networks and why these differences present unique challenges to experimentalists and data scientists. The dynamic nature of OSNs is particularly troublesome for researchers implementing experimental designs, so this paper identifies major sources of bias arising from network mutability and suggests strategies to circumvent and adjust for these biases. This paper also discusses the practical considerations of data quality and collection, which may adversely impact the efficiency of the estimator. The major experimental methodologies used in the current literature on virality are assessed at length, and their strengths and limits identified. Other, as-yetunsolved threats to the efficiency and unbiasedness of causal estimators--such as missing data--are also discussed. This paper integrates methodologies and learnings from a variety of fields under an experimental and data science framework in order to systematically consolidate and identify current methodological limitations of randomized controlled experiments conducted in OSNs.

  11. Emulation of reionization simulations for Bayesian inference of astrophysics parameters using neural networks

    Science.gov (United States)

    Schmit, C. J.; Pritchard, J. R.

    2018-03-01

    Next generation radio experiments such as LOFAR, HERA, and SKA are expected to probe the Epoch of Reionization (EoR) and claim a first direct detection of the cosmic 21cm signal within the next decade. Data volumes will be enormous and can thus potentially revolutionize our understanding of the early Universe and galaxy formation. However, numerical modelling of the EoR can be prohibitively expensive for Bayesian parameter inference and how to optimally extract information from incoming data is currently unclear. Emulation techniques for fast model evaluations have recently been proposed as a way to bypass costly simulations. We consider the use of artificial neural networks as a blind emulation technique. We study the impact of training duration and training set size on the quality of the network prediction and the resulting best-fitting values of a parameter search. A direct comparison is drawn between our emulation technique and an equivalent analysis using 21CMMC. We find good predictive capabilities of our network using training sets of as low as 100 model evaluations, which is within the capabilities of fully numerical radiative transfer codes.

  12. Artificial neural networks and neuro-fuzzy inference systems as virtual sensors for hydrogen safety prediction

    Energy Technology Data Exchange (ETDEWEB)

    Karri, Vishy; Ho, Tien [School of Engineering, University of Tasmania, GPO Box 252-65, Hobart, Tasmania 7001 (Australia); Madsen, Ole [Department of Production, Aalborg University, Fibigerstraede 16, DK-9220 Aalborg (Denmark)

    2008-06-15

    Hydrogen is increasingly investigated as an alternative fuel to petroleum products in running internal combustion engines and as powering remote area power systems using generators. The safety issues related to hydrogen gas are further exasperated by expensive instrumentation required to measure the percentage of explosive limits, flow rates and production pressure. This paper investigates the use of model based virtual sensors (rather than expensive physical sensors) in connection with hydrogen production with a Hogen 20 electrolyzer system. The virtual sensors are used to predict relevant hydrogen safety parameters, such as the percentage of lower explosive limit, hydrogen pressure and hydrogen flow rate as a function of different input conditions of power supplied (voltage and current), the feed of de-ionized water and Hogen 20 electrolyzer system parameters. The virtual sensors are developed by means of the application of various Artificial Intelligent techniques. To train and appraise the neural network models as virtual sensors, the Hogen 20 electrolyzer is instrumented with necessary sensors to gather experimental data which together with MATLAB neural networks toolbox and tailor made adaptive neuro-fuzzy inference systems (ANFIS) were used as predictive tools to estimate hydrogen safety parameters. It was shown that using the neural networks hydrogen safety parameters were predicted to less than 3% of percentage average root mean square error. The most accurate prediction was achieved by using ANFIS. (author)

  13. Nonlinear Inference in Partially Observed Physical Systems and Deep Neural Networks

    Science.gov (United States)

    Rozdeba, Paul J.

    The problem of model state and parameter estimation is a significant challenge in nonlinear systems. Due to practical considerations of experimental design, it is often the case that physical systems are partially observed, meaning that data is only available for a subset of the degrees of freedom required to fully model the observed system's behaviors and, ultimately, predict future observations. Estimation in this context is highly complicated by the presence of chaos, stochasticity, and measurement noise in dynamical systems. One of the aims of this dissertation is to simultaneously analyze state and parameter estimation in as a regularized inverse problem, where the introduction of a model makes it possible to reverse the forward problem of partial, noisy observation; and as a statistical inference problem using data assimilation to transfer information from measurements to the model states and parameters. Ultimately these two formulations achieve the same goal. Similar aspects that appear in both are highlighted as a means for better understanding the structure of the nonlinear inference problem. An alternative approach to data assimilation that uses model reduction is then examined as a way to eliminate unresolved nonlinear gating variables from neuron models. In this formulation, only measured variables enter into the model, and the resulting errors are themselves modeled by nonlinear stochastic processes with memory. Finally, variational annealing, a data assimilation method previously applied to dynamical systems, is introduced as a potentially useful tool for understanding deep neural network training in machine learning by exploiting similarities between the two problems.

  14. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian

    2016-02-20

    © 2016 Taylor & Francis Group, LLC. ABSTRACT: In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  15. An efficient forward-reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Vilanova, Pedro

    2016-01-07

    In this work, we present an extension of the forward-reverse representation introduced in Simulation of forward-reverse stochastic representations for conditional diffusions , a 2014 paper by Bayer and Schoenmakers to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  16. A fast and efficient gene-network reconstruction method from multiple over-expression experiments

    Directory of Open Access Journals (Sweden)

    Thurner Stefan

    2009-08-01

    Full Text Available Abstract Background Reverse engineering of gene regulatory networks presents one of the big challenges in systems biology. Gene regulatory networks are usually inferred from a set of single-gene over-expressions and/or knockout experiments. Functional relationships between genes are retrieved either from the steady state gene expressions or from respective time series. Results We present a novel algorithm for gene network reconstruction on the basis of steady-state gene-chip data from over-expression experiments. The algorithm is based on a straight forward solution of a linear gene-dynamics equation, where experimental data is fed in as a first predictor for the solution. We compare the algorithm's performance with the NIR algorithm, both on the well known E. coli experimental data and on in-silico experiments. Conclusion We show superiority of the proposed algorithm in the number of correctly reconstructed links and discuss computational time and robustness. The proposed algorithm is not limited by combinatorial explosion problems and can be used in principle for large networks.

  17. Bayesian inference based modelling for gene transcriptional dynamics by integrating multiple source of knowledge

    Directory of Open Access Journals (Sweden)

    Wang Shu-Qiang

    2012-07-01

    Full Text Available Abstract Background A key challenge in the post genome era is to identify genome-wide transcriptional regulatory networks, which specify the interactions between transcription factors and their target genes. Numerous methods have been developed for reconstructing gene regulatory networks from expression data. However, most of them are based on coarse grained qualitative models, and cannot provide a quantitative view of regulatory systems. Results A binding affinity based regulatory model is proposed to quantify the transcriptional regulatory network. Multiple quantities, including binding affinity and the activity level of transcription factor (TF are incorporated into a general learning model. The sequence features of the promoter and the possible occupancy of nucleosomes are exploited to estimate the binding probability of regulators. Comparing with the previous models that only employ microarray data, the proposed model can bridge the gap between the relative background frequency of the observed nucleotide and the gene's transcription rate. Conclusions We testify the proposed approach on two real-world microarray datasets. Experimental results show that the proposed model can effectively identify the parameters and the activity level of TF. Moreover, the kinetic parameters introduced in the proposed model can reveal more biological sense than previous models can do.

  18. Integrated pathway-based transcription regulation network mining and visualization based on gene expression profiles.

    Science.gov (United States)

    Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko

    2016-06-01

    Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypotheses generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application software (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostrate cancer. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks

    Directory of Open Access Journals (Sweden)

    Hamelryck Thomas

    2010-03-01

    Full Text Available Abstract Background Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs. It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations. Results The program package is freely available under the GNU General Public Licence (GPL from SourceForge http://sourceforge.net/projects/mocapy. The package contains the source for building the Mocapy++ library, several usage examples and the user manual. Conclusions Mocapy++ is especially suitable for constructing probabilistic models of biomolecular structure, due to its support for directional statistics. In particular, it supports the Kent distribution on the sphere and the bivariate von Mises distribution on the torus. These distributions have proven useful to formulate probabilistic models of protein and RNA structure in atomic detail.

  20. Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data

    Directory of Open Access Journals (Sweden)

    de los Reyes Benildo G

    2008-04-01

    Full Text Available Abstract Background Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NM, facilitates the inference. Identifying NMs defined by specific transcription factors (TF establishes the framework structure of a TRN and allows the inference of TF-target gene relationship. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO information. Results The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs are obtained from literature. Support vector machine (SVM classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1 Gene set enrichment

  1. Paper-based synthetic gene networks.

    Science.gov (United States)

    Pardee, Keith; Green, Alexander A; Ferrante, Tom; Cameron, D Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J

    2014-11-06

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides an alternate, versatile venue for synthetic biologists to operate and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze dried onto paper, enabling the inexpensive, sterile, and abiotic distribution of synthetic-biology-based technologies for the clinic, global health, industry, research, and education. For field use, we create circuits with colorimetric outputs for detection by eye and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small-molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors.

  2. Paper-based Synthetic Gene Networks

    Science.gov (United States)

    Pardee, Keith; Green, Alexander A.; Ferrante, Tom; Cameron, D. Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J.

    2014-01-01

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides a new venue for synthetic biologists to operate, and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze-dried onto paper, enabling the inexpensive, sterile and abiotic distribution of synthetic biology-based technologies for the clinic, global health, industry, research and education. For field use, we create circuits with colorimetric outputs for detection by eye, and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors. PMID:25417167

  3. Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.

    Science.gov (United States)

    Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani

    2017-03-27

    Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. Main drawbacks of these models include the fact that these models only consider the measurement noise without taking into consideration the uncertainties in the underlying dynamics. Furthermore, inferring the MIN is characterized by the limited number of observations and nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family and two regression-based algorithms, in terms of the achieved performance on two synthetic data-sets and two real data-sets. The first data-set models the randomness in measurement data, whereas, the second data-set incorporates uncertainties in the underlying dynamics. The real data-sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm

  4. Functional coupling networks inferred from prefrontal cortex activity show experience-related effective plasticity

    Directory of Open Access Journals (Sweden)

    Gaia Tavoni

    2017-10-01

    Full Text Available Functional coupling networks are widely used to characterize collective patterns of activity in neural populations. Here, we ask whether functional couplings reflect the subtle changes, such as in physiological interactions, believed to take place during learning. We infer functional network models reproducing the spiking activity of simultaneously recorded neurons in prefrontal cortex (PFC of rats, during the performance of a cross-modal rule shift task (task epoch, and during preceding and following sleep epochs. A large-scale study of the 96 recorded sessions allows us to detect, in about 20% of sessions, effective plasticity between the sleep epochs. These coupling modifications are correlated with the coupling values in the task epoch, and are supported by a small subset of the recorded neurons, which we identify by means of an automatized procedure. These potentiated groups increase their coativation frequency in the spiking data between the two sleep epochs, and, hence, participate to putative experience-related cell assemblies. Study of the reactivation dynamics of the potentiated groups suggests a possible connection with behavioral learning. Reactivation is largely driven by hippocampal ripple events when the rule is not yet learned, and may be much more autonomous, and presumably sustained by the potentiated PFC network, when learning is consolidated. Cell assemblies coding for memories are widely believed to emerge through synaptic modification resulting from learning, yet their identification from activity is very arduous. We propose a functional-connectivity-based approach to identify experience-related cell assemblies from multielectrode recordings in vivo, and apply it to the prefrontal cortex activity of rats recorded during a task epoch and the preceding and following sleep epochs. We infer functional couplings between the recorded cells in each epoch. Comparisons of the functional coupling networks across the epochs allow us

  5. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  6. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

    Science.gov (United States)

    Wu, Yufeng

    2012-03-01

    Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

  7. Network-Based Method for Identifying Co- Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues.

    Science.gov (United States)

    Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong

    2017-10-02

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.

  8. Discovering implicit entity relation with the gene-citation-gene network.

    Directory of Open Access Journals (Sweden)

    Min Song

    Full Text Available In this paper, we apply the entitymetrics model to our constructed Gene-Citation-Gene (GCG network. Based on the premise there is a hidden, but plausible, relationship between an entity in one article and an entity in its citing article, we constructed a GCG network of gene pairs implicitly connected through citation. We compare the performance of this GCG network to a gene-gene (GG network constructed over the same corpus but which uses gene pairs explicitly connected through traditional co-occurrence. Using 331,411 MEDLINE abstracts collected from 18,323 seed articles and their references, we identify 25 gene pairs. A comparison of these pairs with interactions found in BioGRID reveal that 96% of the gene pairs in the GCG network have known interactions. We measure network performance using degree, weighted degree, closeness, betweenness centrality and PageRank. Combining all measures, we find the GCG network has more gene pairs, but a lower matching rate than the GG network. However, combining top ranked genes in both networks produces a matching rate of 35.53%. By visualizing both the GG and GCG networks, we find that cancer is the most dominant disease associated with the genes in both networks. Overall, the study indicates that the GCG network can be useful for detecting gene interaction in an implicit manner.

  9. A network-based gene expression signature informs prognosis and treatment for colorectal cancer patients.

    Directory of Open Access Journals (Sweden)

    Mingguang Shi

    Full Text Available Several studies have reported gene expression signatures that predict recurrence risk in stage II and III colorectal cancer (CRC patients with minimal gene membership overlap and undefined biological relevance. The goal of this study was to investigate biological themes underlying these signatures, to infer genes of potential mechanistic importance to the CRC recurrence phenotype and to test whether accurate prognostic models can be developed using mechanistically important genes.We investigated eight published CRC gene expression signatures and found no functional convergence in Gene Ontology enrichment analysis. Using a random walk-based approach, we integrated these signatures and publicly available somatic mutation data on a protein-protein interaction network and inferred 487 genes that were plausible candidate molecular underpinnings for the CRC recurrence phenotype. We named the list of 487 genes a NEM signature because it integrated information from Network, Expression, and Mutation. The signature showed significant enrichment in four biological processes closely related to cancer pathophysiology and provided good coverage of known oncogenes, tumor suppressors, and CRC-related signaling pathways. A NEM signature-based Survival Support Vector Machine prognostic model was trained using a microarray gene expression dataset and tested on an independent dataset. The model-based scores showed a 75.7% concordance with the real survival data and separated patients into two groups with significantly different relapse-free survival (p = 0.002. Similar results were obtained with reversed training and testing datasets (p = 0.007. Furthermore, adjuvant chemotherapy was significantly associated with prolonged survival of the high-risk patients (p = 0.006, but not beneficial to the low-risk patients (p = 0.491.The NEM signature not only reflects CRC biology but also informs patient prognosis and treatment response. Thus, the network

  10. Integration of biological networks and gene expression data using Cytoscape

    DEFF Research Database (Denmark)

    Cline, M.S.; Smoot, M.; Cerami, E.

    2007-01-01

    of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules......Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context...... and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape....

  11. Biomarker Gene Signature Discovery Integrating Network Knowledge

    Directory of Open Access Journals (Sweden)

    Holger Fröhlich

    2012-02-01

    Full Text Available Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

  12. A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

    Science.gov (United States)

    Newton, Richard; Wernisch, Lorenz

    2014-01-01

    Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247

  13. A dynamic discretization method for reliability inference in Dynamic Bayesian Networks

    International Nuclear Information System (INIS)

    Zhu, Jiandao; Collette, Matthew

    2015-01-01

    The material and modeling parameters that drive structural reliability analysis for marine structures are subject to a significant uncertainty. This is especially true when time-dependent degradation mechanisms such as structural fatigue cracking are considered. Through inspection and monitoring, information such as crack location and size can be obtained to improve these parameters and the corresponding reliability estimates. Dynamic Bayesian Networks (DBNs) are a powerful and flexible tool to model dynamic system behavior and update reliability and uncertainty analysis with life cycle data for problems such as fatigue cracking. However, a central challenge in using DBNs is the need to discretize certain types of continuous random variables to perform network inference while still accurately tracking low-probability failure events. Most existing discretization methods focus on getting the overall shape of the distribution correct, with less emphasis on the tail region. Therefore, a novel scheme is presented specifically to estimate the likelihood of low-probability failure events. The scheme is an iterative algorithm which dynamically partitions the discretization intervals at each iteration. Through applications to two stochastic crack-growth example problems, the algorithm is shown to be robust and accurate. Comparisons are presented between the proposed approach and existing methods for the discretization problem. - Highlights: • A dynamic discretization method is developed for low-probability events in DBNs. • The method is compared to existing approaches on two crack growth problems. • The method is shown to improve on existing methods for low-probability events

  14. Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression.

    Science.gov (United States)

    Walker, Jeffrey A

    2016-01-01

    Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori . Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O'Brien's OLS test, Anderson's permutation [Formula: see text]-test, two permutation F -tests (including GlobalAncova), and a rotation z -test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that the GLS results in

  15. Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jeffrey A. Walker

    2016-10-01

    Full Text Available Background Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed but the performance of these methods for phenotypes that are continuous rather than discrete and with multiple nuisance covariates has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset which was used to show opposing effects of hedonic and eudaimonic well-being (or happiness on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set. The original analysis of these data used a linear model (GLS of fixed effects with correlated error to infer effects of Hedonia and Eudaimonia on mean CTRA expression. Methods The standardized effects of Hedonia and Eudaimonia on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS linear models and generalized estimating equation (GEE models. The OLS estimates were tested using O’Brien’s OLS test, Anderson’s permutation ${r}_{F}^{2}$ r F 2 -test, two permutation F-tests (including GlobalAncova, and a rotation z-test (Roast. The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset. Results GLS estimates are inconsistent between data sets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS

  16. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    Science.gov (United States)

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  17. Inferring species trees from gene trees in a radiation of California trapdoor spiders (Araneae, Antrodiaetidae, Aliatypus.

    Directory of Open Access Journals (Sweden)

    Jordan D Satler

    Full Text Available The California Floristic Province is a biodiversity hotspot, reflecting a complex geologic history, strong selective gradients, and a heterogeneous landscape. These factors have led to high endemic diversity across many lifeforms within this region, including the richest diversity of mygalomorph spiders (tarantulas, trapdoor spiders, and kin in North America. The trapdoor spider genus Aliatypus encompasses twelve described species, eleven of which are endemic to California. Several Aliatypus species show disjunct distributional patterns in California (some are found on both sides of the vast Central Valley, and the genus as a whole occupies an impressive variety of habitats.We collected specimens from 89 populations representing all described species. DNA sequence data were collected from seven gene regions, including two newly developed for spider systematics. Bayesian inference (in individual gene tree and species tree approaches recovered a general "3 clade" structure for the genus (A. gulosus, californicus group, erebus group, with three other phylogenetically isolated species differing slightly in position across different phylogenetic analyses. Because of extremely high intraspecific divergences in mitochondrial COI sequences, the relatively slowly evolving 28S rRNA gene was found to be more useful than mitochondrial data for identification of morphologically indistinguishable immatures. For multiple species spanning the Central Valley, explicit hypothesis testing suggests a lack of monophyly for regional populations (e.g., western Coast Range populations. Phylogenetic evidence clearly shows that syntopy is restricted to distant phylogenetic relatives, consistent with ecological niche conservatism.This study provides fundamental insight into a radiation of trapdoor spiders found in the biodiversity hotspot of California. Species relationships are clarified and undescribed lineages are discovered, with more geographic sampling likely to

  18. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found that gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional

  19. Insights gained from the reverse engineering of gene networks in keloid fibroblasts

    Directory of Open Access Journals (Sweden)

    Phan Toan

    2011-05-01

    Full Text Available Abstract Background Keloids are protrusive claw-like scars that have a propensity to recur even after surgery, and its molecular etiology remains elusive. The goal of reverse engineering is to infer gene networks from observational data, thus providing insight into the inner workings of a cell. However, most attempts at modeling biological networks have been done using simulated data. This study aims to highlight some of the issues involved in working with experimental data, and at the same time gain some insights into the transcriptional regulatory mechanism present in keloid fibroblasts. Methods Microarray data from our previous study was combined with microarray data obtained from the literature as well as new microarray data generated by our group. For the physical approach, we used the fREDUCE algorithm for correlating expression values to binding motifs. For the influence approach, we compared the Bayesian algorithm BANJO with the information theoretic method ARACNE in terms of performance in recovering known influence networks obtained from the KEGG database. In addition, we also compared the performance of different normalization methods as well as different types of gene networks. Results Using the physical approach, we found consensus sequences that were active in the keloid condition, as well as some sequences that were responsive to steroids, a commonly used treatment for keloids. From the influence approach, we found that BANJO was better at recovering the gene networks compared to ARACNE and that transcriptional networks were better suited for network recovery compared to cytokine-receptor interaction networks and intracellular signaling networks. We also found that the NFKB transcriptional network that was inferred from normal fibroblast data was more accurate compared to that inferred from keloid data, suggesting a more robust network in the keloid condition. Conclusions Consensus sequences that were found from this study are

  20. DREAM3: network inference using dynamic context likelihood of relatedness and the inferelator.

    Directory of Open Access Journals (Sweden)

    Aviv Madar

    2010-03-01

    Full Text Available Many current works aiming to learn regulatory networks from systems biology data must balance model complexity with respect to data availability and quality. Methods that learn regulatory associations based on unit-less metrics, such as Mutual Information, are attractive in that they scale well and reduce the number of free parameters (model complexity per interaction to a minimum. In contrast, methods for learning regulatory networks based on explicit dynamical models are more complex and scale less gracefully, but are attractive as they may allow direct prediction of transcriptional dynamics and resolve the directionality of many regulatory interactions.We aim to investigate whether scalable information based methods (like the Context Likelihood of Relatedness method and more explicit dynamical models (like Inferelator 1.0 prove synergistic when combined. We test a pipeline where a novel modification of the Context Likelihood of Relatedness (mixed-CLR, modified to use time series data is first used to define likely regulatory interactions and then Inferelator 1.0 is used for final model selection and to build an explicit dynamical model.Our method ranked 2nd out of 22 in the DREAM3 100-gene in silico networks challenge. Mixed-CLR and Inferelator 1.0 are complementary, demonstrating a large performance gain relative to any single tested method, with precision being especially high at low recall values. Partitioning the provided data set into four groups (knock-down, knock-out, time-series, and combined revealed that using comprehensive knock-out data alone provides optimal performance. Inferelator 1.0 proved particularly powerful at resolving the directionality of regulatory interactions, i.e. "who regulates who" (approximately of identified true positives were correctly resolved. Performance drops for high in-degree genes, i.e. as the number of regulators per target gene increases, but not with out-degree, i.e. performance is not affected by

  1. Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach.

    Science.gov (United States)

    Hu, Yan-Shi; Xin, Juncai; Hu, Ying; Zhang, Lei; Wang, Ju

    2017-04-27

    Our understanding of the molecular mechanisms underlying Alzheimer's disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease. In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm. We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules-neuronal and metabolic module, cell growth/survival and neuroendocrine pathway module, and immune response-related module-indicating an AD-specific immune-endocrine-neuronal regulatory network. Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified. By

  2. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis.

    Science.gov (United States)

    van Dam, Jesse C J; Schaap, Peter J; Martins dos Santos, Vitor A P; Suárez-Diez, María

    2014-09-26

    Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a show case, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom made methodologies. As show cases we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers. Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different

  3. Differential reconstructed gene interaction networks for deriving toxicity threshold in chemical risk assessment.

    Science.gov (United States)

    Yang, Yi; Maxwell, Andrew; Zhang, Xiaowei; Wang, Nan; Perkins, Edward J; Zhang, Chaoyang; Gong, Ping

    2013-01-01

    Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting. Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential edge inference, mapping of genes with differential edges to pathways, and establishment of causal relationships between chemical concentration and perturbed pathways. A one-sample Gaussian process model and a linear regression model were used to identify genes that exhibited significant profile changes across an entire time course and between treatments, respectively. Interaction networks of differentially expressed (DE) genes were reconstructed for different treatments using a state space model and then compared to infer differential edges/interactions. DE genes possessing differential edges were mapped to biological pathways in databases such as KEGG pathways. Using the DNs approach, we analyzed a time-series Escherichia coli live cell gene expression dataset consisting of 4 treatments (control, 10, 100, 1000 mg/L naphthenic acids, NAs) and 18 time points. Through comparison of reconstructed networks and construction of differential networks, 80 genes were identified as DE genes with a significant number of differential edges, and 22 KEGG pathways were altered in a concentration-dependent manner. Some of these pathways were perturbed to a degree as high as 70% even at the lowest exposure concentration, implying a high sensitivity of our DNs approach

  4. Systems genetics identifies a convergent gene network for cognition and neurodevelopmental disease.

    Science.gov (United States)

    Johnson, Michael R; Shkura, Kirill; Langley, Sarah R; Delahaye-Duriez, Andree; Srivastava, Prashant; Hill, W David; Rackham, Owen J L; Davies, Gail; Harris, Sarah E; Moreno-Moral, Aida; Rotival, Maxime; Speed, Doug; Petrovski, Slavé; Katz, Anaïs; Hayward, Caroline; Porteous, David J; Smith, Blair H; Padmanabhan, Sandosh; Hocking, Lynne J; Starr, John M; Liewald, David C; Visconti, Alessia; Falchi, Mario; Bottolo, Leonardo; Rossetti, Tiziana; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Grote, Alexander; Helmstaedter, Christoph; Becker, Albert J; Kaminski, Rafal M; Deary, Ian J; Petretto, Enrico

    2016-02-01

    Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease-associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.

  5. Phylogenetic inference of Coxiella burnetii by 16S rRNA gene sequencing.

    Directory of Open Access Journals (Sweden)

    Heather P McLaughlin

    Full Text Available Coxiella burnetii is a human pathogen that causes the serious zoonotic disease Q fever. It is ubiquitous in the environment and due to its wide host range, long-range dispersal potential and classification as a bioterrorism agent, this microorganism is considered an HHS Select Agent. In the event of an outbreak or intentional release, laboratory strain typing methods can contribute to epidemiological investigations, law enforcement investigation and the public health response by providing critical information about the relatedness between C. burnetii isolates collected from different sources. Laboratory cultivation of C. burnetii is both time-consuming and challenging. Availability of strain collections is often limited and while several strain typing methods have been described over the years, a true gold-standard method is still elusive. Building upon epidemiological knowledge from limited, historical strain collections and typing data is essential to more accurately infer C. burnetii phylogeny. Harmonization of auspicious high-resolution laboratory typing techniques is critical to support epidemiological and law enforcement investigation. The single nucleotide polymorphism (SNP -based genotyping approach offers simplicity, rapidity and robustness. Herein, we demonstrate SNPs identified within 16S rRNA gene sequences can differentiate C. burnetii strains. Using this method, 55 isolates were assigned to six groups based on six polymorphisms. These 16S rRNA SNP-based genotyping results were largely congruent with those obtained by analyzing restriction-endonuclease (RE-digested DNA separated by SDS-PAGE and by the high-resolution approach based on SNPs within multispacer sequence typing (MST loci. The SNPs identified within the 16S rRNA gene can be used as targets for the development of additional SNP-based genotyping assays for C. burnetii.

  6. TIMP2 gene polymorphism as a potential tool to infer Brazilian population origin

    Directory of Open Access Journals (Sweden)

    da Silva RA

    2013-12-01

    Full Text Available Rodrigo Augusto da Silva,1 André Luis Shinohara,2 Denise Carleto Andia,1 Ariadne Letra,3 Regina Célia Peres,1 Ana Paula de Souza11Department of Morphology, Piracicaba Dental School, State University of Campinas, 2Oral Biology Program, Bauru Dental School, State University of São Paulo, São Paulo, Brazil; 3Department of Endodontics and Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center, Houston, TX, USAAbstract: Single nucleotide polymorphisms are genome variations that can be used as population-specific markers to infer genetic background and population origin. The Brazilian population is highly admixed due to immigration from several other populations. In particular, the state of São Paulo is recognized for the presence of Japanese individuals who seem likely to have contributed to a substantial proportion of ancestry in the modern Brazilian population. In the present study, we analyzed allele and genotype frequencies and associations of the –418G>C (rs8179090 single nucleotide polymorphism in the TIMP2 gene promoter in Brazilian and Japanese subjects, as well as in Japanese descendants from southeastern Brazil. The allele and genotype frequency analyses among groups demonstrated statistical significance (PC single nucleotide polymorphism of the TIMP2 gene, have a high probability of being Japanese or Japanese descendants. In addition to other genetic polymorphisms, the −418G>C TIMP2 polymorphism could be a population marker to assist in predicting Japanese ancestry, both in Japanese individuals and in admixed populations.Keywords: Brazilian, Japanese, polymorphism, allele, TIMP2

  7. Premature ventricular contraction detection combining deep neural networks and rules inference.

    Science.gov (United States)

    Zhou, Fei-Yan; Jin, Lin-Peng; Dong, Jun

    2017-06-01

    Premature ventricular contraction (PVC), which is a common form of cardiac arrhythmia caused by ectopic heartbeat, can lead to life-threatening cardiac conditions. Computer-aided PVC detection is of considerable importance in medical centers or outpatient ECG rooms. In this paper, we proposed a new approach that combined deep neural networks and rules inference for PVC detection. The detection performance and generalization were studied using publicly available databases: the MIT-BIH arrhythmia database (MIT-BIH-AR) and the Chinese Cardiovascular Disease Database (CCDD). The PVC detection accuracy on the MIT-BIH-AR database was 99.41%, with a sensitivity and specificity of 97.59% and 99.54%, respectively, which were better than the results from other existing methods. To test the generalization capability, the detection performance was also evaluated on the CCDD. The effectiveness of the proposed method was confirmed by the accuracy (98.03%), sensitivity (96.42%) and specificity (98.06%) with the dataset over 140,000 ECG recordings of the CCDD. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. An integer optimization algorithm for robust identification of non-linear gene regulatory networks

    Directory of Open Access Journals (Sweden)

    Chemmangattuvalappil Nishanth

    2012-09-01

    Full Text Available Abstract Background Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision making processes. Advancement in high throughput experimental techniques has initiated innovative data driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, evidence of robust algorithms directly exploiting basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formulism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. Towards that, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subjected to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data-sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters

  9. Networks in biological systems: An investigation of the Gene Ontology as an evolving network

    International Nuclear Information System (INIS)

    Coronnello, C; Tumminello, M; Micciche, S; Mantegna, R.N.

    2009-01-01

    Many biological systems can be described as networks where different elements interact, in order to perform biological processes. We introduce a network associated with the Gene Ontology. Specifically, we construct a correlation-based network where the vertices are the terms of the Gene Ontology and the link between each two terms is weighted on the basis of the number of genes that they have in common. We analyze a filtered network obtained from the correlation-based network and we characterize its evolution over different releases of the Gene Ontology.

  10. Phylogenetic relationships and timing of diversification in gonorynchiform fishes inferred using nuclear gene DNA sequences (Teleostei: Ostariophysi).

    Science.gov (United States)

    Near, Thomas J; Dornburg, Alex; Friedman, Matt

    2014-11-01

    The Gonorynchiformes are the sister lineage of the species-rich Otophysi and provide important insights into the diversification of ostariophysan fishes. Phylogenies of gonorynchiforms inferred using morphological characters and mtDNA gene sequences provide differing resolutions with regard to the sister lineage of all other gonorynchiforms (Chanos vs. Gonorynchus) and support for monophyly of the two miniaturized lineages Cromeria and Grasseichthys. In this study the phylogeny and divergence times of gonorynchiforms are investigated with DNA sequences sampled from nine nuclear genes and a published morphological character matrix. Bayesian phylogenetic analyses reveal substantial congruence among individual gene trees with inferences from eight genes placing Gonorynchus as the sister lineage to all other gonorynchiforms. Seven gene trees resolve Cromeria and Grasseichthys as a clade, supporting previous inferences using morphological characters. Phylogenies resulting from either concatenating the nuclear genes, performing a multispecies coalescent species tree analysis, or combining the morphological and nuclear gene DNA sequences resolve Gonorynchus as the living sister lineage of all other gonorynchiforms, strongly support the monophyly of Cromeria and Grasseichthys, and resolve a clade containing Parakneria, Cromeria, and Grasseichthys. The morphological dataset, which includes 13 gonorynchiform fossil taxa that range in age from Early Cretaceous to Eocene, was analyzed in combination with DNA sequences from the nine nuclear genes and a relaxed molecular clock to estimate times of evolutionary divergence. This "tip dating" strategy accommodates uncertainty in the phylogenetic resolution of fossil taxa that provide calibration information in the relaxed molecular clock analysis. The estimated age of the most recent common ancestor (MRCA) of living gonorynchiforms is slightly older than estimates from previous node dating efforts, but the molecular tip dating

  11. Information theory and signal transduction systems: from molecular information processing to network inference.

    Science.gov (United States)

    Mc Mahon, Siobhan S; Sim, Aaron; Filippi, Sarah; Johnson, Robert; Liepe, Juliane; Smith, Dominic; Stumpf, Michael P H

    2014-11-01

    Sensing and responding to the environment are two essential functions that all biological organisms need to master for survival and successful reproduction. Developmental processes are marshalled by a diverse set of signalling and control systems, ranging from systems with simple chemical inputs and outputs to complex molecular and cellular networks with non-linear dynamics. Information theory provides a powerful and convenient framework in which such systems can be studied; but it also provides the means to reconstruct the structure and dynamics of molecular interaction networks underlying physiological and developmental processes. Here we supply a brief description of its basic concepts and introduce some useful tools for systems and developmental biologists. Along with a brief but thorough theoretical primer, we demonstrate the wide applicability and biological application-specific nuances by way of different illustrative vignettes. In particular, we focus on the characterisation of biological information processing efficiency, examining cell-fate decision making processes, gene regulatory network reconstruction, and efficient signal transduction experimental design. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Evolving chromosomes and gene regulatory networks

    Indian Academy of Sciences (India)

    Aswin

    Genes under H NS control can be. (a) regulated by H NS. (b) regulated by H NS and StpA. Because backup by StpA is partial. Page 19. Gene expression level. H NS regulated xenogenes. Other genes. Page 20 ... recollect: H&NS silences highl transcribable genes. Gene expression level unilateral. Other genes epistatic ...

  13. Gene repertoire evolution of Streptococcus pyogenes inferred from phylogenomic analysis with Streptococcus canis and Streptococcus dysgalactiae.

    Directory of Open Access Journals (Sweden)

    Tristan Lefébure

    Full Text Available Streptococcus pyogenes, is an important human pathogen classified within the pyogenic group of streptococci, exclusively adapted to the human host. Our goal was to employ a comparative evolutionary approach to better understand the genomic events concomitant with S. pyogenes human adaptation. As part of ascertaining these events, we sequenced the genome of one of the potential sister species, the agricultural pathogen S. canis, and combined it in a comparative genomics reconciliation analysis with two other closely related species, Streptococcus dysgalactiae and Streptococcus equi, to determine the genes that were gained and lost during S. pyogenes evolution. Genome wide phylogenetic analyses involving 15 Streptococcus species provided convincing support for a clade of S. equi, S. pyogenes, S. dysgalactiae, and S. canis and suggested that the most likely S. pyogenes sister species was S. dysgalactiae. The reconciliation analysis identified 113 genes that were gained on the lineage leading to S. pyogenes. Almost half (46% of these gained genes were phage associated and 14 showed significant matches to experimentally verified bacteria virulence factors. Subsequent to the origin of S. pyogenes, over half of the phage associated genes were involved in 90 different LGT events, mostly involving different strains of S. pyogenes, but with a high proportion involving the horse specific pathogen S. equi subsp. equi, with the directionality almost exclusively (86% in the S. pyogenes to S. equi direction. Streptococcus agalactiae appears to have played an important role in the evolution of S. pyogenes with a high proportion of LGTs originating from this species. Overall the analysis suggests that S. pyogenes adaptation to the human host was achieved in part by (i the integration of new virulence factors (e.g. speB, and the sal locus and (ii the construction of new regulation networks (e.g. rgg, and to some extent speB.

  14. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    Directory of Open Access Journals (Sweden)

    Hero Alfred

    2010-11-01

    Full Text Available Abstract Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP, the Indian Buffet Process (IBP, and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV, Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD, closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  15. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

    Science.gov (United States)

    Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

    2010-11-09

    Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  16. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis

    NARCIS (Netherlands)

    Dam, van J.C.J.; Schaap, P.J.; Martins dos Santos, V.A.P.; Suarez Diez, M.

    2014-01-01

    Background: Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However each

  17. Dynamical Bayesian inference of time-evolving interactions: From a pair of coupled oscillators to networks of oscillators

    Science.gov (United States)

    Duggento, Andrea; Stankovski, Tomislav; McClintock, Peter V. E.; Stefanovska, Aneta

    2012-12-01

    Living systems have time-evolving interactions that, until recently, could not be identified accurately from recorded time series in the presence of noise. Stankovski [Phys. Rev. Lett.PRLTAO0031-900710.1103/PhysRevLett.109.024101 109, 024101 (2012)] introduced a method based on dynamical Bayesian inference that facilitates the simultaneous detection of time-varying synchronization, directionality of influence, and coupling functions. It can distinguish unsynchronized dynamics from noise-induced phase slips. The method is based on phase dynamics, with Bayesian inference of the time-evolving parameters being achieved by shaping the prior densities to incorporate knowledge of previous samples. We now present the method in detail using numerically generated data, data from an analog electronic circuit, and cardiorespiratory data. We also generalize the method to encompass networks of interacting oscillators and thus demonstrate its applicability to small-scale networks.

  18. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve

  19. [Phylogeny of protostome moulting animals (Ecdysozoa) inferred from 18 and 28S rRNA gene sequences].

    Science.gov (United States)

    Petrov, N B; Vladychenskaia, N S

    2005-01-01

    Reliability of reconstruction of phylogenetic relationships within a group of protostome moulting animals was evaluated by means of comparison of 18 and 28S rRNA gene sequences sets both taken separately and combined. Reliability of reconstructions was evaluated by values of the bootstrap support of major phylogenetic tree nodes and by degree of congruence of phylogenetic trees inferred by various methods. By both criteria, phylogenetic trees reconstructed from the combined 18 and 28S rRNA gene sequences were better than those inferred from 18 and 28S sequences taken separately. Results obtained are consistent with phylogenetic hypothesis separating protostome animals into two major clades, moulting Ecdysozoa (Priapulida + Kinorhyncha, Nematoda + Nematomorpha, Onychophora + Tardigrada, Myriapoda + Chelicerata, Crustacea + Hexapoda) and unmoulting Lophotrochozoa (Plathelminthes, Nemertini, Annelida, Mollusca, Echiura, Sipuncula). Clade Cephalorhyncha does not include nematomorphs (Nematomorpha). Conclusion was taken that it is necessary to use combined 18 and 28S data in phylogenetic studies.

  20. Designing Dietary Recommendations Using System Level Interactomics Analysis and Network-Based Inference

    Directory of Open Access Journals (Sweden)

    Tingting Zheng

    2017-09-01

    Full Text Available Background: A range of computational methods that rely on the analysis of genome-wide expression datasets have been developed and successfully used for drug repositioning. The success of these methods is based on the hypothesis that introducing a factor (in this case, a drug molecule that could reverse the disease gene expression signature will lead to a therapeutic effect. However, it has also been shown that globally reversing the disease expression signature is not a prerequisite for drug activity. On the other hand, the basic idea of significant anti-correlation in expression profiles could have great value for establishing diet-disease associations and could provide new insights into the role of dietary interventions in disease.Methods: We performed an integrated analysis of publicly available gene expression profiles for foods, diseases and drugs, by calculating pairwise similarity scores for diet and disease gene expression signatures and characterizing their topological features in protein-protein interaction networks.Results: We identified 485 diet-disease pairs where diet could positively influence disease development and 472 pairs where specific diets should be avoided in a disease state. Multiple evidence suggests that orange, whey and coconut fat could be beneficial for psoriasis, lung adenocarcinoma and macular degeneration, respectively. On the other hand, fructose-rich diet should be restricted in patients with chronic intermittent hypoxia and ovarian cancer. Since humans normally do not consume foods in isolation, we also applied different algorithms to predict synergism; as a result, 58 food pairs were predicted. Interestingly, the diets identified as anti-correlated with diseases showed a topological proximity to the disease proteins similar to that of the corresponding drugs.Conclusions: In conclusion, we provide a computational framework for establishing diet-disease associations and additional information on the role of

  1. Inferring the effective TOR-dependent network: a computational study in yeast.

    Science.gov (United States)

    Mohammadi, Shahin; Subramaniam, Shankar; Grama, Ananth

    2013-08-30

    Calorie restriction (CR) is one of the most conserved non-genetic interventions that extends healthspan in evolutionarily distant species, ranging from yeast to mammals. The target of rapamycin (TOR) has been shown to play a key role in mediating healthspan extension in response to CR by integrating different signals that monitor nutrient-availability and orchestrating various components of cellular machinery in response. Both genetic and pharmacological interventions that inhibit the TOR pathway exhibit a similar phenotype, which is not further amplified by CR. In this paper, we present the first comprehensive, computationally derived map of TOR downstream effectors, with the objective of discovering key lifespan mediators, their crosstalk, and high-level organization. We adopt a systematic approach for tracing information flow from the TOR complex and use it to identify relevant signaling elements. By constructing a high-level functional map of TOR downstream effectors, we show that our approach is not only capable of recapturing previously known pathways, but also suggests potential targets for future studies.Information flow scores provide an aggregate ranking of relevance of proteins with respect to the TOR signaling pathway. These rankings must be normalized for degree bias, appropriately interpreted, and mapped to associated roles in pathways. We propose a novel statistical framework for integrating information flow scores, the set of differentially expressed genes in response to rapamycin treatment, and the transcriptional regulatory network. We use this framework to identify the most relevant transcription factors in mediating the observed transcriptional response, and to construct the effective response network of the TOR pathway. This network is hypothesized to mediate life-span extension in response to TOR inhibition. Our approach, unlike experimental methods, is not limited to specific aspects of cellular response. Rather, it predicts transcriptional

  2. Information Warfare-Worthy Jamming Attack Detection Mechanism for Wireless Sensor Networks Using a Fuzzy Inference System

    Directory of Open Access Journals (Sweden)

    Sudip Misra

    2010-04-01

    Full Text Available The proposed mechanism for jamming attack detection for wireless sensor networks is novel in three respects: firstly, it upgrades the jammer to include versatile military jammers; secondly, it graduates from the existing node-centric detection system to the network-centric system making it robust and economical at the nodes, and thirdly, it tackles the problem through fuzzy inference system, as the decision regarding intensity of jamming is seldom crisp. The system with its high robustness, ability to grade nodes with jamming indices, and its true-detection rate as high as 99.8%, is worthy of consideration for information warfare defense purposes.

  3. Reveal genes functionally associated with ACADS by a network study.

    Science.gov (United States)

    Chen, Yulong; Su, Zhiguang

    2015-09-15

    Establishing a systematic network is aimed at finding essential human gene-gene/gene-disease pathway by means of network inter-connecting patterns and functional annotation analysis. In the present study, we have analyzed functional gene interactions of short-chain acyl-coenzyme A dehydrogenase gene (ACADS). ACADS plays a vital role in free fatty acid β-oxidation and regulates energy homeostasis. Modules of highly inter-connected genes in disease-specific ACADS network are derived by integrating gene function and protein interaction data. Among the 8 genes in ACADS web retrieved from both STRING and GeneMANIA, ACADS is effectively conjoined with 4 genes including HAHDA, HADHB, ECHS1 and ACAT1. The functional analysis is done via ontological briefing and candidate disease identification. We observed that the highly efficient-interlinked genes connected with ACADS are HAHDA, HADHB, ECHS1 and ACAT1. Interestingly, the ontological aspect of genes in the ACADS network reveals that ACADS, HAHDA and HADHB play equally vital roles in fatty acid metabolism. The gene ACAT1 together with ACADS indulges in ketone metabolism. Our computational gene web analysis also predicts potential candidate disease recognition, thus indicating the involvement of ACADS, HAHDA, HADHB, ECHS1 and ACAT1 not only with lipid metabolism but also with infant death syndrome, skeletal myopathy, acute hepatic encephalopathy, Reye-like syndrome, episodic ketosis, and metabolic acidosis. The current study presents a comprehensible layout of ACADS network, its functional strategies and candidate disease approach associated with ACADS network. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    Science.gov (United States)

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations.Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  5. Performance of single and concatenated sets of mitochondrial genes at inferring metazoan relationships relative to full mitogenome data.

    Directory of Open Access Journals (Sweden)

    Justin C Havird

    Full Text Available Mitochondrial (mt genes are some of the most popular and widely-utilized genetic loci in phylogenetic studies of metazoan taxa. However, their linked nature has raised questions on whether using the entire mitogenome for phylogenetics is overkill (at best or pseudoreplication (at worst. Moreover, no studies have addressed the comparative phylogenetic utility of mitochondrial genes across individual lineages within the entire Metazoa. To comment on the phylogenetic utility of individual mt genes as well as concatenated subsets of genes, we analyzed mitogenomic data from 1865 metazoan taxa in 372 separate lineages spanning genera to subphyla. Specifically, phylogenies inferred from these datasets were statistically compared to ones generated from all 13 mt protein-coding (PC genes (i.e., the "supergene" set to determine which single genes performed "best" at, and the minimum number of genes required to, recover the "supergene" topology. Surprisingly, the popular marker COX1 performed poorest, while ND5, ND4, and ND2 were most likely to reproduce the "supergene" topology. Averaged across all lineages, the longest ∼2 mt PC genes were sufficient to recreate the "supergene" topology, although this average increased to ∼5 genes for datasets with 40 or more taxa. Furthermore, concatenation of the three "best" performing mt PC genes outperformed that of the three longest mt PC genes (i.e, ND5, COX1, and ND4. Taken together, while not all mt PC genes are equally interchangeable in phylogenetic studies of the metazoans, some subset can serve as a proxy for the 13 mt PC genes. However, the exact number and identity of these genes is specific to the lineage in question and cannot be applied indiscriminately across the Metazoa.

  6. Inferring Weighted Directed Association Networks from Multivariate Time Series with the Small-Shuffle Symbolic Transfer Entropy Spectrum Method

    Directory of Open Access Journals (Sweden)

    Yanzhu Hu

    2016-09-01

    Full Text Available Complex network methodology is very useful for complex system exploration. However, the relationships among variables in complex systems are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a method, named small-shuffle symbolic transfer entropy spectrum (SSSTES, for inferring association networks from multivariate time series. The method can solve four problems for inferring association networks, i.e., strong correlation identification, correlation quantification, direction identification and temporal relation identification. The method can be divided into four layers. The first layer is the so-called data layer. Data input and processing are the things to do in this layer. In the second layer, we symbolize the model data, original data and shuffled data, from the previous layer and calculate circularly transfer entropy with different time lags for each pair of time series variables. Thirdly, we compose transfer entropy spectrums for pairwise time series with the previous layer’s output, a list of transfer entropy matrix. We also identify the correlation level between variables in this layer. In the last layer, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pairwise variables, and then get the weighted directed association network. Three sets of numerical simulated data from a linear system, a nonlinear system and a coupled Rossler system are used to show how the proposed approach works. Finally, we apply SSSTES to a real industrial system and get a better result than with two other methods.

  7. Inferred vs Realized Patterns of Gene Flow: An Analysis of Population Structure in the Andros Island Rock Iguana

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R.; Wallace, Lisa E.; Welch, Mark E.

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes. PMID:25229344

  8. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Science.gov (United States)

    Colosimo, Giuliano; Knapp, Charles R; Wallace, Lisa E; Welch, Mark E

    2014-01-01

    Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA) indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<0.01). These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  9. Inferred vs realized patterns of gene flow: an analysis of population structure in the Andros Island Rock Iguana.

    Directory of Open Access Journals (Sweden)

    Giuliano Colosimo

    Full Text Available Ecological data, the primary source of information on patterns and rates of migration, can be integrated with genetic data to more accurately describe the realized connectivity between geographically isolated demes. In this paper we implement this approach and discuss its implications for managing populations of the endangered Andros Island Rock Iguana, Cyclura cychlura cychlura. This iguana is endemic to Andros, a highly fragmented landmass of large islands and smaller cays. Field observations suggest that geographically isolated demes were panmictic due to high, inferred rates of gene flow. We expand on these observations using 16 polymorphic microsatellites to investigate the genetic structure and rates of gene flow from 188 Andros Iguanas collected across 23 island sites. Bayesian clustering of specimens assigned individuals to three distinct genotypic clusters. An analysis of molecular variance (AMOVA indicates that allele frequency differences are responsible for a significant portion of the genetic variance across the three defined clusters (Fst =  0.117, p<<0.01. These clusters are associated with larger islands and satellite cays isolated by broad water channels with strong currents. These findings imply that broad water channels present greater obstacles to gene flow than was inferred from field observation alone. Additionally, rates of gene flow were indirectly estimated using BAYESASS 3.0. The proportion of individuals originating from within each identified cluster varied from 94.5 to 98.7%, providing further support for local isolation. Our assessment reveals a major disparity between inferred and realized gene flow. We discuss our results in a conservation perspective for species inhabiting highly fragmented landscapes.

  10. A new and accurate fault location algorithm for combined transmission lines using Adaptive Network-Based Fuzzy Inference System

    Energy Technology Data Exchange (ETDEWEB)

    Sadeh, Javad; Afradi, Hamid [Electrical Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, P.O. Box: 91775-1111, Mashhad (Iran)

    2009-11-15

    This paper presents a new and accurate algorithm for locating faults in a combined overhead transmission line with underground power cable using Adaptive Network-Based Fuzzy Inference System (ANFIS). The proposed method uses 10 ANFIS networks and consists of 3 stages, including fault type classification, faulty section detection and exact fault location. In the first part, an ANFIS is used to determine the fault type, applying four inputs, i.e., fundamental component of three phase currents and zero sequence current. Another ANFIS network is used to detect the faulty section, whether the fault is on the overhead line or on the underground cable. Other eight ANFIS networks are utilized to pinpoint the faults (two for each fault type). Four inputs, i.e., the dc component of the current, fundamental frequency of the voltage and current and the angle between them, are used to train the neuro-fuzzy inference systems in order to accurately locate the faults on each part of the combined line. The proposed method is evaluated under different fault conditions such as different fault locations, different fault inception angles and different fault resistances. Simulation results confirm that the proposed method can be used as an efficient means for accurate fault location on the combined transmission lines. (author)

  11. A network of genes, genetic disorders, and brain areas.

    Directory of Open Access Journals (Sweden)

    Satoru Hayasaka

    Full Text Available The network-based approach has been used to describe the relationship among genes and various phenotypes, producing a network describing complex biological relationships. Such networks can be constructed by aggregating previously reported associations in the literature from various databases. In this work, we applied the network-based approach to investigate how different brain areas are associated to genetic disorders and genes. In particular, a tripartite network with genes, genetic diseases, and brain areas was constructed based on the associations among them reported in the literature through text mining. In the resulting network, a disproportionately large number of gene-disease and disease-brain associations were attributed to a small subset of genes, diseases, and brain areas. Furthermore, a small number of brain areas were found to be associated with a large number of the same genes and diseases. These core brain regions encompassed the areas identified by the previous genome-wide association studies, and suggest potential areas of focus in the future imaging genetics research. The approach outlined in this work demonstrates the utility of the network-based approach in studying genetic effects on the brain.

  12. A flood-based information flow analysis and network minimization method for gene regulatory networks.

    Science.gov (United States)

    Pavlogiannis, Andreas; Mozhayskiy, Vadim; Tagkopoulos, Ilias

    2013-04-24

    Biological networks tend to have high interconnectivity, complex topologies and multiple types of interactions. This renders difficult the identification of sub-networks that are involved in condition- specific responses. In addition, we generally lack scalable methods that can reveal the information flow in gene regulatory and biochemical pathways. Doing so will help us to identify key participants and paths under specific environmental and cellular context. This paper introduces the theory of network flooding, which aims to address the problem of network minimization and regulatory information flow in gene regulatory networks. Given a regulatory biological network, a set of source (input) nodes and optionally a set of sink (output) nodes, our task is to find (a) the minimal sub-network that encodes the regulatory program involving all input and output nodes and (b) the information flow from the source to the sink nodes of the network. Here, we describe a novel, scalable, network traversal algorithm and we assess its potential to achieve significant network size reduction in both synthetic and E. coli networks. Scalability and sensitivity analysis show that the proposed method scales well with the size of the network, and is robust to noise and missing data. The method of network flooding proves to be a useful, practical approach towards information flow analysis in gene regulatory networks. Further extension of the proposed theory has the potential to lead in a unifying framework for the simultaneous network minimization and information flow analysis across various "omics" levels.

  13. Complex network inference from P300 signals: Decoding brain state under visual stimulus for able-bodied and disabled subjects

    Science.gov (United States)

    Gao, Zhong-Ke; Cai, Qing; Dong, Na; Zhang, Shan-Shan; Bo, Yun; Zhang, Jie

    2016-10-01

    Distinguishing brain cognitive behavior underlying disabled and able-bodied subjects constitutes a challenging problem of significant importance. Complex network has established itself as a powerful tool for exploring functional brain networks, which sheds light on the inner workings of the human brain. Most existing works in constructing brain network focus on phase-synchronization measures between regional neural activities. In contrast, we propose a novel approach for inferring functional networks from P300 event-related potentials by integrating time and frequency domain information extracted from each channel signal, which we show to be efficient in subsequent pattern recognition. In particular, we construct brain network by regarding each channel signal as a node and determining the edges in terms of correlation of the extracted feature vectors. A six-choice P300 paradigm with six different images is used in testing our new approach, involving one able-bodied subject and three disabled subjects suffering from multiple sclerosis, cerebral palsy, traumatic brain and spinal-cord injury, respectively. We then exploit global efficiency, local efficiency and small-world indices from the derived brain networks to assess the network topological structure associated with different target images. The findings suggest that our method allows identifying brain cognitive behaviors related to visual stimulus between able-bodied and disabled subjects.

  14. An assessment of machine and statistical learning approaches to inferring networks of protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Browne Fiona

    2006-12-01

    Full Text Available Protein-protein interactions (PPI play a key role in many biological systems. Over the past few years, an explosion in availability of functional biological data obtained from high-throughput technologies to infer PPI has been observed. However, results obtained from such experiments show high rates of false positives and false negatives predictions as well as systematic predictive bias. Recent research has revealed that several machine and statistical learning methods applied to integrate relatively weak, diverse sources of large-scale functional data may provide improved predictive accuracy and coverage of PPI. In this paper we describe the effects of applying different computational, integrative methods to predict PPI in Saccharomyces cerevisiae. We investigated the predictive ability of combining different sets of relatively strong and weak predictive datasets. We analysed several genomic datasets ranging from mRNA co-expression to marginal essentiality. Moreover, we expanded an existing multi-source dataset from S. cerevisiae by constructing a new set of putative interactions extracted from Gene Ontology (GO- driven annotations in the Saccharomyces Genome Database. Different classification techniques: Simple Naive Bayesian (SNB, Multilayer Perceptron (MLP and K-Nearest Neighbors (KNN were evaluated. Relatively simple classification methods (i.e. less computing intensive and mathematically complex, such as SNB, have been proven to be proficient at predicting PPI. SNB produced the “highest” predictive quality obtaining an area under Receiver Operating Characteristic (ROC curve (AUC value of 0.99. The lowest AUC value of 0.90 was obtained by the KNN classifier. This assessment also demonstrates the strong predictive power of GO-driven models, which offered predictive performance above 0.90 using the different machine learning and statistical techniques. As the predictive power of single-source datasets became weaker MLP and SNB performed

  15. Inference of sigma factor controlled networks by using numerical modeling applied to microarray time series data of the germinating prokaryote.

    Science.gov (United States)

    Strakova, Eva; Zikova, Alice; Vohradsky, Jiri

    2014-01-01

    A computational model of gene expression was applied to a novel test set of microarray time series measurements to reveal regulatory interactions between transcriptional regulators represented by 45 sigma factors and the genes expressed during germination of a prokaryote Streptomyces coelicolor. Using microarrays, the first 5.5 h of the process was recorded in 13 time points, which provided a database of gene expression time series on genome-wide scale. The computational modeling of the kinetic relations between the sigma factors, individual genes and genes clustered according to the similarity of their expression kinetics identified kinetically plausible sigma factor-controlled networks. Using genome sequence annotations, functional groups of genes that were predominantly controlled by specific sigma factors were identified. Using external binding data complementing the modeling approach, specific genes involved in the control of the studied process were identified and their function suggested.

  16. Disease candidate gene identification and prioritization using protein interaction networks

    Directory of Open Access Journals (Sweden)

    Aronow Bruce J

    2009-02-01

    Full Text Available Abstract Background Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN analyses. Results For the first time, extended versions of the PageRank and HITS algorithms, and the K-Step Markov method are applied to prioritize disease candidate genes in a training-test schema. Using a list of known disease-related genes from our earlier study as a training set ("seeds", and the rest of the known genes as a test list, we perform large-scale cross validation to rank the candidate genes and also evaluate and compare the performance of our approach. Under appropriate settings – for example, a back probability of 0.3 for PageRank with Priors and HITS with Priors, and step size 6 for K-Step Markov method – the three methods achieved a comparable AUC value, suggesting a similar performance. Conclusion Even though network-based methods are generally not as effective as integrated functional annotation-based methods for disease candidate gene prioritization, in a one-to-one comparison, PPIN-based candidate gene prioritization performs better than all other gene features or annotations. Additionally, we demonstrate that methods used for studying both social and Web networks can be successfully used for disease candidate gene prioritization.

  17. The vertebrate Hox gene regulatory network for hindbrain segmentation: Evolution and diversification: Coupling of a Hox gene regulatory network to hindbrain segmentation is an ancient trait originating at the base of vertebrates.

    Science.gov (United States)

    Parker, Hugo J; Bronner, Marianne E; Krumlauf, Robb

    2016-06-01

    Hindbrain development is orchestrated by a vertebrate gene regulatory network that generates segmental patterning along the anterior-posterior axis via Hox genes. Here, we review analyses of vertebrate and invertebrate chordate models that inform upon the evolutionary origin and diversification of this network. Evidence from the sea lamprey reveals that the hindbrain regulatory network generates rhombomeric compartments with segmental Hox expression and an underlying Hox code. We infer that this basal feature was present in ancestral vertebrates and, as an evolutionarily constrained developmental state, is fundamentally important for patterning of the vertebrate hindbrain across diverse lineages. Despite the common ground plan, vertebrates exhibit neuroanatomical diversity in lineage-specific patterns, with different vertebrates revealing variations of Hox expression in the hindbrain that could underlie this diversification. Invertebrate chordates lack hindbrain segmentation but exhibit some conserved aspects of this network, with retinoic acid signaling playing a role in establishing nested domains of Hox expression. © 2016 WILEY Periodicals, Inc.

  18. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  19. Identification of driving network of cellular differentiation from single sample time course gene expression data

    Science.gov (United States)

    Chen, Ye; Wolanyk, Nathaniel; Ilker, Tunc; Gao, Shouguo; Wang, Xujing

    Methods developed based on bifurcation theory have demonstrated their potential in driving network identification for complex human diseases, including the work by Chen, et al. Recently bifurcation theory has been successfully applied to model cellular differentiation. However, there one often faces a technical challenge in driving network prediction: time course cellular differentiation study often only contains one sample at each time point, while driving network prediction typically require multiple samples at each time point to infer the variation and interaction structures of candidate genes for the driving network. In this study, we investigate several methods to identify both the critical time point and the driving network through examination of how each time point affects the autocorrelation and phase locking. We apply these methods to a high-throughput sequencing (RNA-Seq) dataset of 42 subsets of thymocytes and mature peripheral T cells at multiple time points during their differentiation (GSE48138 from GEO). We compare the predicted driving genes with known transcription regulators of cellular differentiation. We will discuss the advantages and limitations of our proposed methods, as well as potential further improvements of our methods.

  20. Network Diffusion-Based Prioritization of Autism Risk Genes Identifies Significantly Connected Gene Modules

    Directory of Open Access Journals (Sweden)

    Ettore Mosca

    2017-09-01

    Full Text Available Autism spectrum disorder (ASD is marked by a strong genetic heterogeneity, which is underlined by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to underline those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synapsis and neuronal development and function; many are related to syndromes that can be in comorbidity with ASD and the remaining are involved in epigenetics, cell cycle, cell adhesion and cancer.

  1. The evolutionary history of ferns inferred from 25 low-copy nuclear genes.

    Science.gov (United States)

    Rothfels, Carl J; Li, Fay-Wei; Sigel, Erin M; Huiet, Layne; Larsson, Anders; Burge, Dylan O; Ruhsam, Markus; Deyholos, Michael; Soltis, Douglas E; Stewart, C Neal; Shaw, Shane W; Pokorny, Lisa; Chen, Tao; dePamphilis, Claude; DeGironimo, Lisa; Chen, Li; Wei, Xiaofeng; Sun, Xiao; Korall, Petra; Stevenson, Dennis W; Graham, Sean W; Wong, Gane K-S; Pryer, Kathleen M

    2015-07-01

    • Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny, however, to date, these studies have relied almost exclusively on plastid data.• Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection.• Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data.• Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies. © 2015 Botanical Society of America, Inc.

  2. A network approach to predict pathogenic genes for Fusarium graminearum.

    Science.gov (United States)

    Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

    2010-10-04

    Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which

  3. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging ... two types of methods differ primarily based on whether ..... negligible, allowing us to draw the qualitative conclusions .... research will be conducted to develop additional biologically.

  4. Functional modules by relating protein interaction networks and gene expression.

    Science.gov (United States)

    Tornow, Sabine; Mewes, H W

    2003-11-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.

  5. Gene networks underlying convergent and pleiotropic phenotypes in a large and systematically-phenotyped cohort with heterogeneous developmental disorders.

    Science.gov (United States)

    Andrews, Tallulah; Meader, Stephen; Vulto-van Silfhout, Anneke; Taylor, Avigail; Steinberg, Julia; Hehir-Kwa, Jayne; Pfundt, Rolph; de Leeuw, Nicole; de Vries, Bert B A; Webber, Caleb

    2015-03-01

    Readily-accessible and standardised capture of genotypic variation has revolutionised our understanding of the genetic contribution to disease. Unfortunately, the corresponding systematic capture of patient phenotypic variation needed to fully interpret the impact of genetic variation has lagged far behind. Exploiting deep and systematic phenotyping of a cohort of 197 patients presenting with heterogeneous developmental disorders and whose genomes harbour de novo CNVs, we systematically applied a range of commonly-used functional genomics approaches to identify the underlying molecular perturbations and their phenotypic impact. Grouping patients into 408 non-exclusive patient-phenotype groups, we identified a functional association amongst the genes disrupted in 209 (51%) groups. We find evidence for a significant number of molecular interactions amongst the association-contributing genes, including a single highly-interconnected network disrupted in 20% of patients with intellectual disability, and show using microcephaly how these molecular networks can be used as baits to identify additional members whose genes are variant in other patients with the same phenotype. Exploiting the systematic phenotyping of this cohort, we observe phenotypic concordance amongst patients whose variant genes contribute to the same functional association but note that (i) this relationship shows significant variation across the different approaches used to infer a commonly perturbed molecular pathway, and (ii) that the phenotypic similarities detected amongst patients who share the same inferred pathway perturbation result from these patients sharing many distinct phenotypes, rather than sharing a more specific phenotype, inferring that these pathways are best characterized by their pleiotropic effects.

  6. On the dynamics of a gene regulatory network

    International Nuclear Information System (INIS)

    Grammaticos, B; Carstea, A S; Ramani, A

    2006-01-01

    We examine the dynamics of a network of genes focusing on a periodic chain of genes, of arbitrary length. We show that within a given class of sigmoids representing the equilibrium probability of the binding of the RNA polymerase to the core promoter, the system possesses a single stable fixed point. By slightly modifying the sigmoid, introducing 'stiffer' forms, we show that it is possible to find network configurations exhibiting bistable behaviour. Our results do not depend crucially on the length of the chain considered: calculations with finite chains lead to similar results. However, a realistic study of regulatory genetic networks would require the consideration of more complex topologies and interactions

  7. Linear control theory for gene network modeling.

    Science.gov (United States)

    Shin, Yong-Jun; Bleris, Leonidas

    2010-09-16

    Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through the study of several case studies including cascade and parallel forms, feedback and feedforward loops. We reproduce experimental results and provide rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain) and linear state-space (time domain) can be used to predict reliably the properties and transient behavior of complex network topologies and point to specific design strategies for synthetic networks.

  8. System Biology Approach: Gene Network Analysis for Muscular Dystrophy.

    Science.gov (United States)

    Censi, Federica; Calcagnini, Giovanni; Mattei, Eugenio; Giuliani, Alessandro

    2018-01-01

    Phenotypic changes at different organization levels from cell to entire organism are associated to changes in the pattern of gene expression. These changes involve the entire genome expression pattern and heavily rely upon correlation patterns among genes. The classical approach used to analyze gene expression data builds upon the application of supervised statistical techniques to detect genes differentially expressed among two or more phenotypes (e.g., normal vs. disease). The use of an a posteriori, unsupervised approach based on principal component analysis (PCA) and the subsequent construction of gene correlation networks can shed a light on unexpected behaviour of gene regulation system while maintaining a more naturalistic view on the studied system.In this chapter we applied an unsupervised method to discriminate DMD patient and controls. The genes having the highest absolute scores in the discrimination between the groups were then analyzed in terms of gene expression networks, on the basis of their mutual correlation in the two groups. The correlation network structures suggest two different modes of gene regulation in the two groups, reminiscent of important aspects of DMD pathogenesis.

  9. SELANSI: a toolbox for simulation of stochastic gene regulatory networks.

    Science.gov (United States)

    Pájaro, Manuel; Otero-Muras, Irene; Vázquez, Carlos; Alonso, Antonio A

    2018-03-01

    Gene regulation is inherently stochastic. In many applications concerning Systems and Synthetic Biology such as the reverse engineering and the de novo design of genetic circuits, stochastic effects (yet potentially crucial) are often neglected due to the high computational cost of stochastic simulations. With advances in these fields there is an increasing need of tools providing accurate approximations of the stochastic dynamics of gene regulatory networks (GRNs) with reduced computational effort. This work presents SELANSI (SEmi-LAgrangian SImulation of GRNs), a software toolbox for the simulation of stochastic multidimensional gene regulatory networks. SELANSI exploits intrinsic structural properties of gene regulatory networks to accurately approximate the corresponding Chemical Master Equation with a partial integral differential equation that is solved by a semi-lagrangian method with high efficiency. Networks under consideration might involve multiple genes with self and cross regulations, in which genes can be regulated by different transcription factors. Moreover, the validity of the method is not restricted to a particular type of kinetics. The tool offers total flexibility regarding network topology, kinetics and parameterization, as well as simulation options. SELANSI runs under the MATLAB environment, and is available under GPLv3 license at https://sites.google.com/view/selansi. antonio@iim.csic.es. © The Author(s) 2017. Published by Oxford University Press.

  10. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases

    Directory of Open Access Journals (Sweden)

    Ma'ayan Avi

    2007-10-01

    Full Text Available Abstract Background In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP, generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Results Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Conclusion Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.

  11. A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection.

    Science.gov (United States)

    Carey, Michelle; Ramírez, Juan Camilo; Wu, Shuang; Wu, Hulin

    2018-07-01

    A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.

  12. From Gene Trees to a Dated Allopolyploid Network: Insights from the Angiosperm Genus Viola (Violaceae)

    Science.gov (United States)

    Marcussen, Thomas; Heier, Lise; Brysting, Anne K.; Oxelman, Bengt; Jakobsen, Kjetill S.

    2015-01-01

    Allopolyploidization accounts for a significant fraction of speciation events in many eukaryotic lineages. However, existing phylogenetic and dating methods require tree-like topologies and are unable to handle the network-like phylogenetic relationships of lineages containing allopolyploids. No explicit framework has so far been established for evaluating competing network topologies, and few attempts have been made to date phylogenetic networks. We used a four-step approach to generate a dated polyploid species network for the cosmopolitan angiosperm genus Viola L. (Violaceae Batch.). The genus contains ca 600 species and both recent (neo-) and more ancient (meso-) polyploid lineages distributed over 16 sections. First, we obtained DNA sequences of three low-copy nuclear genes and one chloroplast region, from 42 species representing all 16 sections. Second, we obtained fossil-calibrated chronograms for each nuclear gene marker. Third, we determined the most parsimonious multilabeled genome tree and its corresponding network, resolved at the section (not the species) level. Reconstructing the “correct” network for a set of polyploids depends on recovering all homoeologs, i.e., all subgenomes, in these polyploids. Assuming the presence of Viola subgenome lineages that were not detected by the nuclear gene phylogenies (“ghost subgenome lineages”) significantly reduced the number of inferred polyploidization events. We identified the most parsimonious network topology from a set of five competing scenarios differing in the interpretation of homoeolog extinctions and lineage sorting, based on (i) fewest possible ghost subgenome lineages, (ii) fewest possible polyploidization events, and (iii) least possible deviation from expected ploidy as inferred from available chromosome counts of the involved polyploid taxa. Finally, we estimated the homoploid and polyploid speciation times of the most parsimonious network. Homoploid speciation times were estimated by

  13. A gene network bioinformatics analysis for pemphigoid autoimmune blistering diseases.

    Science.gov (United States)

    Barone, Antonio; Toti, Paolo; Giuca, Maria Rita; Derchi, Giacomo; Covani, Ugo

    2015-07-01

    In this theoretical study, a text mining search and clustering analysis of data related to genes potentially involved in human pemphigoid autoimmune blistering diseases (PAIBD) was performed using web tools to create a gene/protein interaction network. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database was employed to identify a final set of PAIBD-involved genes and to calculate the overall significant interactions among genes: for each gene, the weighted number of links, or WNL, was registered and a clustering procedure was performed using the WNL analysis. Genes were ranked in class (leader, B, C, D and so on, up to orphans). An ontological analysis was performed for the set of 'leader' genes. Using the above-mentioned data network, 115 genes represented the final set; leader genes numbered 7 (intercellular adhesion molecule 1 (ICAM-1), interferon gamma (IFNG), interleukin (IL)-2, IL-4, IL-6, IL-8 and tumour necrosis factor (TNF)), class B genes were 13, whereas the orphans were 24. The ontological analysis attested that the molecular action was focused on extracellular space and cell surface, whereas the activation and regulation of the immunity system was widely involved. Despite the limited knowledge of the present pathologic phenomenon, attested by the presence of 24 genes revealing no protein-protein direct or indirect interactions, the network showed significant pathways gathered in several subgroups: cellular components, molecular functions, biological processes and the pathologic phenomenon obtained from the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database. The molecular basis for PAIBD was summarised and expanded, which will perhaps give researchers promising directions for the identification of new therapeutic targets.

  14. Modeling of a 5-cell direct methanol fuel cell using adaptive-network-based fuzzy inference systems

    Science.gov (United States)

    Wang, Rongrong; Qi, Liang; Xie, Xiaofeng; Ding, Qingqing; Li, Chunwen; Ma, ChenChi M.

    The methanol concentrations, temperature and current were considered as inputs, the cell voltage was taken as output, and the performance of a direct methanol fuel cell (DMFC) was modeled by adaptive-network-based fuzzy inference systems (ANFIS). The artificial neural network (ANN) and polynomial-based models were selected to be compared with the ANFIS in respect of quality and accuracy. Based on the ANFIS model obtained, the characteristics of the DMFC were studied. The results show that temperature and methanol concentration greatly affect the performance of the DMFC. Within a restricted current range, the methanol concentration does not greatly affect the stack voltage. In order to obtain higher fuel utilization efficiency, the methanol concentrations and temperatures should be adjusted according to the load on the system.

  15. Forecasting building energy consumption with hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system

    Energy Technology Data Exchange (ETDEWEB)

    Li, Kangji [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China); School of Electricity Information Engineering, Jiangsu University, Zhenjiang 212013 (China); Su, Hongye [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China)

    2010-11-15

    There are several ways to forecast building energy consumption, varying from simple regression to models based on physical principles. In this paper, a new method, namely, the hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system (GA-HANFIS) model is developed. In this model, hierarchical structure decreases the rule base dimension. Both clustering and rule base parameters are optimized by GAs and neural networks (NNs). The model is applied to predict a hotel's daily air conditioning consumption for a period over 3 months. The results obtained by the proposed model are presented and compared with regular method of NNs, which indicates that GA-HANFIS model possesses better performance than NNs in terms of their forecasting accuracy. (author)

  16. Modeling of a 5-cell direct methanol fuel cell using adaptive-network-based fuzzy inference systems

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Rongrong; Li, Chunwen [Department of Automation, Tsinghua University, Beijing 100084 (China); Qi, Liang; Xie, Xiaofeng [Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing 100084 (China); Ding, Qingqing [Department of Electrical Engineering, Tsinghua University, Beijing 100084 (China); Ma, ChenChi M. [National Tsing Hua University, Hsinchu 300 (China)

    2008-12-01

    The methanol concentrations, temperature and current were considered as inputs, the cell voltage was taken as output, and the performance of a direct methanol fuel cell (DMFC) was modeled by adaptive-network-based fuzzy inference systems (ANFIS). The artificial neural network (ANN) and polynomial-based models were selected to be compared with the ANFIS in respect of quality and accuracy. Based on the ANFIS model obtained, the characteristics of the DMFC were studied. The results show that temperature and methanol concentration greatly affect the performance of the DMFC. Within a restricted current range, the methanol concentration does not greatly affect the stack voltage. In order to obtain higher fuel utilization efficiency, the methanol concentrations and temperatures should be adjusted according to the load on the system. (author)

  17. Prosecutor: parameter-free inference of gene function for prokaryotes using DNA microarray data, genomic context and multiple gene annotation sources

    Directory of Open Access Journals (Sweden)

    van Hijum Sacha AFT

    2008-10-01

    Full Text Available Abstract Background Despite a plethora of functional genomic efforts, the function of many genes in sequenced genomes remains unknown. The increasing amount of microarray data for many species allows employing the guilt-by-association principle to predict function on a large scale: genes exhibiting similar expression patterns are more likely to participate in shared biological processes. Results We developed Prosecutor, an application that enables researchers to rapidly infer gene function based on available gene expression data and functional annotations. Our parameter-free functional prediction method uses a sensitive algorithm to achieve a high association rate of linking genes with unknown function to annotated genes. Furthermore, Prosecutor utilizes additional biological information such as genomic context and known regulatory mechanisms that are specific for prokaryotes. We analyzed publicly available transcriptome data sets and used literature sources to validate putative functions suggested by Prosecutor. We supply the complete results of our analysis for 11 prokaryotic organisms on a dedicated website. Conclusion The Prosecutor software and supplementary datasets available at http://www.prosecutor.nl allow researchers working on any of the analyzed organisms to quickly identify the putative functions of their genes of interest. A de novo analysis allows new organisms to be studied.

  18. BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Directory of Open Access Journals (Sweden)

    Lin Kui

    2007-07-01

    Full Text Available Abstract Background Overlapping genes (OGs in bacterial genomes are pairs of adjacent genes of which the coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified. Indeed, these might prove a consistent feature across all microbial genomes. Our previous work suggests that OGs can be considered as robust markers at the whole genome level for the construction of phylogenies. An online, interactive web server for inferring phylogenies is needed for biologists to analyze phylogenetic relationships among a set of bacterial genomes of interest. Description BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ and Unweighted Pair-Group Method using Arithmetic averages (UPGMA. Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, the annotation and homologous OG pairs of which are integrated comprehensively. The reliability of phylogenies complemented by

  19. Multimodality Inferring of Human Cognitive States Based on Integration of Neuro-Fuzzy Network and Information Fusion Techniques

    Directory of Open Access Journals (Sweden)

    P. Bhattacharya

    2007-11-01

    Full Text Available To achieve an effective and safe operation on the machine system where the human interacts with the machine mutually, there is a need for the machine to understand the human state, especially cognitive state, when the human's operation task demands an intensive cognitive activity. Due to a well-known fact with the human being, a highly uncertain cognitive state and behavior as well as expressions or cues, the recent trend to infer the human state is to consider multimodality features of the human operator. In this paper, we present a method for multimodality inferring of human cognitive states by integrating neuro-fuzzy network and information fusion techniques. To demonstrate the effectiveness of this method, we take the driver fatigue detection as an example. The proposed method has, in particular, the following new features. First, human expressions are classified into four categories: (i casual or contextual feature, (ii contact feature, (iii contactless feature, and (iv performance feature. Second, the fuzzy neural network technique, in particular Takagi-Sugeno-Kang (TSK model, is employed to cope with uncertain behaviors. Third, the sensor fusion technique, in particular ordered weighted aggregation (OWA, is integrated with the TSK model in such a way that cues are taken as inputs to the TSK model, and then the outputs of the TSK are fused by the OWA which gives outputs corresponding to particular cognitive states under interest (e.g., fatigue. We call this method TSK-OWA. Validation of the TSK-OWA, performed in the Northeastern University vehicle drive simulator, has shown that the proposed method is promising to be a general tool for human cognitive state inferring and a special tool for the driver fatigue detection.

  20. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Gene expression patterns combined with network analysis identify hub genes associated with bladder cancer.

    Science.gov (United States)

    Bi, Dongbin; Ning, Hao; Liu, Shuai; Que, Xinxiang; Ding, Kejia

    2015-06-01

    To explore molecular mechanisms of bladder cancer (BC), network strategy was used to find biomarkers for early detection and diagnosis. The differentially expressed genes (DEGs) between bladder carcinoma patients and normal subjects were screened using empirical Bayes method of the linear models for microarray data package. Co-expression networks were constructed by differentially co-expressed genes and links. Regulatory impact factors (RIF) metric was used to identify critical transcription factors (TFs). The protein-protein interaction (PPI) networks were constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and clusters were obtained through molecular complex detection (MCODE) algorithm. Centralities analyses for complex networks were performed based on degree, stress and betweenness. Enrichment analyses were performed based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Co-expression networks and TFs (based on expression data of global DEGs and DEGs in different stages and grades) were identified. Hub genes of complex networks, such as UBE2C, ACTA2, FABP4, CKS2, FN1 and TOP2A, were also obtained according to analysis of degree. In gene enrichment analyses of global DEGs, cell adhesion, proteinaceous extracellular matrix and extracellular matrix structural constituent were top three GO terms. ECM-receptor interaction, focal adhesion, and cell cycle were significant pathways. Our results provide some potential underlying biomarkers of BC. However, further validation is required and deep studies are needed to elucidate the pathogenesis of BC. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Chromosome Gene Orientation Inversion Networks (GOINs) of Plasmodium Proteome.

    Science.gov (United States)

    Quevedo-Tumailli, Viviana F; Ortega-Tenezaca, Bernabé; González-Díaz, Humbert

    2018-03-02

    The spatial distribution of genes in chromosomes seems not to be random. For instance, only 10% of genes are transcribed from bidirectional promoters in humans, and many more are organized into larger clusters. This raises intriguing questions previously asked by different authors. We would like to add a few more questions in this context, related to gene orientation inversions. Does gene orientation (inversion) follow a random pattern? Is it relevant to biological activity somehow? We define a new kind of network coined as the gene orientation inversion network (GOIN). GOIN's complex network encodes short- and long-range patterns of inversion of the orientation of pairs of gene in the chromosome. We selected Plasmodium falciparum as a case of study due to the high relevance of this parasite to public health (causal agent of malaria). We constructed here for the first time all of the GOINs for the genome of this parasite. These networks have an average of 383 nodes (genes in one chromosome) and 1314 links (pairs of gene with inverse orientation). We calculated node centralities and other parameters of these networks. These numerical parameters were used to study different properties of gene inversion patterns, for example, distribution, local communities, similarity to Erdös-Rényi random networks, randomness, and so on. We find clues that seem to indicate that gene orientation inversion does not follow a random pattern. We noted that some gene communities in the GOINs tend to group genes encoding for RIFIN-related proteins in the proteome of the parasite. RIFIN-like proteins are a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Consequently, we used these centralities as input of machine learning (ML) models to predict the RIFIN-like activity of 5365 proteins in the proteome of Plasmodium sp. The best linear ML model found discriminates RIFIN-like from other proteins with sensitivity and

  3. Spatiotemporal network motif reveals the biological traits of developmental gene regulatory networks in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Kim Man-Sun

    2012-05-01

    Full Text Available Abstract Background Network motifs provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot be used to reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif,” a spatiotemporal sequence of network motifs of sub-networks which are active only at specific time points and body parts. Results On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution pattern in time and space. As a result, we found how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns. Conclusions Taken together, the proposed concept and the simulations can be used to unravel the design principle of developmental gene regulatory networks.

  4. Novel candidate genes important for asthma and hypertension comorbidity revealed from associative gene networks.

    Science.gov (United States)

    Saik, Olga V; Demenkov, Pavel S; Ivanisenko, Timofey V; Bragina, Elena Yu; Freidin, Maxim B; Goncharova, Irina A; Dosenko, Victor E; Zolotareva, Olga I; Hofestaedt, Ralf; Lavrik, Inna N; Rogaev, Evgeny I; Ivanisenko, Vladimir A

    2018-02-13

    Hypertension and bronchial asthma are a major issue for people's health. As of 2014, approximately one billion adults, or ~ 22% of the world population, have had hypertension. As of 2011, 235-330 million people globally have been affected by asthma and approximately 250,000-345,000 people have died each year from the disease. The development of the effective treatment therapies against these diseases is complicated by their comorbidity features. This is often a major problem in diagnosis and their treatment. Hence, in this study the bioinformatical methodology for the analysis of the comorbidity of these two diseases have been developed. As such, the search for candidate genes related to the comorbid conditions of asthma and hypertension can help in elucidating the molecular mechanisms underlying the comorbid condition of these two diseases, and can also be useful for genotyping and identifying new drug targets. Using ANDSystem, the reconstruction and analysis of gene networks associated with asthma and hypertension was carried out. The gene network of asthma included 755 genes/proteins and 62,603 interactions, while the gene network of hypertension - 713 genes/proteins and 45,479 interactions. Two hundred and five genes/proteins and 9638 interactions were shared between asthma and hypertension. An approach for ranking genes implicated in the comorbid condition of two diseases was proposed. The approach is based on nine criteria for ranking genes by their importance, including standard methods of gene prioritization (Endeavor, ToppGene) as well as original criteria that take into account the characteristics of an associative gene network and the presence of known polymorphisms in the analysed genes. According to the proposed approach, the genes IL10, TLR4, and CAT had the highest priority in the development of comorbidity of these two diseases. Additionally, it was revealed that the list of top genes is enriched with apoptotic genes and genes involved in

  5. Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.

    Science.gov (United States)

    Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin

    2017-08-01

    This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls) were downloaded from Gene Expression Omnibus database. Characteristic genes were identified using metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance was defined as the average gene significance of all genes used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subject to functional annotation and pathway enrichment analysis using Database for Annotation Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in totally enriched in 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs sanguinarine and papaverine were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.

  6. Inferring the physical connectivity of complex networks from their functional dynamics

    Directory of Open Access Journals (Sweden)

    Holm Liisa

    2010-05-01

    Full Text Available Abstract Background Biological networks, such as protein-protein interactions, metabolic, signalling, transcription-regulatory networks and neural synapses, are representations of large-scale dynamic systems. The relationship between the network structure and functions remains one of the central problems in current multidisciplinary research. Significant progress has been made toward understanding the implication of topological features for the network dynamics and functions, especially in biological networks. Given observations of a network system's behaviours or measurements of its functional dynamics, what can we conclude of the details of physical connectivity of the underlying structure? Results We modelled the network system by employing a scale-free network of coupled phase oscillators. Pairwise phase coherence (PPC was calculated for all the pairs of oscillators to present functional dynamics induced by the system. At the regime of global incoherence, we observed a Significant pairwise synchronization only between two nodes that are physically connected. Right after the onset of global synchronization, disconnected nodes begin to oscillate in a correlated fashion and the PPC of two nodes, either connected or disconnected, depends on their degrees. Based on the observation of PPCs, we built a weighted network of synchronization (WNS, an all-to-all functionally connected network where each link is weighted by the PPC of two oscillators at the ends of the link. In the regime of strong coupling, we observed a Significant similarity in the organization of WNSs induced by systems sharing the same substrate network but different configurations of initial phases and intrinsic frequencies of oscillators. We reconstruct physical network from the WNS by choosing the links whose weights are higher than a given threshold. We observed an optimal reconstruction just before the onset of global synchronization. Finally, we correlated the topology of the

  7. A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining

    Directory of Open Access Journals (Sweden)

    Lan Chung-Yu

    2008-09-01

    Full Text Available Abstract Background Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When primary pathogenetic events remains unclear due to its immense complexity, construction and analysis of the gene regulatory network of inflammation at times becomes the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from the huge quantities of experimental data. It is hence appealing to find an algorithm which can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such network will be essential for us to extract valuable information from the complex and chaotic network under diseased conditions. Results In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms like cross correlation threshold, maximum likelihood estimation method and Akaike Information Criterion (AIC on time-lapsed microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamic activated genes, which are regulated by transcription factors (TFs such as NF-κB. This systematic approach is used to investigate the stochastic interaction represented by the dynamic leukocyte gene expression profiles of human subject exposed to an inflammatory stimulus (bacterial endotoxin. Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties (such as susceptibility to infection of the immune system, which may be useful for translational research. Finally, robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network

  8. Identifying essential genes in bacterial metabolic networks with machine learning methods

    Science.gov (United States)

    2010-01-01

    Background Identifying essential genes in bacteria supports to identify potential drug targets and an understanding of minimal requirements for a synthetic cell. However, experimentally assaying the essentiality of their coding genes is resource intensive and not feasible for all bacterial organisms, in particular if they are infective. Results We developed a machine learning technique to identify essential genes using the experimental data of genome-wide knock-out screens from one bacterial organism to infer essential genes of another related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracies (area under the receiver-operator-curve of 75% - 81%). Finally, it was applied to drug target predictions for Salmonella typhimurium. We compared our predictions to the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes, which are highly relevant to be considered as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway. Conclusions Using elaborated features characterizing network topology, sequence information and microarray data enables to predict essential genes from a bacterial reference organism to a related query organism without any knowledge about the essentiality of genes of the query organism. In general, such a method is beneficial for inferring drug targets when experimental data about genome-wide knockout screens is not available for the investigated organism. PMID:20438628

  9. Linear control theory for gene network modeling.

    Directory of Open Access Journals (Sweden)

    Yong-Jun Shin

    Full Text Available Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through the study of several case studies including cascade and parallel forms, feedback and feedforward loops. We reproduce experimental results and provide rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain and linear state-space (time domain can be used to predict reliably the properties and transient behavior of complex network topologies and point to specific design strategies for synthetic networks.

  10. Multiscale Embedded Gene Co-expression Network Analysis.

    Directory of Open Access Journals (Sweden)

    Won-Min Song

    2015-11-01

    Full Text Available Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|3, the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA by: i introducing quality control of co-expression similarities, ii parallelizing embedded network construction, and iii developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs. We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA. MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

  11. Multiscale Embedded Gene Co-expression Network Analysis.

    Science.gov (United States)

    Song, Won-Min; Zhang, Bin

    2015-11-01

    Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.

  12. Molecular phylogeny of the Oriental butterfly genus Arhopala (Lycaenidae, Theclinae) inferred from mitochondrial and nuclear genes

    NARCIS (Netherlands)

    Megens, H.J.W.C.; Nes, Van W.J.; Moorsel, van C.H.M.; Pierce, N.E.; Jong, de R.

    2004-01-01

    We present a phylogeny for a selection of species of the butterfly genus Arhopala Boisduval, 1832 based on molecular characters. We sequenced 1778 bases of the mitochondrial genes Cytochrome Oxidase 1 and 2 including tRNALeu, and a 393-bp fragment of the nuclear wingless gene for a total of 42

  13. Structure, evolution and functional inference on the Mildew Locus O (MLO) gene family in three cultivated Cucurbitaceae spp.

    Science.gov (United States)

    Iovieno, Paolo; Andolfo, Giuseppe; Schiavulli, Adalgisa; Catalano, Domenico; Ricciardi, Luigi; Frusciante, Luigi; Ercolano, Maria Raffaella; Pavan, Stefano

    2015-12-29

    The powdery mildew disease affects thousands of plant species and arguably represents the major fungal threat for many Cucurbitaceae crops, including melon (Cucumis melo L.), watermelon (Citrullus lanatus L.) and zucchini (Cucurbita pepo L.). Several studies revealed that specific members of the Mildew Locus O (MLO) gene family act as powdery mildew susceptibility factors. Indeed, their inactivation, as the result of gene knock-out or knock-down, is associated with a peculiar form of resistance, referred to as mlo resistance. We exploited recently available genomic information to provide a comprehensive overview of the MLO gene family in Cucurbitaceae. We report the identification of 16 MLO homologs in C. melo, 14 in C. lanatus and 18 in C. pepo genomes. Bioinformatic treatment of data allowed phylogenetic inference and the prediction of several ortholog pairs and groups. Comparison with functionally characterized MLO genes and, in C. lanatus, gene expression analysis, resulted in the detection of candidate powdery mildew susceptibility factors. We identified a series of conserved amino acid residues and motifs that are likely to play a major role for the function of MLO proteins. Finally, we performed a codon-based evolutionary analysis indicating a general high level of purifying selection in the three Cucurbitaceae MLO gene families, and the occurrence of regions under diversifying selection in candidate susceptibility factors. Results of this study may help to address further biological questions concerning the evolution and function of MLO genes. Moreover, data reported here could be conveniently used by breeding research, aiming to select powdery mildew resistant cultivars in Cucurbitaceae.

  14. Singular Perturbation Analysis and Gene Regulatory Networks with Delay

    Science.gov (United States)

    Shlykova, Irina; Ponosov, Arcady

    2009-09-01

    There are different ways of how to model gene regulatory networks. Differential equations allow for a detailed description of the network's dynamics and provide an explicit model of the gene concentration changes over time. Production and relative degradation rate functions used in such models depend on the vector of steeply sloped threshold functions which characterize the activity of genes. The most popular example of the threshold functions comes from the Boolean network approach, where the threshold functions are given by step functions. The system of differential equations becomes then piecewise linear. The dynamics of this system can be described very easily between the thresholds, but not in the switching domains. For instance this approach fails to analyze stationary points of the system and to define continuous solutions in the switching domains. These problems were studied in [2], [3], but the proposed model did not take into account a time delay in cellular systems. However, analysis of real gene expression data shows a considerable number of time-delayed interactions suggesting that time delay is essential in gene regulation. Therefore, delays may have a great effect on the dynamics of the system presenting one of the critical factors that should be considered in reconstruction of gene regulatory networks. The goal of this work is to apply the singular perturbation analysis to certain systems with delay and to obtain an analog of Tikhonov's theorem, which provides sufficient conditions for constracting the limit system in the delay case.

  15. Reconstructing gene regulatory networks from knock-out data using Gaussian Noise Model and Pearson Correlation Coefficient.

    Science.gov (United States)

    Mohamed Salleh, Faridah Hani; Arif, Shereena Mohd; Zainudin, Suhaila; Firdaus-Raih, Mohd

    2015-12-01

    A gene regulatory network (GRN) is a large and complex network consisting of interacting elements that, over time, affect each other's state. The dynamics of complex gene regulatory processes are difficult to understand using intuitive approaches alone. To overcome this problem, we propose an algorithm for inferring the regulatory interactions from knock-out data using a Gaussian model combines with Pearson Correlation Coefficient (PCC). There are several problems relating to GRN construction that have been outlined in this paper. We demonstrated the ability of our proposed method to (1) predict the presence of regulatory interactions between genes, (2) their directionality and (3) their states (activation or suppression). The algorithm was applied to network sizes of 10 and 50 genes from DREAM3 datasets and network sizes of 10 from DREAM4 datasets. The predicted networks were evaluated based on AUROC and AUPR. We discovered that high false positive values were generated by our GRN prediction methods because the indirect regulations have been wrongly predicted as true relationships. We achieved satisfactory results as the majority of sub-networks achieved AUROC values above 0.5. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. A relative variation-based method to unraveling gene regulatory networks.

    Directory of Open Access Journals (Sweden)

    Yali Wang

    Full Text Available Gene regulatory network (GRN reconstruction is essential in understanding the functioning and pathology of a biological system. Extensive models and algorithms have been developed to unravel a GRN. The DREAM project aims to clarify both advantages and disadvantages of these methods from an application viewpoint. An interesting yet surprising observation is that compared with complicated methods like those based on nonlinear differential equations, etc., methods based on a simple statistics, such as the so-called Z-score, usually perform better. A fundamental problem with the Z-score, however, is that direct and indirect regulations can not be easily distinguished. To overcome this drawback, a relative expression level variation (RELV based GRN inference algorithm is suggested in this paper, which consists of three major steps. Firstly, on the basis of wild type and single gene knockout/knockdown experimental data, the magnitude of RELV of a gene is estimated. Secondly, probability for the existence of a direct regulation from a perturbed gene to a measured gene is estimated, which is further utilized to estimate whether a gene can be regulated by other genes. Finally, the normalized RELVs are modified to make genes with an estimated zero in-degree have smaller RELVs in magnitude than the other genes, which is used afterwards in queuing possibilities of the existence of direct regulations among genes and therefore leads to an estimate on the GRN topology. This method can in principle avoid the so-called cascade errors under certain situations. Computational results with the Size 100 sub-challenges of DREAM3 and DREAM4 show that, compared with the Z-score based method, prediction performances can be substantially improved, especially the AUPR specification. Moreover, it can even outperform the best team of both DREAM3 and DREAM4. Furthermore, the high precision of the obtained most reliable predictions shows that the suggested algorithm may be

  17. Phylogenetic inference in Rafflesiales: the influence of rate heterogeneity and horizontal gene transfer

    Directory of Open Access Journals (Sweden)

    Vidal-Russell Romina

    2004-10-01

    Full Text Available Abstract Background The phylogenetic relationships among the holoparasites of Rafflesiales have remained enigmatic for over a century. Recent molecular phylogenetic studies using the mitochondrial matR gene placed Rafflesia, Rhizanthes and Sapria (Rafflesiaceae s. str. in the angiosperm order Malpighiales and Mitrastema (Mitrastemonaceae in Ericales. These phylogenetic studies did not, however, sample two additional groups traditionally classified within Rafflesiales (Apodantheaceae and Cytinaceae. Here we provide molecular phylogenetic evidence using DNA sequence data from mitochondrial and nuclear genes for representatives of all genera in Rafflesiales. Results Our analyses indicate that the phylogenetic affinities of the large-flowered clade and Mitrastema, ascertained using mitochondrial matR, are congruent with results from nuclear SSU rDNA when these data are analyzed using maximum likelihood and Bayesian methods. The relationship of Cytinaceae to Malvales was recovered in all analyses. Relationships between Apodanthaceae and photosynthetic angiosperms varied depending upon the data partition: Malvales (3-gene, Cucurbitales (matR or Fabales (atp1. The latter incongruencies suggest that horizontal gene transfer (HGT may be affecting the mitochondrial gene topologies. The lack of association between Mitrastema and Ericales using atp1 is suggestive of HGT, but greater sampling within eudicots is needed to test this hypothesis further. Conclusions Rafflesiales are not monophyletic but composed of three or four independent lineages (families: Rafflesiaceae, Mitrastemonaceae, Apodanthaceae and Cytinaceae. Long-branch attraction appears to be misleading parsimony analyses of nuclear small-subunit rDNA data, but model-based methods (maximum likelihood and Bayesian analyses recover a topology that is congruent with the mitochondrial matR gene tree, thus providing compelling evidence for organismal relationships. Horizontal gene transfer appears to

  18. BRAIN NETWORKS. Correlated gene expression supports synchronous activity in brain networks.

    Science.gov (United States)

    Richiardi, Jonas; Altmann, Andre; Milazzo, Anna-Clare; Chang, Catie; Chakravarty, M Mallar; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Flor, Herta; Frouin, Vincent; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Lemaître, Hervé; Mann, Karl F; Martinot, Jean-Luc; Nees, Frauke; Paus, Tomáš; Pausova, Zdenka; Rietschel, Marcella; Robbins, Trevor W; Smolka, Michael N; Spanagel, Rainer; Ströhle, Andreas; Schumann, Gunter; Hawrylycz, Mike; Poline, Jean-Baptiste; Greicius, Michael D

    2015-06-12

    During rest, brain activity is synchronized between different regions widely distributed throughout the brain, forming functional networks. However, the molecular mechanisms supporting functional connectivity remain undefined. We show that functional brain networks defined with resting-state functional magnetic resonance imaging can be recapitulated by using measures of correlated gene expression in a post mortem brain tissue data set. The set of 136 genes we identify is significantly enriched for ion channels. Polymorphisms in this set of genes significantly affect resting-state functional connectivity in a large sample of healthy adolescents. Expression levels of these genes are also significantly associated with axonal connectivity in the mouse. The results provide convergent, multimodal evidence that resting-state functional networks correlate with the orchestrated activity of dozens of genes linked to ion channel activity and synaptic function. Copyright © 2015, American Association for the Advancement of Science.

  19. Reconstruction of gene regulatory modules from RNA silencing of IFN-α modulators: experimental set-up and inference method.

    Science.gov (United States)

    Grassi, Angela; Di Camillo, Barbara; Ciccarese, Francesco; Agnusdei, Valentina; Zanovello, Paola; Amadori, Alberto; Finesso, Lorenzo; Indraccolo, Stefano; Toffolo, Gianna Maria

    2016-03-12

    Inference of gene regulation from expression data may help to unravel regulatory mechanisms involved in complex diseases or in the action of specific drugs. A challenging task for many researchers working in the field of systems biology is to build up an experiment with a limited budget and produce a dataset suitable to reconstruct putative regulatory modules worth of biological validation. Here, we focus on small-scale gene expression screens and we introduce a novel experimental set-up and a customized method of analysis to make inference on regulatory modules starting from genetic perturbation data, e.g. knockdown and overexpression data. To illustrate the utility of our strategy, it was applied to produce and analyze a dataset of quantitative real-time RT-PCR data, in which interferon-α (IFN-α) transcriptional response in endothelial cells is investigated by RNA silencing of two candidate IFN-α modulators, STAT1 and IFIH1. A putative regulatory module was reconstructed by our method, revealing an intriguing feed-forward loop, in which STAT1 regulates IFIH1 and they both negatively regulate IFNAR1. STAT1 regulation on IFNAR1 was object of experimental validation at the protein level. Detailed description of the experimental set-up and of the analysis procedure is reported, with the intent to be of inspiration for other scientists who want to realize similar experiments to reconstruct gene regulatory modules starting from perturbations of possible regulators. Application of our approach to the study of IFN-α transcriptional response modulators in endothelial cells has led to many interesting novel findings and new biological hypotheses worth of validation.

  20. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    Science.gov (United States)

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate soybean genome for its economic and scientific value. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it could generate massive duplicated genes which is an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  1. [Weighted gene co-expression network analysis in biomedicine research].

    Science.gov (United States)

    Liu, Wei; Li, Li; Ye, Hua; Tu, Wei

    2017-11-25

    High-throughput biological technologies are now widely applied in biology and medicine, allowing scientists to monitor thousands of parameters simultaneously in a specific sample. However, it is still an enormous challenge to mine useful information from high-throughput data. The emergence of network biology provides deeper insights into complex bio-system and reveals the modularity in tissue/cellular networks. Correlation networks are increasingly used in bioinformatics applications. Weighted gene co-expression network analysis (WGCNA) tool can detect clusters of highly correlated genes. Therefore, we systematically reviewed the application of WGCNA in the study of disease diagnosis, pathogenesis and other related fields. First, we introduced principle, workflow, advantages and disadvantages of WGCNA. Second, we presented the application of WGCNA in disease, physiology, drug, evolution and genome annotation. Then, we indicated the application of WGCNA in newly developed high-throughput methods. We hope this review will help to promote the application of WGCNA in biomedicine research.

  2. Transcriptional control in the segmentation gene network of Drosophila.

    Directory of Open Access Journals (Sweden)

    Mark D Schroeder

    2004-09-01

    Full Text Available The segmentation gene network of Drosophila consists of maternal and zygotic factors that generate, by transcriptional (cross- regulation, expression patterns of increasing complexity along the anterior-posterior axis of the embryo. Using known binding site information for maternal and zygotic gap transcription factors, the computer algorithm Ahab recovers known segmentation control elements (modules with excellent success and predicts many novel modules within the network and genome-wide. We show that novel module predictions are highly enriched in the network and typically clustered proximal to the promoter, not only upstream, but also in intronic space and downstream. When placed upstream of a reporter gene, they consistently drive patterned blastoderm expression, in most cases faithfully producing one or more pattern elements of the endogenous gene. Moreover, we demonstrate for the entire set of known and newly validated modules that Ahab's prediction of binding sites correlates well with the expression patterns produced by the modules, revealing basic rules governing their composition. Specifically, we show that maternal factors consistently act as activators and that gap factors act as repressors, except for the bimodal factor Hunchback. Our data suggest a simple context-dependent rule for its switch from repressive to activating function. Overall, the composition of modules appears well fitted to the spatiotemporal distribution of their positive and negative input factors. Finally, by comparing Ahab predictions with different categories of transcription factor input, we confirm the global regulatory structure of the segmentation gene network, but find odd skipped behaving like a primary pair-rule gene. The study expands our knowledge of the segmentation gene network by increasing the number of experimentally tested modules by 50%. For the first time, the entire set of validated modules is analyzed for binding site composition under a

  3. The fidelity of Kepler eclipsing binary parameters inferred by the neural network

    Science.gov (United States)

    Holanda, N.; da Silva, J. R. P.

    2018-04-01

    This work aims to test the fidelity and efficiency of obtaining automatic orbital elements of eclipsing binary systems, from light curves using neural network models. We selected a random sample with 78 systems, from over 1400 eclipsing binary detached obtained from the Kepler Eclipsing Binaries Catalog, processed using the neural network approach. The orbital parameters of the sample systems were measured applying the traditional method of light curve adjustment with uncertainties calculated by the bootstrap method, employing the JKTEBOP code. These estimated parameters were compared with those obtained by the neural network approach for the same systems. The results reveal a good agreement between techniques for the sum of the fractional radii and moderate agreement for e cos ω and e sin ω, but orbital inclination is clearly underestimated in neural network tests.

  4. Simulation and Statistical Inference of Stochastic Reaction Networks with Applications to Epidemic Models

    KAUST Repository

    Moraes, Alvaro

    2015-01-01

    Networks (SRNs), that are intended to describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been mainly developed to model biochemical reactions

  5. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    Science.gov (United States)

    2011-01-01

    Background Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms were learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes were

  6. Construction of functional linkage gene networks by data integration.

    Science.gov (United States)

    Linghu, Bolan; Franzosa, Eric A; Xia, Yu

    2013-01-01

    Networks of functional associations between genes have recently been successfully used for gene function and disease-related research. A typical approach for constructing such functional linkage gene networks (FLNs) is based on the integration of diverse high-throughput functional genomics datasets. Data integration is a nontrivial task due to the heterogeneous nature of the different data sources and their variable accuracy and completeness. The presence of correlations between data sources also adds another layer of complexity to the integration process. In this chapter we discuss an approach for constructing a human FLN from data integration and a subsequent application of the FLN to novel disease gene discovery. Similar approaches can be applied to nonhuman species and other discovery tasks.

  7. Finding gene regulatory network candidates using the gene expression knowledge base.

    Science.gov (United States)

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  8. Network Analysis of Human Genes Influencing Susceptibility to Mycobacterial Infections

    Science.gov (United States)

    Lipner, Ettie M.; Garcia, Benjamin J.; Strong, Michael

    2016-01-01

    Tuberculosis and nontuberculous mycobacterial infections constitute a high burden of pulmonary disease in humans, resulting in over 1.5 million deaths per year. Building on the premise that genetic factors influence the instance, progression, and defense of infectious disease, we undertook a systems biology approach to investigate relationships among genetic factors that may play a role in increased susceptibility or control of mycobacterial infections. We combined literature and database mining with network analysis and pathway enrichment analysis to examine genes, pathways, and networks, involved in the human response to Mycobacterium tuberculosis and nontuberculous mycobacterial infections. This approach allowed us to examine functional relationships among reported genes, and to identify novel genes and enriched pathways that may play a role in mycobacterial susceptibility or control. Our findings suggest that the primary pathways and genes influencing mycobacterial infection control involve an interplay between innate and adaptive immune proteins and pathways. Signaling pathways involved in autoimmune disease were significantly enriched as revealed in our networks. Mycobacterial disease susceptibility networks were also examined within the context of gene-chemical relationships, in order to identify putative drugs and nutrients with potential beneficial immunomodulatory or anti-mycobacterial effects. PMID:26751573

  9. Species boundaries of Gulf of Mexico vestimentiferans (Polychaeta, Siboglinidae) inferred from mitochondrial genes

    Science.gov (United States)

    Pia Miglietta, Maria; Hourdez, Stephane; Cowart, Dominique A.; Schaeffer, Stephen W.; Fisher, Charles

    2010-11-01

    At least six morphospecies of vestimentiferan tubeworms are associated with cold seeps in the Gulf of Mexico (GOM). The physiology and ecology of the two best-studied species from depths above 1000 m in the upper Louisiana slope (Lamellibrachia luymesi and Seepiophila jonesi) are relatively well understood. The biology of one rare species from the upper slope (escarpiid sp. nov.) and three morphospecies found at greater depths in the GOM (Lamellibrachia sp. 1, L. sp. 2, and Escarpia laminata) are not as well understood. Here we address species distributions and boundaries of cold-seep tubeworms using phylogenetic hypotheses based on two mitochondrial genes. Fragments of the mitochondrial large ribosomal subunit rDNA (16S) and cytochrome oxidase subunit I (COI) genes were sequenced for 167 vestimentiferans collected from the GOM and analyzed in the context of other seep vestimentiferans for which sequence data were available. The analysis supported five monophyletic clades of vestimentiferans in the GOM. Intra-clade variation in both genes was very low, and there was no apparent correlation between the within-clade diversity and collection depth or location. Two of the morphospecies of Lamellibrachia from different depths in the GOM could not be distinguished by either mitochondrial gene. Similarly, E. laminata could not be distinguished from other described species of Escarpia from either the west coast of Africa or the eastern Pacific using COI. We suggest that the mitochondrial COI and 16S genes have little utility as barcoding markers for seep vestimentiferan tubeworms.

  10. Synchronous versus asynchronous modeling of gene regulatory networks.

    Science.gov (United States)

    Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni

    2008-09-01

    In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.

  11. A network approach to predict pathogenic genes for Fusarium graminearum.

    Directory of Open Access Journals (Sweden)

    Xiaoping Liu

    Full Text Available Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB, which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other

  12. Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network.

    Science.gov (United States)

    Kordmahalleh, Mina Moradi; Sefidmazgi, Mohammad Gorji; Harrison, Scott H; Homaifar, Abdollah

    2017-01-01

    The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Powerful biotechnologies have been rapidly and successfully measuring levels of genetic expression to illuminate different states of biological systems. This has led to an ensuing challenge to improve the identification of specific regulatory mechanisms through regulatory network reconstructions. Solutions to this challenge will ultimately help to spur forward efforts based on the usage of regulatory network reconstructions in systems biology applications. We have developed a hierarchical recurrent neural network (HRNN) that identifies time-delayed gene interactions using time-course data. A customized genetic algorithm (GA) was used to optimize hierarchical connectivity of regulatory genes and a target gene. The proposed design provides a non-fully connected network with the flexibility of using recurrent connections inside the network. These features and the non-linearity of the HRNN facilitate the process of identifying temporal patterns of a GRN. Our HRNN method was implemented with the Python language. It was first evaluated on simulated data representing linear and nonlinear time-delayed gene-gene interaction models across a range of network sizes and variances of noise. We then further demonstrated the capability of our method in reconstructing GRNs of the Saccharomyces cerevisiae synthetic network for in vivo benchmarking of reverse-engineering and modeling approaches (IRMA). We compared the performance of our method to TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet across different network

  13. Random Walks on Directed Networks: Inference and Respondent-Driven Sampling

    Directory of Open Access Journals (Sweden)

    Malmros Jens

    2016-06-01

    Full Text Available Respondent-driven sampling (RDS is often used to estimate population properties (e.g., sexual risk behavior in hard-to-reach populations. In RDS, already sampled individuals recruit population members to the sample from their social contacts in an efficient snowball-like sampling procedure. By assuming a Markov model for the recruitment of individuals, asymptotically unbiased estimates of population characteristics can be obtained. Current RDS estimation methodology assumes that the social network is undirected, that is, all edges are reciprocal. However, empirical social networks in general also include a substantial number of nonreciprocal edges. In this article, we develop an estimation method for RDS in populations connected by social networks that include reciprocal and nonreciprocal edges. We derive estimators of the selection probabilities of individuals as a function of the number of outgoing edges of sampled individuals. The proposed estimators are evaluated on artificial and empirical networks and are shown to generally perform better than existing estimators. This is the case in particular when the fraction of directed edges in the network is large.

  14. Ground rules of the pluripotency gene regulatory network.

    KAUST Repository

    Li, Mo

    2017-01-03

    Pluripotency is a state that exists transiently in the early embryo and, remarkably, can be recapitulated in vitro by deriving embryonic stem cells or by reprogramming somatic cells to become induced pluripotent stem cells. The state of pluripotency, which is stabilized by an interconnected network of pluripotency-associated genes, integrates external signals and exerts control over the decision between self-renewal and differentiation at the transcriptional, post-transcriptional and epigenetic levels. Recent evidence of alternative pluripotency states indicates the regulatory flexibility of this network. Insights into the underlying principles of the pluripotency network may provide unprecedented opportunities for studying development and for regenerative medicine.

  15. Ground rules of the pluripotency gene regulatory network.

    KAUST Repository

    Li, Mo; Belmonte, Juan Carlos Izpisua

    2017-01-01

    Pluripotency is a state that exists transiently in the early embryo and, remarkably, can be recapitulated in vitro by deriving embryonic stem cells or by reprogramming somatic cells to become induced pluripotent stem cells. The state of pluripotency, which is stabilized by an interconnected network of pluripotency-associated genes, integrates external signals and exerts control over the decision between self-renewal and differentiation at the transcriptional, post-transcriptional and epigenetic levels. Recent evidence of alternative pluripotency states indicates the regulatory flexibility of this network. Insights into the underlying principles of the pluripotency network may provide unprecedented opportunities for studying development and for regenerative medicine.

  16. Analysis of the population structure of Uruguayan Creole cattle as inferred from milk major gene polymorphisms

    Directory of Open Access Journals (Sweden)

    Gonzalo Rincón

    2006-01-01

    Full Text Available The ancestors of Uruguayan Creole cattle were introduced by the Spanish conquerors in the XVII century, following which the population grew extensively and became semi-feral before the introduction of selected breeds. Today the Uruguayan Creole cattle genetic reserve consists of 575 animals. We used the tetra primer amplification refractory mutation system polymerase chain reaction (ARMS-PCR to analyze the kappa-casein, beta-casein, alphaS1-casein and alpha-lactoalbumin gene polymorphisms and restriction fragment length polymorphism PCR (RFLP-PCR for the beta-lactoglobulin and the acylCoA:diacyl glycerol acyltransferase 1 (DGAT1 genes. The kappa-casein and beta-lactoglobulin genes presented very similar A and B allele frequencies, while the alphas1-casein and alpha-lactoalbumin gene B alleles showed much higher frequencies than the corresponding A alleles. The beta-casein B allele was not found in the population sampled. There was a very high frequency of the DGAT1 gene A allele which is associated with low milk fat content and high milk yield. All loci were in Hardy-Weinberg equilibrium and the level of heterozygosity agreed with the high genetic diversity observed in a previous analysis of this population. Preservation of the allelic richness observed in the Uruguayan Creole cattle should be considered for future dairy management and livestock genetic improvement. The results also emphasize the value of the tetra primers ARMS-PCR technique as a rapid, easy and economical way of genotyping cattle breeds for milk gene single nucleotide polymorphisms.

  17. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    Directory of Open Access Journals (Sweden)

    Faridah Hani Mohamed Salleh

    2017-01-01

    Full Text Available Gene regulatory network (GRN reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C as a direct interaction (A → C. Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.

  18. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems.

    Science.gov (United States)

    Salleh, Faridah Hani Mohamed; Zainudin, Suhaila; Arif, Shereena M

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring of an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dea