A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.
Directory of Open Access Journals (Sweden)
Xiangyun Xiao
Full Text Available The reconstruction of gene regulatory networks (GRNs from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM, experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
A Learning Algorithm for Multimodal Grammar Inference.
D'Ulizia, A; Ferri, F; Grifoni, P
2011-12-01
The high costs of development and maintenance of multimodal grammars in integrating and understanding input in multimodal interfaces lead to the investigation of novel algorithmic solutions in automating grammar generation and in updating processes. Many algorithms for context-free grammar inference have been developed in the natural language processing literature. An extension of these algorithms toward the inference of multimodal grammars is necessary for multimodal input processing. In this paper, we propose a novel grammar inference mechanism that allows us to learn a multimodal grammar from its positive samples of multimodal sentences. The algorithm first generates the multimodal grammar that is able to parse the positive samples of sentences and, afterward, makes use of two learning operators and the minimum description length metrics in improving the grammar description and in avoiding the over-generalization problem. The experimental results highlight the acceptable performances of the algorithm proposed in this paper since it has a very high probability of parsing valid sentences.
Replication-based Inference Algorithms for Hard Computational Problems
Alamino, Roberto C.; Neirotti, Juan P.; Saad, David
2013-01-01
Inference algorithms based on evolving interactions between replicated solutions are introduced and analyzed on a prototypical NP-hard problem - the capacity of the binary Ising perceptron. The efficiency of the algorithm is examined numerically against that of the parallel tempering algorithm, showing improved performance in terms of the results obtained, computing requirements and simplicity of implementation.
Grammatical inference algorithms, routines and applications
Wieczorek, Wojciech
2017-01-01
This book focuses on grammatical inference, presenting classic and modern methods of grammatical inference from the perspective of practitioners. To do so, it employs the Python programming language to present all of the methods discussed. Grammatical inference is a field that lies at the intersection of multiple disciplines, with contributions from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory and many others. Though the book is largely practical, it also includes elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus references to real-world problems. The listings presented here can be directly copied and pasted into other programs, thus making the book a valuable source of ready recipes for students, academic researchers, and programmers alike, as well as an inspiration for their further development.>.
Congested Link Inference Algorithms in Dynamic Routing IP Network
Directory of Open Access Journals (Sweden)
Yu Chen
2017-01-01
Full Text Available The performance descending of current congested link inference algorithms is obviously in dynamic routing IP network, such as the most classical algorithm CLINK. To overcome this problem, based on the assumptions of Markov property and time homogeneity, we build a kind of Variable Structure Discrete Dynamic Bayesian (VSDDB network simplified model of dynamic routing IP network. Under the simplified VSDDB model, based on the Bayesian Maximum A Posteriori (BMAP and Rest Bayesian Network Model (RBNM, we proposed an Improved CLINK (ICLINK algorithm. Considering the concurrent phenomenon of multiple link congestion usually happens, we also proposed algorithm CLILRS (Congested Link Inference algorithm based on Lagrangian Relaxation Subgradient to infer the set of congested links. We validated our results by the experiments of analogy, simulation, and actual Internet.
Inference of Gene Regulatory Network Based on Local Bayesian Networks.
Liu, Fei; Zhang, Shao-Wu; Guo, Wei-Feng; Wei, Ze-Gang; Chen, Luonan
2016-08-01
The inference of gene regulatory networks (GRNs) from expression data can mine the direct regulations among genes and gain deep insights into biological processes at a network level. During past decades, numerous computational approaches have been introduced for inferring the GRNs. However, many of them still suffer from various problems, e.g., Bayesian network (BN) methods cannot handle large-scale networks due to their high computational complexity, while information theory-based methods cannot identify the directions of regulatory interactions and also suffer from false positive/negative problems. To overcome the limitations, in this work we present a novel algorithm, namely local Bayesian network (LBN), to infer GRNs from gene expression data by using the network decomposition strategy and false-positive edge elimination scheme. Specifically, LBN algorithm first uses conditional mutual information (CMI) to construct an initial network or GRN, which is decomposed into a number of local networks or GRNs. Then, BN method is employed to generate a series of local BNs by selecting the k-nearest neighbors of each gene as its candidate regulatory genes, which significantly reduces the exponential search space from all possible GRN structures. Integrating these local BNs forms a tentative network or GRN by performing CMI, which reduces redundant regulations in the GRN and thus alleviates the false positive problem. The final network or GRN can be obtained by iteratively performing CMI and local BN on the tentative network. In the iterative process, the false or redundant regulations are gradually removed. When tested on the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in E.coli, our results suggest that LBN outperforms other state-of-the-art methods (ARACNE, GENIE3 and NARROMI) significantly, with more accurate and robust performance. In particular, the decomposition strategy with local Bayesian networks not only effectively reduce
Algorithm Optimally Orders Forward-Chaining Inference Rules
James, Mark
2008-01-01
People typically develop knowledge bases in a somewhat ad hoc manner by incrementally adding rules with no specific organization. This often results in a very inefficient execution of those rules since they are so often order sensitive. This is relevant to tasks like Deep Space Network in that it allows the knowledge base to be incrementally developed and have it automatically ordered for efficiency. Although data flow analysis was first developed for use in compilers for producing optimal code sequences, its usefulness is now recognized in many software systems including knowledge-based systems. However, this approach for exhaustively computing data-flow information cannot directly be applied to inference systems because of the ubiquitous execution of the rules. An algorithm is presented that efficiently performs a complete producer/consumer analysis for each antecedent and consequence clause in a knowledge base to optimally order the rules to minimize inference cycles. An algorithm was developed that optimally orders a knowledge base composed of forwarding chaining inference rules such that independent inference cycle executions are minimized, thus, resulting in significantly faster execution. This algorithm was integrated into the JPL tool Spacecraft Health Inference Engine (SHINE) for verification and it resulted in a significant reduction in inference cycles for what was previously considered an ordered knowledge base. For a knowledge base that is completely unordered, then the improvement is much greater.
Directory of Open Access Journals (Sweden)
Matsen Frederick A
2012-05-01
Full Text Available Abstract Background Although taxonomy is often used informally to evaluate the results of phylogenetic inference and the root of phylogenetic trees, algorithmic methods to do so are lacking. Results In this paper we formalize these procedures and develop algorithms to solve the relevant problems. In particular, we introduce a new algorithm that solves a "subcoloring" problem to express the difference between a taxonomy and a phylogeny at a given rank. This algorithm improves upon the current best algorithm in terms of asymptotic complexity for the parameter regime of interest; we also describe a branch-and-bound algorithm that saves orders of magnitude in computation on real data sets. We also develop a formalism and an algorithm for rooting phylogenetic trees according to a taxonomy. Conclusions The algorithms in this paper, and the associated freely-available software, will help biologists better use and understand taxonomically labeled phylogenetic trees.
Conditional random pattern algorithm for LOH inference and segmentation.
Wu, Ling-Yun; Zhou, Xiaobo; Li, Fuhai; Yang, Xiaorong; Chang, Chung-Che; Wong, Stephen T C
2009-01-01
Loss of heterozygosity (LOH) is one of the most important mechanisms in the tumor evolution. LOH can be detected from the genotypes of the tumor samples with or without paired normal samples. In paired sample cases, LOH detection for informative single nucleotide polymorphisms (SNPs) is straightforward if there is no genotyping error. But genotyping errors are always unavoidable, and there are about 70% non-informative SNPs whose LOH status can only be inferred from the neighboring informative SNPs. This article presents a novel LOH inference and segmentation algorithm based on the conditional random pattern (CRP) model. The new model explicitly considers the distance between two neighboring SNPs, as well as the genotyping error rate and the heterozygous rate. This new method is tested on the simulated and real data of the Affymetrix Human Mapping 500K SNP arrays. The experimental results show that the CRP method outperforms the conventional methods based on the hidden Markov model (HMM). Software is available upon request.
Comparison of evolutionary algorithms in gene regulatory network model inference.
LENUS (Irish Health Repository)
2010-01-01
ABSTRACT: BACKGROUND: The evolution of high throughput technologies that measure gene expression levels has created a data base for inferring GRNs (a process also known as reverse engineering of GRNs). However, the nature of these data has made this process very difficult. At the moment, several methods of discovering qualitative causal relationships between genes with high accuracy from microarray data exist, but large scale quantitative analysis on real biological datasets cannot be performed, to date, as existing approaches are not suitable for real microarray data which are noisy and insufficient. RESULTS: This paper performs an analysis of several existing evolutionary algorithms for quantitative gene regulatory network modelling. The aim is to present the techniques used and offer a comprehensive comparison of approaches, under a common framework. Algorithms are applied to both synthetic and real gene expression data from DNA microarrays, and ability to reproduce biological behaviour, scalability and robustness to noise are assessed and compared. CONCLUSIONS: Presented is a comparison framework for assessment of evolutionary algorithms, used to infer gene regulatory networks. Promising methods are identified and a platform for development of appropriate model formalisms is established.
Sparse kernel learning with LASSO and Bayesian inference algorithm.
Gao, Junbin; Kwan, Paul W; Shi, Daming
2010-03-01
Kernelized LASSO (Least Absolute Selection and Shrinkage Operator) has been investigated in two separate recent papers [Gao, J., Antolovich, M., & Kwan, P. H. (2008). L1 LASSO and its Bayesian inference. In W. Wobcke, & M. Zhang (Eds.), Lecture notes in computer science: Vol. 5360 (pp. 318-324); Wang, G., Yeung, D. Y., & Lochovsky, F. (2007). The kernel path in kernelized LASSO. In International conference on artificial intelligence and statistics (pp. 580-587). San Juan, Puerto Rico: MIT Press]. This paper is concerned with learning kernels under the LASSO formulation via adopting a generative Bayesian learning and inference approach. A new robust learning algorithm is proposed which produces a sparse kernel model with the capability of learning regularized parameters and kernel hyperparameters. A comparison with state-of-the-art methods for constructing sparse regression models such as the relevance vector machine (RVM) and the local regularization assisted orthogonal least squares regression (LROLS) is given. The new algorithm is also demonstrated to possess considerable computational advantages. Copyright 2009 Elsevier Ltd. All rights reserved.
A haplotype inference algorithm for trios based on deterministic sampling
Directory of Open Access Journals (Sweden)
Iliadis Alexandros
2010-08-01
Full Text Available Abstract Background In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs. Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data. Results We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We have compared our method with publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets of varying number of markers and trios. We have found that the computational complexity of PHASE makes it prohibitive for routine use; on the other hand 2SNP, though the fastest method for small datasets, was significantly inaccurate. We have shown that our method outperforms BEAGLE in terms of speed and accuracy for small to intermediate dataset sizes in terms of number of trios for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS package, available for download at http://www.ee.columbia.edu/~anastas/tds Conclusions Using a Tree-Based Deterministic sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets.
Inference of Gene Regulatory Network Based on Local Bayesian Networks.
Directory of Open Access Journals (Sweden)
Fei Liu
2016-08-01
Full Text Available The inference of gene regulatory networks (GRNs from expression data can mine the direct regulations among genes and gain deep insights into biological processes at a network level. During past decades, numerous computational approaches have been introduced for inferring the GRNs. However, many of them still suffer from various problems, e.g., Bayesian network (BN methods cannot handle large-scale networks due to their high computational complexity, while information theory-based methods cannot identify the directions of regulatory interactions and also suffer from false positive/negative problems. To overcome the limitations, in this work we present a novel algorithm, namely local Bayesian network (LBN, to infer GRNs from gene expression data by using the network decomposition strategy and false-positive edge elimination scheme. Specifically, LBN algorithm first uses conditional mutual information (CMI to construct an initial network or GRN, which is decomposed into a number of local networks or GRNs. Then, BN method is employed to generate a series of local BNs by selecting the k-nearest neighbors of each gene as its candidate regulatory genes, which significantly reduces the exponential search space from all possible GRN structures. Integrating these local BNs forms a tentative network or GRN by performing CMI, which reduces redundant regulations in the GRN and thus alleviates the false positive problem. The final network or GRN can be obtained by iteratively performing CMI and local BN on the tentative network. In the iterative process, the false or redundant regulations are gradually removed. When tested on the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in E.coli, our results suggest that LBN outperforms other state-of-the-art methods (ARACNE, GENIE3 and NARROMI significantly, with more accurate and robust performance. In particular, the decomposition strategy with local Bayesian networks not only
Directory of Open Access Journals (Sweden)
Miguel eLopes
2013-12-01
Full Text Available Accurate inference of causal gene regulatory networks from gene expression data is an open bioinformatics challenge. Gene interactions are dynamical processes and consequently we can expect that the effect of any regulation action occurs after a certain temporal lag. However such lag is unknown a priori and temporal aspects require specific inference algorithms. In this paper we aim to assess the impact of taking into consideration temporal aspects on the final accuracy of the inference procedure. In particular we will compare the accuracy of static algorithms, where no dynamic aspect is considered, to that of fixed lag and adaptive lag algorithms in three inference tasks from microarray expression data. Experimental results show that network inference algorithms that take dynamics into account perform consistently better than static ones, once the considered lags are properly chosen. However, no individual algorithm stands out in all three inference tasks, and the challenging nature of network inference tasks is evidenced, as a large number of the assessed algorithms does not perform better than random.
Inference from matrix products: a heuristic spin glass algorithm
Energy Technology Data Exchange (ETDEWEB)
Hastings, Matthew B [Los Alamos National Laboratory
2008-01-01
We present an algorithm for finding ground states of two-dimensional spin-glass systems based on ideas from matrix product states in quantum information theory. The algorithm works directly at zero temperature and defines an approximation to the energy whose accuracy depends on a parameter k. We test the algorithm against exact methods on random field and random bond Ising models, and we find that accurate results require a k which scales roughly polynomially with the system size. The algorithm also performs well when tested on small systems with arbitrary interactions, where no fast, exact algorithms exist. The time required is significantly less than Monte Carlo schemes.
A Fast Iterative Bayesian Inference Algorithm for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand; Manchón, Carles Navarro; Fleury, Bernard Henri
2013-01-01
representation of the Bessel K probability density function; a highly efficient, fast iterative Bayesian inference method is then applied to the proposed model. The resulting estimator outperforms other state-of-the-art Bayesian and non-Bayesian estimators, either by yielding lower mean squared estimation error...
GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.
Ali, Raja H; Muhammad, Sayyed A; Arvestad, Lars
2016-06-04
Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity. In this study, we validate GenFamClust by comparing it to well known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs. The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.
SALZA: Soft algorithmic complexity estimates for clustering and causality inference
Revolle, Marion; François, Cayre; Bihan, Nicolas le
2016-01-01
A complete set of practical estimators for the conditional, simple and joint algorihmic complexities is presented, from which a semi-metric is derived. Also, new directed information estimators are proposed that are applied to causality inference on Directed Acyclic Graphs. The performances of these estimators are investigated and shown to compare well with respect to the state-of-the-art Normalized Compression Distance (NCD).
Causal inference algorithms can be useful in life course epidemiology
la Bastide-van Gemert, Sacha; Stolk, Ronald P.; van den Heuvel, Edwin R.; Fidler, Vaclav
2014-01-01
Objectives: Life course epidemiology attempts to unravel causal relationships between variables observed over time. Causal relationships can be represented as directed acyclic graphs. This article explains the theoretical concepts of the search algorithms used for finding such representations, discu
Causal inference algorithms can be useful in life course epidemiology
la Bastide-van Gemert, Sacha; Stolk, Ronald P.; van den Heuvel, Edwin R.; Fidler, Vaclav
Objectives: Life course epidemiology attempts to unravel causal relationships between variables observed over time. Causal relationships can be represented as directed acyclic graphs. This article explains the theoretical concepts of the search algorithms used for finding such representations,
Directory of Open Access Journals (Sweden)
Wei Huang
2013-01-01
Full Text Available We introduce a new category of fuzzy inference systems with the aid of a multiobjective opposition-based space search algorithm (MOSSA. The proposed MOSSA is essentially a multiobjective space search algorithm improved by using an opposition-based learning that employs a so-called opposite numbers mechanism to speed up the convergence of the optimization algorithm. In the identification of fuzzy inference system, the MOSSA is exploited to carry out the parametric identification of the fuzzy model as well as to realize its structural identification. Experimental results demonstrate the effectiveness of the proposed fuzzy models.
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high
2014-01-01
Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel
A new Hedging algorithm and its application to inferring latent random variables
Freund, Yoav
2008-01-01
We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of the master algorithm receive zero weight. We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables.
A curious robot: An explorative-exploitive inference algorithm
DEFF Research Database (Denmark)
Pedersen, Kim Steenstrup; Johansen, Peter
2007-01-01
We propose a sequential learning algorithm with a focus on robot control. It is initialised by a teacher who directs the robot through a series of example solutions of a problem. Left alone, the control chooses its next action by prediction based on a variable order Markov chain model selected...
A curious robot: An explorative-exploitive inference algorithm
DEFF Research Database (Denmark)
Pedersen, Kim Steenstrup; Johansen, Peter
2007-01-01
We propose a sequential learning algorithm with a focus on robot control. It is initialised by a teacher who directs the robot through a series of example solutions of a problem. Left alone, the control chooses its next action by prediction based on a variable order Markov chain model selected to...
DEFF Research Database (Denmark)
Møller, Jesper
.1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...
Improvements in algorithms for phenotype inference: the NAT2 example.
Selinski, Silvia; Blaszkewicz, Meinolf; Ickstadt, Katja; Hengstler, Jan G; Golka, Klaus
2014-02-01
Numerous studies have analyzed the impact of N-acetyltransferase 2 (NAT2) polymorphisms on drug efficacy, side effects as well as cancer risk. Here, we present the state of the art of deriving haplotypes from polymorphisms and discuss the available software. PHASE v2.1 is currently considered a gold standard for NAT2 haplotype assignment. In vitro studies have shown that some slow acetylation genotypes confer reduced protein stability. This has been observed particularly for G191A, T341C and G590A. Substantial ethnic variations of the acetylation status have been described. Probably, upcoming agriculture and the resulting change in diet caused a selection pressure for slow acetylation. In recent years much research has been done to reduce the complexity of NAT2 genotyping. Deriving the haplotype from seven SNPs is still considered a gold standard. However, meanwhile several studies have shown that a two-SNP combination, C282T and T341C, results in a similarly good distinction in Caucasians. However, attempts to further reduce complexity to only one 'tagging SNP' (rs1495741) may lead to wrong predictions where phenotypically slow acetylators were genotyped as intermediate or rapid. Numerous studies have shown that slow NAT2 haplotypes are associated with increased urinary bladder cancer risk and increased risk of anti-tuberculosis drug-induced hepatotoxicity. A drawback of the current practice of solely discriminating slow, intermediate and rapid genotypes for phenotype inference is limited resolution of differences between slow acetylators. Future developments to differentiate between slow and ultra-slow genotypes may further improve individualized drug dosing and epidemiological studies of cancer risk.
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Directory of Open Access Journals (Sweden)
Joeri Ruyssinck
Full Text Available One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made
Wang, Ting; Ren, Zhao; Ding, Ying; Fang, Zhou; Sun, Zhe; MacDonald, Matthew L; Sweet, Robert A; Wang, Jieru; Chen, Wei
2016-02-01
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM".
Inference of biological S-system using the separable estimation method and the genetic algorithm.
Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, W J
2012-01-01
Reconstruction of a biological system from its experimental time series data is a challenging task in systems biology. The S-system which consists of a group of nonlinear ordinary differential equations (ODEs) is an effective model to characterize molecular biological systems and analyze the system dynamics. However, inference of S-systems without the knowledge of system structure is not a trivial task due to its nonlinearity and complexity. In this paper, a pruning separable parameter estimation algorithm (PSPEA) is proposed for inferring S-systems. This novel algorithm combines the separable parameter estimation method (SPEM) and a pruning strategy, which includes adding an l₁ regularization term to the objective function and pruning the solution with a threshold value. Then, this algorithm is combined with the continuous genetic algorithm (CGA) to form a hybrid algorithm that owns the properties of these two combined algorithms. The performance of the pruning strategy in the proposed algorithm is evaluated from two aspects: the parameter estimation error and structure identification accuracy. The results show that the proposed algorithm with the pruning strategy has much lower estimation error and much higher identification accuracy than the existing method.
An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks
Bayer, Christian
2016-01-06
In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and it is intended to provide a starting point for the second phase which is the Monte Carlo EM Algorithm.
Algorithms of causal inference for the analysis of effective connectivity among brain regions
Directory of Open Access Journals (Sweden)
Daniel eChicharro
2014-07-01
Full Text Available In recent years, powerful general algorithms of causal inference have been developed. In particular, in the framework of Pearl’s causality, algorithms of inductive causation (IC and IC* provide a procedure to determine which causal connections among nodes in a network can be inferred from empirical observations even in the presence of latent variables, indicating the limits of what can be learned without active manipulation of the system. These algorithms can in principle become important complements to established techniques such as Granger causality and Dynamic Causal Modeling (DCM to analyze causal influences (effective connectivity among brain regions. However, their application to dynamic processes has not been yet examined. Here we study how to apply these algorithms to time-varying signals such as electrophysiological or neuroimaging signals. We propose a new algorithm which combines the basic principles of the previous algorithms with Granger causality to obtain a representation of the causal relations suited to dynamic processes. Furthermore, we use graphical criteria to predict dynamic statistical dependencies between the signals from the causal structure. We show how some problems for causal inference from neural signals (e.g. measurement noise, hemodynamic responses, and time aggregation can be understood in a general graphical approach. Focusing on the effect of spatial aggregation, we show that when causal inference is performed at a coarser scale than the one at which the neural sources interact, results strongly depend on the degree of integration of the neural sources aggregated in the signals, and thus characterize more the intra-areal properties than the interactions among regions. We finally discuss how the explicit consideration of latent processes contributes to understand Granger causality and DCM as well as to distinguish functional and effective connectivity.
Algorithms of causal inference for the analysis of effective connectivity among brain regions.
Chicharro, Daniel; Panzeri, Stefano
2014-01-01
In recent years, powerful general algorithms of causal inference have been developed. In particular, in the framework of Pearl's causality, algorithms of inductive causation (IC and IC(*)) provide a procedure to determine which causal connections among nodes in a network can be inferred from empirical observations even in the presence of latent variables, indicating the limits of what can be learned without active manipulation of the system. These algorithms can in principle become important complements to established techniques such as Granger causality and Dynamic Causal Modeling (DCM) to analyze causal influences (effective connectivity) among brain regions. However, their application to dynamic processes has not been yet examined. Here we study how to apply these algorithms to time-varying signals such as electrophysiological or neuroimaging signals. We propose a new algorithm which combines the basic principles of the previous algorithms with Granger causality to obtain a representation of the causal relations suited to dynamic processes. Furthermore, we use graphical criteria to predict dynamic statistical dependencies between the signals from the causal structure. We show how some problems for causal inference from neural signals (e.g., measurement noise, hemodynamic responses, and time aggregation) can be understood in a general graphical approach. Focusing on the effect of spatial aggregation, we show that when causal inference is performed at a coarser scale than the one at which the neural sources interact, results strongly depend on the degree of integration of the neural sources aggregated in the signals, and thus characterize more the intra-areal properties than the interactions among regions. We finally discuss how the explicit consideration of latent processes contributes to understand Granger causality and DCM as well as to distinguish functional and effective connectivity.
Assessment of network inference methods: how to cope with an underdetermined problem.
Directory of Open Access Journals (Sweden)
Caroline Siegenthaler
Full Text Available The inference of biological networks is an active research area in the field of systems biology. The number of network inference algorithms has grown tremendously in the last decade, underlining the importance of a fair assessment and comparison among these methods. Current assessments of the performance of an inference method typically involve the application of the algorithm to benchmark datasets and the comparison of the network predictions against the gold standard or reference networks. While the network inference problem is often deemed underdetermined, implying that the inference problem does not have a (unique solution, the consequences of such an attribute have not been rigorously taken into consideration. Here, we propose a new procedure for assessing the performance of gene regulatory network (GRN inference methods. The procedure takes into account the underdetermined nature of the inference problem, in which gene regulatory interactions that are inferable or non-inferable are determined based on causal inference. The assessment relies on a new definition of the confusion matrix, which excludes errors associated with non-inferable gene regulations. For demonstration purposes, the proposed assessment procedure is applied to the DREAM 4 In Silico Network Challenge. The results show a marked change in the ranking of participating methods when taking network inferability into account.
DEFF Research Database (Denmark)
Møller, Jesper
2010-01-01
Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with markov chain Monte Carlo...
Janzing, Dominik; Chaves, Rafael; Schölkopf, Bernhard
2016-09-01
We postulate a principle stating that the initial condition of a physical system is typically algorithmically independent of the dynamical law. We discuss the implications of this principle and argue that they link thermodynamics and causal inference. On the one hand, they entail behavior that is similar to the usual arrow of time. On the other hand, they motivate a statistical asymmetry between cause and effect that has recently been postulated in the field of causal inference, namely, that the probability distribution {P}{{cause}} contains no information about the conditional distribution {P}{{effect}| {{cause}}} and vice versa, while {P}{{effect}} may contain information about {P}{{cause}| {{effect}}}.
Vrettas, Michail D; Opper, Manfred; Cornford, Dan
2015-01-01
This work introduces a Gaussian variational mean-field approximation for inference in dynamical systems which can be modeled by ordinary stochastic differential equations. This new approach allows one to express the variational free energy as a functional of the marginal moments of the approximating Gaussian process. A restriction of the moment equations to piecewise polynomial functions, over time, dramatically reduces the complexity of approximate inference for stochastic differential equation models and makes it comparable to that of discrete time hidden Markov models. The algorithm is demonstrated on state and parameter estimation for nonlinear problems with up to 1000 dimensional state vectors and compares the results empirically with various well-known inference methodologies.
Directory of Open Access Journals (Sweden)
Purushotham Arnie
2011-10-01
Full Text Available Abstract Background Inferring molecular pathway activity is an important step towards reducing the complexity of genomic data, understanding the heterogeneity in clinical outcome, and obtaining molecular correlates of cancer imaging traits. Increasingly, approaches towards pathway activity inference combine molecular profiles (e.g gene or protein expression with independent and highly curated structural interaction data (e.g protein interaction networks or more generally with prior knowledge pathway databases. However, it is unclear how best to use the pathway knowledge information in the context of molecular profiles of any given study. Results We present an algorithm called DART (Denoising Algorithm based on Relevance network Topology which filters out noise before estimating pathway activity. Using simulated and real multidimensional cancer genomic data and by comparing DART to other algorithms which do not assess the relevance of the prior pathway information, we here demonstrate that substantial improvement in pathway activity predictions can be made if prior pathway information is denoised before predictions are made. We also show that genes encoding hubs in expression correlation networks represent more reliable markers of pathway activity. Using the Netpath resource of signalling pathways in the context of breast cancer gene expression data we further demonstrate that DART leads to more robust inferences about pathway activity correlations. Finally, we show that DART identifies a hypothesized association between oestrogen signalling and mammographic density in ER+ breast cancer. Conclusions Evaluating the consistency of prior information of pathway databases in molecular tumour profiles may substantially improve the subsequent inference of pathway activity in clinical tumour specimens. This de-noising strategy should be incorporated in approaches which attempt to infer pathway activity from prior pathway models.
Directory of Open Access Journals (Sweden)
Ting Wang
2016-02-01
Full Text Available Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM, a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer's disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named "FastGGM".
Hu, Zixi; Yao, Zhewei; Li, Jinglai
2017-03-01
Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.
Imetelstat (GRN163L)--telomerase-based cancer therapy.
Röth, Alexander; Harley, Calvin B; Baerlocher, Gabriela M
2010-01-01
Telomeres and telomerase play essential roles in the regulation of the lifespan of human cells. While normal human somatic cells do not or only transiently express telomerase and therefore shorten their telomeres with each cell division, most human cancer cells typically express high levels of telomerase and show unlimited cell proliferation. High telomerase expression allows cells to proliferate and expand long-term and therefore supports tumor growth. Owing to the high expression and its role, telomerase has become an attractive diagnostic and therapeutic cancer target. Imetelstat (GRN163L) is a potent and specific telomerase inhibitor and so far the only drug of its class in clinical trials. Here, we report on the structure and the mechanism of action of imetelstat as well as about the preclinical and clinical data and future prospects using imetelstat in cancer therapy.
Improving gene regulatory network inference using network topology information.
Nair, Ajay; Chetty, Madhu; Wangikar, Pramod P
2015-09-01
Inferring the gene regulatory network (GRN) structure from data is an important problem in computational biology. However, it is a computationally complex problem and approximate methods such as heuristic search techniques, restriction of the maximum-number-of-parents (maxP) for a gene, or an optimal search under special conditions are required. The limitations of a heuristic search are well known but literature on the detailed analysis of the widely used maxP technique is lacking. The optimal search methods require large computational time. We report the theoretical analysis and experimental results of the strengths and limitations of the maxP technique. Further, using an optimal search method, we combine the strengths of the maxP technique and the known GRN topology to propose two novel algorithms. These algorithms are implemented in a Bayesian network framework and tested on biological, realistic, and in silico networks of different sizes and topologies. They overcome the limitations of the maxP technique and show superior computational speed when compared to the current optimal search algorithms.
Drawing inferences from clinical studies with missing values using genetic algorithm.
Priya, R Devi; Kuppuswami, S
2014-01-01
Missing data problem degrades the statistical power of any analysis made in clinical studies. To infer valid results from such studies, suitable method is required to replace the missing values. There is no method which can be universally applicable for handling missing values and the main objective of this paper is to introduce a common method applicable in all cases of missing data. In this paper, Bayesian Genetic Algorithm (BGA) is proposed to effectively impute both missing continuous and discrete values using heuristic search algorithm called genetic algorithm and Bayesian rule. BGA is applied to impute missing values in a real cancer dataset under Missing At Random (MAR) and Missing Completely At Random (MCAR) conditions. For both discrete and continuous attributes, the results show better classification accuracy and RMSE% than many existing methods.
An improved algorithm for generalized community structure inference in complex networks
Qu, Yingfei; Shi, Weiren; Shi, Xin
2017-07-01
In recent years, the research of the community detection is not only on the structure that densely connected internally, but also on the structure of more patterns, such as heterogeneity, overlapping, core-periphery. In this paper, we build the network model based on the random graph models and propose an improved algorithm to infer the generalized community structures. We achieve it by introducing the generalized Bernstein polynomials and computing the latent parameters of vertices. The algorithm is tested both on the computer-generated benchmark networks and the real-world networks. Results show that the algorithm makes better performances on convergence speed and is able to discover the latent continuous structures in networks.
Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures
Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland
1998-01-01
Given the immanent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10(exp 15)) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.
Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani
2017-03-27
Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. Main drawbacks of these models include the fact that these models only consider the measurement noise without taking into consideration the uncertainties in the underlying dynamics. Furthermore, inferring the MIN is characterized by the limited number of observations and nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family and two regression-based algorithms, in terms of the achieved performance on two synthetic data-sets and two real data-sets. The first data-set models the randomness in measurement data, whereas, the second data-set incorporates uncertainties in the underlying dynamics. The real data-sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm
iPoint: an integer programming based algorithm for inferring protein subnetworks.
Atias, Nir; Sharan, Roded
2013-07-01
Large scale screening experiments have become the workhorse of molecular biology, producing data at an ever increasing scale. The interpretation of such data, particularly in the context of a protein interaction network, has the potential to shed light on the molecular pathways underlying the phenotype or the process in question. A host of approaches have been developed in recent years to tackle this reconstruction challenge. These approaches aim to infer a compact subnetwork that connects the genes revealed by the screen while optimizing local (individual path lengths) or global (likelihood) aspects of the subnetwork. Yosef et al. [Mol. Syst. Biol., 2009, 5, 248] were the first to provide a joint optimization of both criteria, albeit approximate in nature. Here we devise an integer linear programming formulation for the joint optimization problem, allowing us to solve it to optimality in minutes on current networks. We apply our algorithm, iPoint, to various data sets in yeast and human and evaluate its performance against state-of-the-art algorithms. We show that iPoint attains very compact and accurate solutions that outperform previous network inference algorithms with respect to their local and global attributes, their consistency across multiple experiments targeting the same pathway, and their agreement with current biological knowledge.
A DNA sequence alignment algorithm using quality information and a fuzzy inference method
Institute of Scientific and Technical Information of China (English)
Kwangbaek Kim; Minhwan Kim; Youngwoon Woo
2008-01-01
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods.In this paper.We propose a DNA sequence alignment that Uses quality information and a fuzzy inference method developed based on the characteristics of DNA fragments and a fuzzy logic system in order to improve conventional DNA sequence alignment methods that uses DNA sequence quality information.In conventional algorithms.DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman-Wunsch,which is established by using quality information of each DNA fragment.However,there may be errors in the process of calculating DNA sequence alignment scores when the quality of DNA fragment tips is low.because only the overall DNA sequence quality information are used.In our proposed method.an exact DNA sequence alignment can be achieved in spite of the low quality of DNA fragment tips by improvement of conventional algorithms using quality information.Mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system utilizing lengths of DNA fragments and frequencies of low quality DNA bases in the fragments.From the experiments by applying real genome data of National Center for Bioteclmology Information,we could see that the proposed method is more efficient than conventional algorithms.
Directory of Open Access Journals (Sweden)
Heringstad Bjørg
2010-07-01
Full Text Available Abstract Background In the genetic analysis of binary traits with one observation per animal, animal threshold models frequently give biased heritability estimates. In some cases, this problem can be circumvented by fitting sire- or sire-dam models. However, these models are not appropriate in cases where individual records exist on parents. Therefore, the aim of our study was to develop a new Gibbs sampling algorithm for a proper estimation of genetic (covariance components within an animal threshold model framework. Methods In the proposed algorithm, individuals are classified as either "informative" or "non-informative" with respect to genetic (covariance components. The "non-informative" individuals are characterized by their Mendelian sampling deviations (deviance from the mid-parent mean being completely confounded with a single residual on the underlying liability scale. For threshold models, residual variance on the underlying scale is not identifiable. Hence, variance of fully confounded Mendelian sampling deviations cannot be identified either, but can be inferred from the between-family variation. In the new algorithm, breeding values are sampled as in a standard animal model using the full relationship matrix, but genetic (covariance components are inferred from the sampled breeding values and relationships between "informative" individuals (usually parents only. The latter is analogous to a sire-dam model (in cases with no individual records on the parents. Results When applied to simulated data sets, the standard animal threshold model failed to produce useful results since samples of genetic variance always drifted towards infinity, while the new algorithm produced proper parameter estimates essentially identical to the results from a sire-dam model (given the fact that no individual records exist for the parents. Furthermore, the new algorithm showed much faster Markov chain mixing properties for genetic parameters (similar to
Zhang, Kui; Busov, Victor; Wei, Hairong
2017-01-01
Background Present knowledge indicates a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. Results A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, the BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively with a portion (e.g. 1/10) of least important TFs being excluded in each round of modeling, during which, the importance values of all TFs to the pathway gene were updated and ranked until only one TF was remained in the list. The above procedure, termed BWERF. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine the TF retention for the regulatory layer immediately above the pathway layer. The acquired TFs at the secondary layer were then set to be the new bottom layer to infer the next upper layer, and this process was repeated until a ML-hGRN with the expected layers was obtained. Conclusions BWERF improved the accuracy for constructing ML-hGRNs because it used backward elimination to exclude the noise genes, and aggregated the individual importance values for determining the TFs retention. We validated the BWERF by using it for constructing ML-hGRNs operating above mouse pluripotency maintenance pathway and Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, the BWERF can construct ML-hGRNs with significantly reduced edges that enable biologists to choose the implicit edges for experimental
Bayer, Christian
2016-02-20
© 2016 Taylor & Francis Group, LLC. ABSTRACT: In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
Vilanova, Pedro
2016-01-07
In this work, we present an extension of the forward-reverse representation introduced in Simulation of forward-reverse stochastic representations for conditional diffusions , a 2014 paper by Bayer and Schoenmakers to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset
2017-01-06
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependant on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended.
Inferring regulatory networks from expression data using tree-based methods.
Directory of Open Access Journals (Sweden)
Vân Anh Huynh-Thu
Full Text Available One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene is predicted from the expression patterns of all the other genes (input genes, using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
Ling, Steve S H; Nguyen, Hung T
2011-03-01
Hypoglycemia or low blood glucose is dangerous and can result in unconsciousness, seizures, and even death. It is a common and serious side effect of insulin therapy in patients with diabetes. Hypoglycemic monitor is a noninvasive monitor that measures some physiological parameters continuously to provide detection of hypoglycemic episodes in type 1 diabetes mellitus patients (T1DM). Based on heart rate (HR), corrected QT interval of the ECG signal, change of HR, and the change of corrected QT interval, we develop a genetic algorithm (GA)-based multiple regression with fuzzy inference system (FIS) to classify the presence of hypoglycemic episodes. GA is used to find the optimal fuzzy rules and membership functions of FIS and the model parameters of regression method. From a clinical study of 16 children with T1DM, natural occurrence of nocturnal hypoglycemic episodes is associated with HRs and corrected QT intervals. The overall data were organized into a training set (eight patients) and a testing set (another eight patients) randomly selected. The results show that the proposed algorithm performs a good sensitivity with an acceptable specificity.
Directory of Open Access Journals (Sweden)
Mohammad Subhi Al-batah
2014-01-01
Full Text Available To date, cancer of uterine cervix is still a leading cause of cancer-related deaths in women worldwide. The current methods (i.e., Pap smear and liquid-based cytology (LBC to screen for cervical cancer are time-consuming and dependent on the skill of the cytopathologist and thus are rather subjective. Therefore, this paper presents an intelligent computer vision system to assist pathologists in overcoming these problems and, consequently, produce more accurate results. The developed system consists of two stages. In the first stage, the automatic features extraction (AFE algorithm is performed. In the second stage, a neuro-fuzzy model called multiple adaptive neuro-fuzzy inference system (MANFIS is proposed for recognition process. The MANFIS contains a set of ANFIS models which are arranged in parallel combination to produce a model with multi-input-multioutput structure. The system is capable of classifying cervical cell image into three groups, namely, normal, low-grade squamous intraepithelial lesion (LSIL and high-grade squamous intraepithelial lesion (HSIL. The experimental results prove the capability of the AFE algorithm to be as effective as the manual extraction by human experts, while the proposed MANFIS produces a good classification performance with 94.2% accuracy.
A bayesian framework that integrates heterogeneous data for inferring gene regulatory networks.
Santra, Tapesh
2014-01-01
Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein-protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.
A Bayesian Framework that integrates heterogeneous data for inferring gene regulatory networks
Directory of Open Access Journals (Sweden)
Tapesh eSantra
2014-05-01
Full Text Available Reconstruction of gene regulatory networks (GRNs from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein protein interactions with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS and physical protein interactions (PPI among transcription factors (TFs in a Bayesian Variable Selection (BVS algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of LASSO regression based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression based method in some circumstances.
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age
Directory of Open Access Journals (Sweden)
Thomas Paul D
2010-06-01
Full Text Available Abstract Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events, which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees
A practical algorithm to infer soil and foliage component temperatures from bi-angular ATSR-2 data
Jia, L.; Li, Z.L.; Menenti, M.; Su, Z.; Verhoef, W.; Wan, Z.
2003-01-01
An operational algorithm is proposed to retrieve soil and foliage component temperatures over heterogeneous land surface based on the analysis of bi-angular multi-spectral observations made by ATSR-2. Firstly, on the basis of the radiative transfer theory in a canopy, a model is developed to infer t
Ancestral state reconstruction by comparative analysis of a GRN kernel operating in echinoderms.
Erkenbrack, Eric M; Ako-Asare, Kayla; Miller, Emily; Tekelenburg, Saira; Thompson, Jeffrey R; Romano, Laura
2016-01-01
Diverse sampling of organisms across the five major classes in the phylum Echinodermata is beginning to reveal much about the structure and function of gene regulatory networks (GRNs) in development and evolution. Sea urchins are the most studied clade within this phylum, and recent work suggests there has been dramatic rewiring at the top of the skeletogenic GRN along the lineage leading to extant members of the euechinoid sea urchins. Such rewiring likely accounts for some of the observed developmental differences between the two major subclasses of sea urchins-cidaroids and euechinoids. To address effects of topmost rewiring on downstream GRN events, we cloned four downstream regulatory genes within the skeletogenic GRN and surveyed their spatiotemporal expression patterns in the cidaroid Eucidaris tribuloides. We performed phylogenetic analyses with homologs from other non-vertebrate deuterostomes and characterized their spatiotemporal expression by quantitative polymerase chain reaction (qPCR) and whole-mount in situ hybridization (WMISH). Our data suggest the erg-hex-tgif subcircuit, a putative GRN kernel, exhibits a mesoderm-specific expression pattern early in Eucidaris development that is directly downstream of the initial mesodermal GRN circuitry. Comparative analysis of the expression of this subcircuit in four echinoderm taxa allowed robust ancestral state reconstruction, supporting hypotheses that its ancestral function was to stabilize the mesodermal regulatory state and that it has been co-opted and deployed as a unit in mesodermal subdomains in distantly diverged echinoderms. Importantly, our study supports the notion that GRN kernels exhibit structural and functional modularity, locking down and stabilizing clade-specific, embryonic regulatory states.
Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
Directory of Open Access Journals (Sweden)
Izquierdo-Carrasco Fernando
2011-12-01
Full Text Available Abstract Background The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood. Results We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728 that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems. Conclusions We address three major issues pertaining to large scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.
Telomerase inhibitor Imetelstat (GRN163L limits the lifespan of human pancreatic cancer cells.
Directory of Open Access Journals (Sweden)
Katrina M Burchett
Full Text Available Telomerase is required for the unlimited lifespan of cancer cells. The vast majority of pancreatic adenocarcinomas overexpress telomerase activity and blocking telomerase could limit their lifespan. GRN163L (Imetelstat is a lipid-conjugated N3'→P5' thio-phosphoramidate oligonucleotide that blocks the template region of telomerase. The aim of this study was to define the effects of long-term GRN163L exposure on the maintenance of telomeres and lifespan of pancreatic cancer cells. Telomere size, telomerase activity, and telomerase inhibition response to GRN163L were measured in a panel of 10 pancreatic cancer cell lines. The cell lines exhibited large differences in levels of telomerase activity (46-fold variation, but most lines had very short telomeres (2-3 kb in size. GRN163L inhibited telomerase in all 10 pancreatic cancer cell lines, with IC50 ranging from 50 nM to 200 nM. Continuous GRN163L exposure of CAPAN1 (IC50 = 75 nM and CD18 cells (IC50 = 204 nM resulted in an initial rapid shortening of the telomeres followed by the maintenance of extremely short but stable telomeres. Continuous exposure to the drug eventually led to crisis and to a complete loss of viability after 47 (CAPAN1 and 69 (CD18 doublings. Crisis In these cells was accompanied by activation of a DNA damage response (γ-H2AX and evidence of both senescence (SA-β-galactosidase activity and apoptosis (sub-G1 DNA content, PARP cleavage. Removal of the drug after long-term GRN163L exposure led to a reactivation of telomerase and re-elongation of telomeres in the third week of cultivation without GRN163L. These findings show that the lifespan of pancreatic cancer cells can be limited by continuous telomerase inhibition. These results should facilitate the design of future clinical trials of GRN163L in patients with pancreatic cancer.
Telomerase inhibitor Imetelstat (GRN163L) limits the lifespan of human pancreatic cancer cells.
Burchett, Katrina M; Yan, Ying; Ouellette, Michel M
2014-01-01
Telomerase is required for the unlimited lifespan of cancer cells. The vast majority of pancreatic adenocarcinomas overexpress telomerase activity and blocking telomerase could limit their lifespan. GRN163L (Imetelstat) is a lipid-conjugated N3'→P5' thio-phosphoramidate oligonucleotide that blocks the template region of telomerase. The aim of this study was to define the effects of long-term GRN163L exposure on the maintenance of telomeres and lifespan of pancreatic cancer cells. Telomere size, telomerase activity, and telomerase inhibition response to GRN163L were measured in a panel of 10 pancreatic cancer cell lines. The cell lines exhibited large differences in levels of telomerase activity (46-fold variation), but most lines had very short telomeres (2-3 kb in size). GRN163L inhibited telomerase in all 10 pancreatic cancer cell lines, with IC50 ranging from 50 nM to 200 nM. Continuous GRN163L exposure of CAPAN1 (IC50 = 75 nM) and CD18 cells (IC50 = 204 nM) resulted in an initial rapid shortening of the telomeres followed by the maintenance of extremely short but stable telomeres. Continuous exposure to the drug eventually led to crisis and to a complete loss of viability after 47 (CAPAN1) and 69 (CD18) doublings. Crisis In these cells was accompanied by activation of a DNA damage response (γ-H2AX) and evidence of both senescence (SA-β-galactosidase activity) and apoptosis (sub-G1 DNA content, PARP cleavage). Removal of the drug after long-term GRN163L exposure led to a reactivation of telomerase and re-elongation of telomeres in the third week of cultivation without GRN163L. These findings show that the lifespan of pancreatic cancer cells can be limited by continuous telomerase inhibition. These results should facilitate the design of future clinical trials of GRN163L in patients with pancreatic cancer.
Directory of Open Access Journals (Sweden)
KAMPOUROPOULOS, K.
2014-02-01
Full Text Available This document presents an energy forecast methodology using Adaptive Neuro-Fuzzy Inference System (ANFIS and Genetic Algorithms (GA. The GA has been used for the selection of the training inputs of the ANFIS in order to minimize the training result error. The presented algorithm has been installed and it is being operating in an automotive manufacturing plant. It periodically communicates with the plant to obtain new information and update the database in order to improve its training results. Finally the obtained results of the algorithm are used in order to provide a short-term load forecasting for the different modeled consumption processes.
de Matos Simoes, Ricardo; Dalleau, Sabine; Williamson, Kate E; Emmert-Streib, Frank
2015-05-14
Urothelial pathogenesis is a complex process driven by an underlying network of interconnected genes. The identification of novel genomic target regions and gene targets that drive urothelial carcinogenesis is crucial in order to improve our current limited understanding of urothelial cancer (UC) on the molecular level. The inference of genome-wide gene regulatory networks (GRN) from large-scale gene expression data provides a promising approach for a detailed investigation of the underlying network structure associated to urothelial carcinogenesis. In our study we inferred and compared three GRNs by the application of the BC3Net inference algorithm to large-scale transitional cell carcinoma gene expression data sets from Illumina RNAseq (179 samples), Illumina Bead arrays (165 samples) and Affymetrix Oligo microarrays (188 samples). We investigated the structural and functional properties of GRNs for the identification of molecular targets associated to urothelial cancer. We found that the urothelial cancer (UC) GRNs show a significant enrichment of subnetworks that are associated with known cancer hallmarks including cell cycle, immune response, signaling, differentiation and translation. Interestingly, the most prominent subnetworks of co-located genes were found on chromosome regions 5q31.3 (RNAseq), 8q24.3 (Oligo) and 1q23.3 (Bead), which all represent known genomic regions frequently deregulated or aberated in urothelial cancer and other cancer types. Furthermore, the identified hub genes of the individual GRNs, e.g., HID1/DMC1 (tumor development), RNF17/TDRD4 (cancer antigen) and CYP4A11 (angiogenesis/ metastasis) are known cancer associated markers. The GRNs were highly dataset specific on the interaction level between individual genes, but showed large similarities on the biological function level represented by subnetworks. Remarkably, the RNAseq UC GRN showed twice the proportion of significant functional subnetworks. Based on our analysis of inferential
Developing Backward Chaining Algorithm of Inference Engine in Ternary Grid Expert System
Directory of Open Access Journals (Sweden)
Yuliadi Erdani
2012-09-01
Full Text Available The inference engine is one of main components of expert system that influences the performance of expert system. The task of inference engine is to give answers and reasons to users by inference the knowledge of expert system. Since the idea of ternary grid issued in 2004, there is only several developed method, technique or engine working on ternary grid knowledge model. The in 2010 developed inference engine is less efficient because it works based on iterative process. The in 2011 developed inference engine works statically and quite expensive to compute. In order to improve the previous inference methods, a new inference engine has been developed. It works based on backward chaining process in ternary grid expert system. This paper describes the development of inference engine of expert system that can work in ternary grid knowledge model. The strategy to inference knowledge uses backward chaining with recursive process. The design result is implemented in the form of software. The result of experiment shows that the inference process works properly, dynamically and more efficient to compute in comparison to the previous developed methods.
Progress in Telomerase Inhibitor GRN163L in the Treatment for Cancer%端粒酶抑制剂GRN163L在肿瘤治疗中的研究进展
Institute of Scientific and Technical Information of China (English)
喻嫦娥; 余忠华
2013-01-01
GRN163L是一种针对hTR (端粒酶RNA)的反义核苷酸药物,它能够抑制端粒酶的活性,引起肿瘤染色体末端端粒长度的不断缩短,诱导肿瘤细胞停止分裂.作为针对hTR基因靶向治疗的前沿药物,GRN163L逐渐成为临床研究的热点.全文拟从端粒及端粒酶、GRN 163L的理化性质、生物学功能及肿瘤治疗的相关研究进展作一综述.%GRN163L (Geron Presentations on Imetelstat) is an antisense nueleotide drugs arm-ming at hTR (telomerase RNA),which can inhibit the activity of telomerase, cause telomere length of tumor chromosome ends constantly shortened and induce tumor cells to stop splitting. As a therapy forefront drug targetting the hTR gene,GRN163L gradually become the focus of clinical research. This paper will expound telomeres and telomerase,phy scal and chemical properties of GRN163L,biological function of GRN163L and research progre is of tumor therapy.
Message-Passing Algorithms for Channel Estimation and Decoding Using Approximate Inference
DEFF Research Database (Denmark)
Badiu, Mihai Alin; Kirkelund, Gunvor Elisabeth; Manchón, Carles Navarro
2012-01-01
We design iterative receiver schemes for a generic communication system by treating channel estimation and information decoding as an inference problem in graphical models. We introduce a recently proposed inference framework that combines belief propagation (BP) and the mean field (MF) approxima...
Wang, Xiaoxiao; Wang, Huan; Huang, Jinfeng; Zhou, Yifeng; Tzvetanov, Tzvetomir
2017-01-01
The contrast sensitivity function that spans the two dimensions of contrast and spatial frequency is crucial in predicting functional vision both in research and clinical applications. In this study, the use of Bayesian inference was proposed to determine the parameters of the two-dimensional contrast sensitivity function. Two-dimensional Bayesian inference was extensively simulated in comparison to classical one-dimensional measures. Its performance on two-dimensional data gathered with different sampling algorithms was also investigated. The results showed that the two-dimensional Bayesian inference method significantly improved the accuracy and precision of the contrast sensitivity function, as compared to the more common one-dimensional estimates. In addition, applying two-dimensional Bayesian estimation to the final data set showed similar levels of reliability and efficiency across widely disparate and established sampling methods (from classical one-dimensional sampling, such as Ψ or staircase, to more novel multi-dimensional sampling methods, such as quick contrast sensitivity function and Fisher information gain). Furthermore, the improvements observed following the application of Bayesian inference were maintained even when the prior poorly matched the subject's contrast sensitivity function. Simulation results were confirmed in a psychophysical experiment. The results indicated that two-dimensional Bayesian inference of contrast sensitivity function data provides similar estimates across a wide range of sampling methods. The present study likely has implications for the measurement of contrast sensitivity function in various settings (including research and clinical settings) and would facilitate the comparison of existing data from previous studies. PMID:28119563
Papatpremsiri, Atiroch; Smout, Michael J; Loukas, Alex; Brindley, Paul J; Sripa, Banchob; Laha, Thewarach
2015-01-01
Multistep processes likely underlie cholangiocarcinogenesis induced by chronic infection with the fish-borne liver fluke, Opisthorchis viverrini. One process appears to be cellular proliferation of the host bile duct epithelia driven by excretory-secretory (ES) products of this pathogen. Specifically, the secreted growth factor Ov-GRN-1, a liver fluke granulin, is a prominent component of ES and a known driver of hyper-proliferation of cultured human and mouse cells in vitro. We show potent hyper-proliferation of human cholangiocytes induced by low nanomolar levels of recombinant Ov-GRN-1 and similar growth produced by low microgram concentrations of ES products and soluble lysates of the adult worm. To further explore the influence of Ov-GRN-1 on the flukes and the host cells, expression of Ov-grn-1 was repressed using RNA interference. Expression of Ov-grn-1 was suppressed by 95% by day 3 and by ~100% by day 7. Co-culture of Ov-grn-1 suppressed flukes with human cholangiocyte (H-69) or human cholangiocarcinoma (KKU-M214) cell lines retarded cell hyper-proliferation by 25% and 92%, respectively. Intriguingly, flukes in which expression of Ov-grn-1 was repressed were less viable in culture, suggesting that Ov-GRN-1 is an essential growth factor for survival of the adult stage of O. viverrini, at least in vitro. To summarize, specific knock down of Ov-grn-1 reduced in vitro survival and capacity of ES products to drive host cell proliferation. These findings may help to contribute to a deeper understanding of liver fluke induced cholangiocarcinogenesis.
Institute of Scientific and Technical Information of China (English)
Yongxi Cheng; Hadi Sabaa; Zhipeng Cai; Randy Goebel; Guohui Lin
2009-01-01
An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases where some pedigree founders are not genotyped, provided that for each nuclear family at least one parent is genotyped and each non-genotyped founder appears in exactly one nuclear family. The importance of this generalization lies in that such cases frequently happen in real data, because some founders may have passed away and their genotype data can no longer be collected. The algorithm runs in O(m3n3) time, where m is the number of single nucleotide polymorphism (SNP) loci under consideration and n is the number of genotyped members in the pedigree. This zero-recombination haplotyping algorithm is extended to a maximum parsimoniously haplotyping algorithm in one whole genome scan to minimize the total number of breakpoint sites, or equivalently, the number of maximal zero-recombination chromosomal regions. We show that such a whole genome scan haplotyping algorithm can be implemented in O(m3n3) time in a novel incremental fashion,here m denotes the total number of SNP loci along the chromosome.
A Formalism and an Algorithm for Computing Pragmatic Inferences and Detecting Infelicities
Marcu, D
1999-01-01
Since Austin introduced the term ``infelicity'', the linguistic literature has been flooded with its use, but no formal or computational explanation has been given for it. This thesis provides one for those infelicities that occur when a pragmatic inference is cancelled. Our contribution assumes the existence of a finer grained taxonomy with respect to pragmatic inferences. It is shown that if one wants to account for the natural language expressiveness, one should distinguish between pragmatic inferences that are felicitous to defeat and pragmatic inferences that are infelicitously defeasible. Thus, it is shown that one should consider at least three types of information: indefeasible, felicitously defeasible, and infelicitously defeasible. The cancellation of the last of these determines the pragmatic infelicities. A new formalism has been devised to accommodate the three levels of information, called ``stratified logic''. Within it, we are able to express formally notions such as ``utterance U presupposes ...
Directory of Open Access Journals (Sweden)
A. A. Zolotin
2015-07-01
Full Text Available Posteriori inference is one of the three kinds of probabilistic-logic inferences in the probabilistic graphical models theory and the base for processing of knowledge patterns with probabilistic uncertainty using Bayesian networks. The paper deals with a task of local posteriori inference description in algebraic Bayesian networks that represent a class of probabilistic graphical models by means of matrix-vector equations. The latter are essentially based on the use of tensor product of matrices, Kronecker degree and Hadamard product. Matrix equations for calculating posteriori probabilities vectors within posteriori inference in knowledge patterns with quanta propositions are obtained. Similar equations of the same type have already been discussed within the confines of the theory of algebraic Bayesian networks, but they were built only for the case of posteriori inference in the knowledge patterns on the ideals of conjuncts. During synthesis and development of matrix-vector equations on quanta propositions probability vectors, a number of earlier results concerning normalizing factors in posteriori inference and assignment of linear projective operator with a selector vector was adapted. We consider all three types of incoming evidences - deterministic, stochastic and inaccurate - combined with scalar and interval estimation of probability truth of propositional formulas in the knowledge patterns. Linear programming problems are formed. Their solution gives the desired interval values of posterior probabilities in the case of inaccurate evidence or interval estimates in a knowledge pattern. That sort of description of a posteriori inference gives the possibility to extend the set of knowledge pattern types that we can use in the local and global posteriori inference, as well as simplify complex software implementation by use of existing third-party libraries, effectively supporting submission and processing of matrices and vectors when
K. Vela Velupillai
2010-01-01
The digital and information technology revolutions are based on algorithmic mathematics in many of their alternative forms. Algorithmic mathematics per se is not necessarily underpinned by the digital or the discrete only; analogue traditions of algorithmic mathematics have a noble pedigree, even in economics. Constructive mathematics of any variety, computability theory and non-standard analysis are intrinsically algorithmic at their foundations. Economic theory, game theory and mathematical...
How difficult is inference of mammalian causal gene regulatory networks?
Directory of Open Access Journals (Sweden)
Djordje Djordjevic
Full Text Available Gene regulatory networks (GRNs play a central role in systems biology, especially in the study of mammalian organ development. One key question remains largely unanswered: Is it possible to infer mammalian causal GRNs using observable gene co-expression patterns alone? We assembled two mouse GRN datasets (embryonic tooth and heart and matching microarray gene expression profiles to systematically investigate the difficulties of mammalian causal GRN inference. The GRNs were assembled based on > 2,000 pieces of experimental genetic perturbation evidence from manually reading > 150 primary research articles. Each piece of perturbation evidence records the qualitative change of the expression of one gene following knock-down or over-expression of another gene. Our data have thorough annotation of tissue types and embryonic stages, as well as the type of regulation (activation, inhibition and no effect, which uniquely allows us to estimate both sensitivity and specificity of the inference of tissue specific causal GRN edges. Using these unprecedented datasets, we found that gene co-expression does not reliably distinguish true positive from false positive interactions, making inference of GRN in mammalian development very difficult. Nonetheless, if we have expression profiling data from genetic or molecular perturbation experiments, such as gene knock-out or signalling stimulation, it is possible to use the set of differentially expressed genes to recover causal regulatory relationships with good sensitivity and specificity. Our result supports the importance of using perturbation experimental data in causal network reconstruction. Furthermore, we showed that causal gene regulatory relationship can be highly cell type or developmental stage specific, suggesting the importance of employing expression profiles from homogeneous cell populations. This study provides essential datasets and empirical evidence to guide the development of new GRN inference
How difficult is inference of mammalian causal gene regulatory networks?
Djordjevic, Djordje; Yang, Andrian; Zadoorian, Armella; Rungrugeecharoen, Kevin; Ho, Joshua W K
2014-01-01
Gene regulatory networks (GRNs) play a central role in systems biology, especially in the study of mammalian organ development. One key question remains largely unanswered: Is it possible to infer mammalian causal GRNs using observable gene co-expression patterns alone? We assembled two mouse GRN datasets (embryonic tooth and heart) and matching microarray gene expression profiles to systematically investigate the difficulties of mammalian causal GRN inference. The GRNs were assembled based on > 2,000 pieces of experimental genetic perturbation evidence from manually reading > 150 primary research articles. Each piece of perturbation evidence records the qualitative change of the expression of one gene following knock-down or over-expression of another gene. Our data have thorough annotation of tissue types and embryonic stages, as well as the type of regulation (activation, inhibition and no effect), which uniquely allows us to estimate both sensitivity and specificity of the inference of tissue specific causal GRN edges. Using these unprecedented datasets, we found that gene co-expression does not reliably distinguish true positive from false positive interactions, making inference of GRN in mammalian development very difficult. Nonetheless, if we have expression profiling data from genetic or molecular perturbation experiments, such as gene knock-out or signalling stimulation, it is possible to use the set of differentially expressed genes to recover causal regulatory relationships with good sensitivity and specificity. Our result supports the importance of using perturbation experimental data in causal network reconstruction. Furthermore, we showed that causal gene regulatory relationship can be highly cell type or developmental stage specific, suggesting the importance of employing expression profiles from homogeneous cell populations. This study provides essential datasets and empirical evidence to guide the development of new GRN inference methods for
Institute of Scientific and Technical Information of China (English)
谭冠政; 曾庆冬; 李文斌
2004-01-01
A designing method of intelligent proportional-integral-derivative(PID) controllers was proposed based on the ant system algorithm and fuzzy inference. This kind of controller is called Fuzzy-ant system PID controller. It consists of an off-line part and an on-line part. In the off-line part, for a given control system with a PID controller,by taking the overshoot, setting time and steady-state error of the system unit step response as the performance indexes and by using the ant system algorithm, a group of optimal PID parameters K*p , Ti* and T*d can be obtained, which are used as the initial values for the on-line tuning of PID parameters. In the on-line part, based on Kp* , Ti*and Td* and according to the current system error e and its time derivative, a specific program is written, which is used to optimize and adjust the PID parameters on-line through a fuzzy inference mechanism to ensure that the system response has optimal transient and steady-state performance. This kind of intelligent PID controller can be used to control the motor of the intelligent bionic artificial leg designed by the authors. The result of computer simulation experiment shows that the controller has less overshoot and shorter setting time.
A Novel Graph-based Algorithm to Infer Recurrent Copy Number Variations in Cancer
Chi, Chen; Ajwad, Rasif; Kuang, Qin; Hu, Pingzhao
2016-01-01
Many cancers have been linked to copy number variations (CNVs) in the genomic DNA. Although there are existing methods to analyze CNVs from individual samples, cancer-causing genes are more frequently discovered in regions where CNVs are common among tumor samples, also known as recurrent CNVs. Integrating multiple samples and locating recurrent CNV regions remain a challenge, both computationally and conceptually. We propose a new graph-based algorithm for identifying recurrent CNVs using the maximal clique detection technique. The algorithm has an optimal solution, which means all maximal cliques can be identified, and guarantees that the identified CNV regions are the most frequent and that the minimal regions have been delineated among tumor samples. The algorithm has successfully been applied to analyze a large cohort of breast cancer samples and identified some breast cancer-associated genes and pathways.
The fuzzy logic algorithm has the ability to describe knowledge in a descriptive human-like manner in the form of simple rules using linguistic variables, and provides a new way of modeling uncertain or naturally fuzzy hydrological processes like non-linear rainfall-runoff relationships. Fuzzy infe...
Nitrate leaching from a potato field using fuzzy inference system combined with genetic algorithm
DEFF Research Database (Denmark)
Shekofteh, Hosein; Afyuni, Majid M; Hajabbasi, Mohammad-Ali
2012-01-01
in MFIS were tuned by Genetic Algorithm. The correlation coefficient, normalized root mean square error and relative mean absolute error percentage between the data obtained by HYDRUS-2D and the estimated values using MFIS model were 0.986, 0.086 and 2.38 respectively. It appears that MFIS can predict...
Nitrate leaching from a potato field using fuzzy inference system combined with genetic algorithm
DEFF Research Database (Denmark)
Shekofteh, Hosein; Afyuni, Majid M; Hajabbasi, Mohammad-Ali
2012-01-01
in MFIS were tuned by Genetic Algorithm. The correlation coefficient, normalized root mean square error and relative mean absolute error percentage between the data obtained by HYDRUS-2D and the estimated values using MFIS model were 0.986, 0.086 and 2.38 respectively. It appears that MFIS can predict...
Energy Technology Data Exchange (ETDEWEB)
Chin, George; Choudhury, Sutanay; Kangas, Lars J.; McFarlane, Sally A.; Marquez, Andres
2011-09-01
Long viewed as a strong statistical inference technique, Bayesian networks have emerged to be an important class of applications for high-performance computing. We have applied an architecture-conscious approach to parallelizing the Lauritzen-Spiegelhalter Junction Tree algorithm for exact inferencing in Bayesian networks. In optimizing the Junction Tree algorithm, we have implemented both in-clique and topological parallelism strategies to best leverage the fine-grained synchronization and massive-scale multithreading of the Cray XMT architecture. Two topological techniques were developed to parallelize the evidence propagation process through the Bayesian network. One technique involves performing intelligent scheduling of junction tree nodes based on its topology and relative size. The second technique involves decomposing the junction tree into a much finer tree-like representation to offer much more opportunities for parallelism. We evaluate these optimizations on five different Bayesian networks and report our findings and observations. Another important contribution of this paper is to demonstrate the application of massive-scale multithreading for load balancing and use of implicit parallelism-based compiler optimizations in designing scalable inferencing algorithms.
Chen, Zhe; Vijayan, Sujith; Barbieri, Riccardo; Wilson, Matthew A; Brown, Emery N
2009-07-01
UP and DOWN states, the periodic fluctuations between increased and decreased spiking activity of a neuronal population, are a fundamental feature of cortical circuits. Understanding UP-DOWN state dynamics is important for understanding how these circuits represent and transmit information in the brain. To date, limited work has been done on characterizing the stochastic properties of UP-DOWN state dynamics. We present a set of Markov and semi-Markov discrete- and continuous-time probability models for estimating UP and DOWN states from multiunit neural spiking activity. We model multiunit neural spiking activity as a stochastic point process, modulated by the hidden (UP and DOWN) states and the ensemble spiking history. We estimate jointly the hidden states and the model parameters by maximum likelihood using an expectation-maximization (EM) algorithm and a Monte Carlo EM algorithm that uses reversible-jump Markov chain Monte Carlo sampling in the E-step. We apply our models and algorithms in the analysis of both simulated multiunit spiking activity and actual multi- unit spiking activity recorded from primary somatosensory cortex in a behaving rat during slow-wave sleep. Our approach provides a statistical characterization of UP-DOWN state dynamics that can serve as a basis for verifying and refining mechanistic descriptions of this process.
Comparison of algorithms to infer genetic population structure from unlinked molecular markers.
Peña-Malavera, Andrea; Bruno, Cecilia; Fernandez, Elmer; Balzarini, Monica
2014-08-01
Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS to be used with genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q). We use the clustering error rate of genotypes into discrete sub-populations as comparison criterion. In scenarios with great level of divergence among genotype groups all methods performed well. With moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than other evaluated methods for sparse unlinked genetic data.
Anderson, Eric C
2012-11-08
Advances in genotyping that allow tens of thousands of individuals to be genotyped at a moderate number of single nucleotide polymorphisms (SNPs) permit parentage inference to be pursued on a very large scale. The intergenerational tagging this capacity allows is revolutionizing the management of cultured organisms (cows, salmon, etc.) and is poised to do the same for scientific studies of natural populations. Currently, however, there are no likelihood-based methods of parentage inference which are implemented in a manner that allows them to quickly handle a very large number of potential parents or parent pairs. Here we introduce an efficient likelihood-based method applicable to the specialized case of cultured organisms in which both parents can be reliably sampled. We develop a Markov chain representation for the cumulative number of Mendelian incompatibilities between an offspring and its putative parents and we exploit it to develop a fast algorithm for simulation-based estimates of statistical confidence in SNP-based assignments of offspring to pairs of parents. The method is implemented in the freely available software SNPPIT. We describe the method in detail, then assess its performance in a large simulation study using known allele frequencies at 96 SNPs from ten hatchery salmon populations. The simulations verify that the method is fast and accurate and that 96 well-chosen SNPs can provide sufficient power to identify the correct pair of parents from amongst millions of candidate pairs.
Particle MCMC algorithms and architectures for accelerating inference in state-space models.
Mingas, Grigorios; Bottolo, Leonardo; Bouganis, Christos-Savvas
2017-04-01
Particle Markov Chain Monte Carlo (pMCMC) is a stochastic algorithm designed to generate samples from a probability distribution, when the density of the distribution does not admit a closed form expression. pMCMC is most commonly used to sample from the Bayesian posterior distribution in State-Space Models (SSMs), a class of probabilistic models used in numerous scientific applications. Nevertheless, this task is prohibitive when dealing with complex SSMs with massive data, due to the high computational cost of pMCMC and its poor performance when the posterior exhibits multi-modality. This paper aims to address both issues by: 1) Proposing a novel pMCMC algorithm (denoted ppMCMC), which uses multiple Markov chains (instead of the one used by pMCMC) to improve sampling efficiency for multi-modal posteriors, 2) Introducing custom, parallel hardware architectures, which are tailored for pMCMC and ppMCMC. The architectures are implemented on Field Programmable Gate Arrays (FPGAs), a type of hardware accelerator with massive parallelization capabilities. The new algorithm and the two FPGA architectures are evaluated using a large-scale case study from genetics. Results indicate that ppMCMC achieves 1.96x higher sampling efficiency than pMCMC when using sequential CPU implementations. The FPGA architecture of pMCMC is 12.1x and 10.1x faster than state-of-the-art, parallel CPU and GPU implementations of pMCMC and up to 53x more energy efficient; the FPGA architecture of ppMCMC increases these speedups to 34.9x and 41.8x respectively and is 173x more power efficient, bringing previously intractable SSM-based data analyses within reach.
Optimization of BP Algorithm in Repeated Inference%多次推理中的BP算法优化
Institute of Scientific and Technical Information of China (English)
吴孝滨; 任志平
2011-01-01
使用BP算法求解效用最大化问题时,容易产生大量冗余计算.为此,对标准BP算法进行优化,在推理过程中,对一些受限定条件影响较小的结点,直接利用前次推理结果,无需重新计算其边缘概率,并证明这种优化不会显著影响推理结果.将该算法应用于组合竞拍模型进行测试.仿真结果表明,相对于标准BP算法,该优化算法能提升求解效用最大化问题时的收敛效率.%It always produces redundant computation when applying Belief Propagation(BP) algorithm to MEU problem. To improve such a situation, an optimization to BP is proposed that reuse last influent result instead of recomputing the marginal probability if the node is not significantly affected by the change of condition, and prove is given to confirm that the optimization does not significantly affect the influence result. A simulation by applying the optimized algorithm to combinatorial auctions gives the result that compare to the standard BP algorism, optimized algorism efficiently improve the efficiency of inference without significantly affect the quality of the solutions.
Directory of Open Access Journals (Sweden)
Jing Li
2017-01-01
Full Text Available The goal of this study is to improve thermal comfort and indoor air quality with the adaptive network-based fuzzy inference system (ANFIS model and improved particle swarm optimization (PSO algorithm. A method to optimize air conditioning parameters and installation distance is proposed. The methodology is demonstrated through a prototype case, which corresponds to a typical laboratory in colleges and universities. A laboratory model is established, and simulated flow field information is obtained with the CFD software. Subsequently, the ANFIS model is employed instead of the CFD model to predict indoor flow parameters, and the CFD database is utilized to train ANN input-output “metamodels” for the subsequent optimization. With the improved PSO algorithm and the stratified sequence method, the objective functions are optimized. The functions comprise PMV, PPD, and mean age of air. The optimal installation distance is determined with the hemisphere model. Results show that most of the staff obtain a satisfactory degree of thermal comfort and that the proposed method can significantly reduce the cost of building an experimental device. The proposed methodology can be used to determine appropriate air supply parameters and air conditioner installation position for a pleasant and healthy indoor environment.
"Antelope": a hybrid-logic model checker for branching-time Boolean GRN analysis
Directory of Open Access Journals (Sweden)
Arellano Gustavo
2011-12-01
Full Text Available Abstract Background In Thomas' formalism for modeling gene regulatory networks (GRNs, branching time, where a state can have more than one possible future, plays a prominent role. By representing a certain degree of unpredictability, branching time can model several important phenomena, such as (a asynchrony, (b incompletely specified behavior, and (c interaction with the environment. Introducing more than one possible future for a state, however, creates a difficulty for ordinary simulators, because infinitely many paths may appear, limiting ordinary simulators to statistical conclusions. Model checkers for branching time, by contrast, are able to prove properties in the presence of infinitely many paths. Results We have developed Antelope ("Analysis of Networks through TEmporal-LOgic sPEcifications", http://turing.iimas.unam.mx:8080/AntelopeWEB/, a model checker for analyzing and constructing Boolean GRNs. Currently, software systems for Boolean GRNs use branching time almost exclusively for asynchrony. Antelope, by contrast, also uses branching time for incompletely specified behavior and environment interaction. We show the usefulness of modeling these two phenomena in the development of a Boolean GRN of the Arabidopsis thaliana root stem cell niche. There are two obstacles to a direct approach when applying model checking to Boolean GRN analysis. First, ordinary model checkers normally only verify whether or not a given set of model states has a given property. In comparison, a model checker for Boolean GRNs is preferable if it reports the set of states having a desired property. Second, for efficiency, the expressiveness of many model checkers is limited, resulting in the inability to express some interesting properties of Boolean GRNs. Antelope tries to overcome these two drawbacks: Apart from reporting the set of all states having a given property, our model checker can express, at the expense of efficiency, some properties that ordinary
Church, Jonathan R.
New condensed matter metrologies are being used to probe ever smaller length scales. In support of the diverse field of materials research synchrotron based spectroscopies provide sub-micron spatial resolutions and a breadth of photon wavelengths for scientific studies. For electronic materials the thinnest layers in a complementary metal-oxide-semiconductor (CMOS) device have been reduced to just a few nanometers. This raises concerns for layer uniformity, complete surface coverage, and interfacial quality. Deposition processes like chemical vapor deposition (CVD) and atomic layer deposition (ALD) have been shown to deposit the needed high-quality films for the requisite thicknesses. However, new materials beget new chemistries and, unfortunately, unwanted side-reactions and by-products. CVD/ALD tools and chemical precursors provided by our collaborators at Air Liquide utilized these new chemistries and films were deposited for which novel spectroscopic characterization methods were used. The second portion of the thesis focuses on fading and decomposing paint pigments in iconic artworks. Efforts have been directed towards understanding the micro-environments causing degradation. Hard X-ray photoelectron spectroscopy (HAXPES) and variable kinetic energy X-ray photoelectron spectroscopy (VKE-XPS) are advanced XPS techniques capable of elucidating both chemical environments and electronic band structures in sub-surface regions of electronic materials. HAXPES has been used to study the electronic band structure in a typical CMOS structure; it will be shown that unexpected band alignments are associated with the presence of electronic charges near a buried interface. Additionally, a computational modeling algorithm, Bayes-Sim, was developed to reconstruct compositional depth profiles (CDP) using VKE-XPS data sets; a subset algorithm also reconstructs CDP from angle-resolved XPS data. Reconstructed CDP produced by Bayes-Sim were most strongly correlated to the real
Energy Technology Data Exchange (ETDEWEB)
McLoughlin, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-01-11
This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F
2015-01-01
This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...
Cloning and Expression of Human Grn Gene%重组人Grn的克隆与表达
Institute of Scientific and Technical Information of China (English)
谌思; 粟艳林; 李悦; 彭小宁
2015-01-01
目的：构建人Grn基因原核表达载体并观察其表达。方法：从人单核细胞系THP-1细胞cDNA中扩增出Grn片段，通过酶切与连接，将Grn片段构建到pGEX-4T-2质粒载体上，再转化入感受态细胞TOP10，菌落PCR鉴定阳性克隆并测序分析，测序正确的重组质粒再转化入感受态细胞DE3中表达蛋白，用Western blot鉴定结果。结果：构建的pGEX-4T-2表达载体作PCR和双酶切鉴定，证实其中有目的片段完整插入，插入片段测序结果与Grn序列设计完全一致，将其转化入DE3中，Western blot结果表明在细菌裂解上清有分子质量为91KD的融合蛋白。结论：重组人Grn的克隆与表达成功，为进一步研究Grn的功能奠定了基础。%ObjectsTo construct a prokaryotic expression vector containing human Grn gene and observe its expression. MethodsThe gene Grn was amplifified by PCR from THP-1 cDNA. The cloned gene was digested by restrictive endonucle-ase and subcloned into eulcaryotic experssion vecter pGEX-4T-2. Grn had been cloned into pGEX-4T-2, and the recom-bined plasmid was transduced into TOP10. The positive cloning was identified by PCR and sequencing. The recombinant plasmid was transformed into competent cell of DE3. The expression of GRN was analyzed by Western blot.ResultsIt was verified by PCR and nucleotide sequencing that the constructed Grn eukaryotic vector was correct. After pGEX-4T-2 was transfected into DE3, a GST fusion protein molecule of91KD was found in supernatant of lysis of bactoria by Western blot. ConclusionA prokaryotic expression plasmid containing human grn gene is successfully constructed, and it can express out objectiveprotein, which has laid a concrete fundation for future study on grn.
Directory of Open Access Journals (Sweden)
Yaron Orenstein
Full Text Available The new technology of protein binding microarrays (PBMs allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature and ability to predict in vivo and in vitro bindings. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays compared to in vitro assays.
Teimouri, Reza; Sohrabpoor, Hamed
2013-12-01
Electrochemical machining process (ECM) is increasing its importance due to some of the specific advantages which can be exploited during machining operation. The process offers several special privileges such as higher machining rate, better accuracy and control, and wider range of materials that can be machined. Contribution of too many predominate parameters in the process, makes its prediction and selection of optimal values really complex, especially while the process is programmized for machining of hard materials. In the present work in order to investigate effects of electrolyte concentration, electrolyte flow rate, applied voltage and feed rate on material removal rate (MRR) and surface roughness (SR) the adaptive neuro-fuzzy inference systems (ANFIS) have been used for creation predictive models based on experimental observations. Then the ANFIS 3D surfaces have been plotted for analyzing effects of process parameters on MRR and SR. Finally, the cuckoo optimization algorithm (COA) was used for selection solutions in which the process reaches maximum material removal rate and minimum surface roughness simultaneously. Results indicated that the ANFIS technique has superiority in modeling of MRR and SR with high prediction accuracy. Also, results obtained while applying of COA have been compared with those derived from confirmatory experiments which validate the applicability and suitability of the proposed techniques in enhancing the performance of ECM process.
Decelle, Aurélien; Ricci-Tersenghi, Federico
2014-02-21
In this Letter we propose a new method to infer the topology of the interaction network in pairwise models with Ising variables. By using the pseudolikelihood method (PLM) at high temperature, it is generally possible to distinguish between zero and nonzero couplings because a clear gap separate the two groups. However at lower temperatures the PLM is much less effective and the result depends on subjective choices, such as the value of the ℓ1 regularizer and that of the threshold to separate nonzero couplings from null ones. We introduce a decimation procedure based on the PLM that recursively sets to zero the less significant couplings, until the variation of the pseudolikelihood signals that relevant couplings are being removed. The new method is fully automated and does not require any subjective choice by the user. Numerical tests have been performed on a wide class of Ising models, having different topologies (from random graphs to finite dimensional lattices) and different couplings (both diluted ferromagnets in a field and spin glasses). These numerical results show that the new algorithm performs better than standard PLM.
Sleeman, J.; Halem, M.; Finin, T.; Cane, M. A.
2016-12-01
topics, we establish which chapter-citation pairs are most similar. We will perform posterior inferences based on Hastings -Metropolis simulated annealing MCMC algorithm to infer, from the evolution of topics starting from AR1 to AR4, assertions of topics for AR5 and potentially AR6.
Emergent adaptive behaviour of GRN-controlled simulated robots in a changing environment
Directory of Open Access Journals (Sweden)
Yao Yao
2016-12-01
Full Text Available We developed a bio-inspired robot controller combining an artificial genome with an agent-based control system. The genome encodes a gene regulatory network (GRN that is switched on by environmental cues and, following the rules of transcriptional regulation, provides output signals to actuators. Whereas the genome represents the full encoding of the transcriptional network, the agent-based system mimics the active regulatory network and signal transduction system also present in naturally occurring biological systems. Using such a design that separates the static from the conditionally active part of the gene regulatory network contributes to a better general adaptive behaviour. Here, we have explored the potential of our platform with respect to the evolution of adaptive behaviour, such as preying when food becomes scarce, in a complex and changing environment and show through simulations of swarm robots in an A-life environment that evolution of collective behaviour likely can be attributed to bio-inspired evolutionary processes acting at different levels, from the gene and the genome to the individual robot and robot population.
Directory of Open Access Journals (Sweden)
Yasser Abduallah
2017-01-01
Full Text Available Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs. Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L
2017-01-01
Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
Directory of Open Access Journals (Sweden)
Bacha Jamil
2009-06-01
Full Text Available Abstract Background Biological processes are regulated by complex interactions between transcription factors and signalling molecules, collectively described as Genetic Regulatory Networks (GRNs. The characterisation of these networks to reveal regulatory mechanisms is a long-term goal of many laboratories. However compiling, visualising and interacting with such networks is non-trivial. Current tools and databases typically focus on GRNs within simple, single celled organisms. However, data is available within the literature describing regulatory interactions in multi-cellular organisms, although not in any systematic form. This is particularly true within the field of developmental biology, where regulatory interactions should also be tagged with information about the time and anatomical location of development in which they occur. Description We have developed myGRN (http://www.myGRN.org, a web application for storing and interrogating interaction data, with an emphasis on developmental processes. Users can submit interaction and gene expression data, either curated from published sources or derived from their own unpublished data. All interactions associated with publications are publicly visible, and unpublished interactions can only be shared between collaborating labs prior to publication. Users can group interactions into discrete networks based on specific biological processes. Various filters allow dynamic production of network diagrams based on a range of information including tissue location, developmental stage or basic topology. Individual networks can be viewed using myGRV, a tool focused on displaying developmental networks, or exported in a range of formats compatible with third party tools. Networks can also be analysed for the presence of common network motifs. We demonstrate the capabilities of myGRN using a network of zebrafish interactions integrated with expression data from the zebrafish database, ZFIN. Conclusion Here we
Bagging statistical network inference from large-scale gene expression data.
Ricardo de Matos Simoes; Frank Emmert-Streib
2012-01-01
Modern biology and medicine aim at hunting molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRN) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential enabling and enhancing basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-sc...
Marcon, Gabriella; Rossi, Giacomina; Giaccone, Giorgio; Giovagnoli, Anna Rita; Piccoli, Elena; Zanini, Sergio; Geatti, Onelio; Toso, Vito; Grisoli, Marina; Tagliavini, Fabrizio
2011-01-01
Mutations in the progranulin gene (GRN) were recently identified as an important cause of familial frontotemporal dementia (FTD). More than 60 pathogenic mutations have been reported up to now and prominent phenotypic variability within and among affected kindreds has been described. We have studied an Italian family with clinical evidence of dementia, and here we report detailed clinical records, imaging, sequential neurological examinations, cognitive assessments, and genetic analysis of three affected members of the same generation. Genetic analysis revealed the presence of the null mutation IVS6 + 5_8delGTGA in GRN, leading to haploinsufficiency, as documented by mRNA analysis. The mutation is associated with wide variation of the clinical phenotype, ranging from FTD to Alzheimer's disease and to a rapidly-progressive dementia. In summary, the patients of this kindred showed highly variable clinical features that do not have a close correspondence with the pattern of the cerebral atrophy. Our data extend the phenotypic spectrum and the complexity of neurodegenerative diseases linked to GRN mutations.
Vargas Cardona, Hernán Darío; Orozco, Álvaro Ángel; Álvarez, Mauricio A
2013-01-01
Automatic identification of biosignals is one of the more studied fields in biomedical engineering. In this paper, we present an approach for the unsupervised recognition of biomedical signals: Microelectrode Recordings (MER) and Electrocardiography signals (ECG). The unsupervised learning is based in classic and bayesian estimation theory. We employ gaussian mixtures models with two estimation methods. The first is derived from the frequentist estimation theory, known as Expectation-Maximization (EM) algorithm. The second is obtained from bayesian probabilistic estimation and it is called variational inference. In this framework, both methods are used for parameters estimation of Gaussian mixtures. The mixtures models are used for unsupervised pattern classification, through the responsibility matrix. The algorithms are applied in two real databases acquired in Parkinson's disease surgeries and electrocardiograms. The results show an accuracy over 85% in MER and 90% in ECG for identification of two classes. These results are statistically equal or even better than parametric (Naive Bayes) and nonparametric classifiers (K-nearest neighbor).
Bal, Mert; Amasyali, M Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse
2014-01-01
The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.
Directory of Open Access Journals (Sweden)
Mert Bal
2014-01-01
Full Text Available The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets.
Bal, Mert; Amasyali, M. Fatih; Sever, Hayri; Kose, Guven; Demirhan, Ayse
2014-01-01
The importance of the decision support systems is increasingly supporting the decision making process in cases of uncertainty and the lack of information and they are widely used in various fields like engineering, finance, medicine, and so forth, Medical decision support systems help the healthcare personnel to select optimal method during the treatment of the patients. Decision support systems are intelligent software systems that support decision makers on their decisions. The design of decision support systems consists of four main subjects called inference mechanism, knowledge-base, explanation module, and active memory. Inference mechanism constitutes the basis of decision support systems. There are various methods that can be used in these mechanisms approaches. Some of these methods are decision trees, artificial neural networks, statistical methods, rule-based methods, and so forth. In decision support systems, those methods can be used separately or a hybrid system, and also combination of those methods. In this study, synthetic data with 10, 100, 1000, and 2000 records have been produced to reflect the probabilities on the ALARM network. The accuracy of 11 machine learning methods for the inference mechanism of medical decision support system is compared on various data sets. PMID:25295291
Directory of Open Access Journals (Sweden)
Brian Godsey
Full Text Available MicroRNAs (miRs are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as for time-series or studies of development processes, stages, or types, (e.g. cell type, disease, growth, aging, there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions and therefore statistical significance for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.
Yang, Yahong; Zhao, Fuqing; Hong, Yi; Yu, Dongmei
2005-12-01
Integration of process planning with scheduling by considering the manufacturing system's capacity, cost and capacity in its workshop is a critical issue. The concurrency between them can also eliminate the redundant process and optimize the entire production cycle, but most integrated process planning and scheduling methods only consider the time aspects of the alternative machines when constructing schedules. In this paper, a fuzzy inference system (FIS) in choosing alternative machines for integrated process planning and scheduling of a job shop manufacturing system is presented. Instead of choosing alternative machines randomly, machines are being selected based on the machines reliability. The mean time to failure (MTF) values is input in a fuzzy inference mechanism, which outputs the machine reliability. The machine is then being penalized based on the fuzzy output. The most reliable machine will have the higher priority to be chosen. In order to overcome the problem of un-utilization machines, sometimes faced by unreliable machine, the particle swarm optimization (PSO) have been used to balance the load for all the machines. Simulation study shows that the system can be used as an alternative way of choosing machines in integrated process planning and scheduling.
O. Lao Grueso (Oscar); F. Liu (Fan); A. Wollstein (Andreas); M.H. Kayser (Manfred)
2014-01-01
textabstractAttempts to detect genetic population substructure in humans are troubled by the fact that the vast majority of the total amount of observed genetic variation is present within populations rather than between populations. Here we introduce a new algorithm for transforming a genetic dista
Directory of Open Access Journals (Sweden)
Alvarez-Buylla Elena R
2010-10-01
Full Text Available Abstract Background Recent experimental work has uncovered some of the genetic components required to maintain the Arabidopsis thaliana root stem cell niche (SCN and its structure. Two main pathways are involved. One pathway depends on the genes SHORTROOT and SCARECROW and the other depends on the PLETHORA genes, which have been proposed to constitute the auxin readouts. Recent evidence suggests that a regulatory circuit, composed of WOX5 and CLE40, also contributes to the SCN maintenance. Yet, we still do not understand how the niche is dynamically maintained and patterned or if the uncovered molecular components are sufficient to recover the observed gene expression configurations that characterize the cell types within the root SCN. Mathematical and computational tools have proven useful in understanding the dynamics of cell differentiation. Hence, to further explore root SCN patterning, we integrated available experimental data into dynamic Gene Regulatory Network (GRN models and addressed if these are sufficient to attain observed gene expression configurations in the root SCN in a robust and autonomous manner. Results We found that an SCN GRN model based only on experimental data did not reproduce the configurations observed within the root SCN. We developed several alternative GRN models that recover these expected stable gene configurations. Such models incorporate a few additional components and interactions in addition to those that have been uncovered. The recovered configurations are stable to perturbations, and the models are able to recover the observed gene expression profiles of almost all the mutants described so far. However, the robustness of the postulated GRNs is not as high as that of other previously studied networks. Conclusions These models are the first published approximations for a dynamic mechanism of the A. thaliana root SCN cellular pattering. Our model is useful to formally show that the data now available are not
Smout, Michael J; Mulvenna, Jason P; Jones, Malcolm K; Loukas, Alex
2011-10-01
Granulins (GRNs) are potent growth factors that are upregulated in many aggressive cancers from a wide range of organs. GRNs form tight, disulphide bonded, beta hairpin stacks, making them difficult to express in recombinant form. We recently described Ov-GRN-1, a GRN family member secreted by the carcinogenic liver fluke of humans, Opisthorchis viverrini, and showed that recombinant Ov-GRN-1 expressed and refolded from Escherichia coli caused proliferation of mammalian cell lines at nanomolar concentrations. We now report on an optimized method to express and purify monomeric Ov-GRN-1 in E. coli using a straightforward and scalable purification and refolding process. Purified monomeric protein caused proliferation at nanomolar concentrations of cancerous and non-cancerous cell lines derived from human bile duct tissue. The expression and purification method we describe herein will serve as a backbone upon which to develop expression and purification processes for recombinant GRNs from other organisms, accelerating research on this intriguing family of proteins.
Directory of Open Access Journals (Sweden)
Maria Serpente
2015-01-01
Full Text Available We analysed the expression levels of 84 key genes involved in the regulated degradation of cellular protein by the ubiquitin-proteasome system in peripheral cells from patients with frontotemporal dementia (FTD due to C9ORF72 and GRN mutations, as compared with sporadic FTD and age-matched controls. A SABiosciences PCR array was used to investigate the transcription profile in a discovery population consisting of six patients each in C9ORF72, GRN, sporadic FTD and age-matched control groups. A generalized down-regulation of gene expression compared with controls was observed in C9ORF72 expansion carriers and sporadic FTD patients. In particular, in both groups, four genes, UBE2I, UBE2Q1, UBE2E1 and UBE2N, were down-regulated at a statistically significant (p < 0.05 level. All of them encode for members of the E2 ubiquitin-conjugating enzyme family. In GRN mutation carriers, no statistically significant deregulation of ubiquitination pathway genes was observed, except for the UBE2Z gene, which displays E2 ubiquitin conjugating enzyme activity, and was found to be statistically significant up-regulated (p = 0.006. These preliminary results suggest that the proteasomal degradation pathway plays a role in the pathogenesis of FTD associated with TDP-43 pathology, although different proteins are altered in carriers of GRN mutations as compared with carriers of the C9ORF72 expansion.
Lattante, Serena; Le Ber, Isabelle; Galimberti, Daniela; Serpente, Maria; Rivaud-Péchoux, Sophie; Camuzat, Agnès; Clot, Fabienne; Fenoglio, Chiara; Scarpini, Elio; Brice, Alexis; Kabashi, Edor
2014-11-01
TMEM106B was identified as a risk factor for frontotemporal lobar degeneration (FTD) with TAR DNA-binding protein 43 kDa inclusions. It has been reported that variants in this gene are genetic modifiers of the disease and that this association is stronger in patients carrying a GRN mutation or a pathogenic expansion in chromosome 9 open reading frame 72 (C9orf72) gene. Here, we investigated the contribution of TMEM106B polymorphisms in cohorts of FTD and FTD with amyotrophic lateral sclerosis patients from France and Italy. Patients carrying the C9orf72 expansion (n = 145) and patients with GRN mutations (n = 76) were compared with a group of FTD patients (n = 384) negative for mutations and to a group of healthy controls (n = 552). In our cohorts, the presence of the C9orf72 expansion did not correlate with TMEM106B genotypes but the association was very strong in individuals with pathogenic GRN mutations (p = 9.54 × 10(-6)). Our data suggest that TMEM106B genotypes differ in FTD patient cohorts and strengthen the protective role of TMEM106B in GRN carriers. Further studies are needed to determine whether TMEM106B polymorphisms are associated with other genetic causes for FTD, including C9orf72 repeat expansions.
J. Simón-Sánchez (Javier); H. Seelaar (Harro); Z. Bochdanovits (Zoltan); D.J.H. Deeg (Dorly); J.C. van Swieten (John); P. Heutink (Peter)
2009-01-01
textabstractBackground: A single nucleotide polymorphism (rs5848) located in the 3′- untranslated region of GRN has recently been associated with a risk of frontotemporal lobar degeneration (FTLD) in North American population particularly in pathologically confirmed cases with neural inclusions
Li, Enhu; Materna, Stefan C; Davidson, Eric H
2013-10-01
The sea urchin oral ectoderm gene regulatory network (GRN) model has increased in complexity as additional genes are added to it, revealing its multiple spatial regulatory state domains. The formation of the oral ectoderm begins with an oral-aboral redox gradient, which is interpreted by the cis-regulatory system of the nodal gene to cause its expression on the oral side of the embryo. Nodal signaling drives cohorts of regulatory genes within the oral ectoderm and its derived subdomains. Activation of these genes occurs sequentially, spanning the entire blastula stage. During this process the stomodeal subdomain emerges inside of the oral ectoderm, and bilateral subdomains defining the lateral portions of the future ciliary band emerge adjacent to the central oral ectoderm. Here we examine two regulatory genes encoding repressors, sip1 and ets4, which selectively prevent transcription of oral ectoderm genes until their expression is cleared from the oral ectoderm as an indirect consequence of Nodal signaling. We show that the timing of transcriptional de-repression of sip1 and ets4 targets which occurs upon their clearance explains the dynamics of oral ectoderm gene expression. In addition two other repressors, the direct Nodal target not, and the feed forward Nodal target goosecoid, repress expression of regulatory genes in the central animal oral ectoderm thereby confining their expression to the lateral domains of the animal ectoderm. These results have permitted construction of an enhanced animal ectoderm GRN model highlighting the repressive interactions providing precise temporal and spatial control of regulatory gene expression. © 2013 Elsevier Inc. All rights reserved.
Modeling gene regulatory networks: A network simplification algorithm
Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.
2016-12-01
Boolean networks have been used for some time to model Gene Regulatory Networks (GRNs), which describe cell functions. Those models can help biologists to make predictions, prognosis and even specialized treatment when some disturb on the GRN lead to a sick condition. However, the amount of information related to a GRN can be huge, making the task of inferring its boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the further inferring process to focus only on the main protein hubs, the ones with most potential to interfere in overall network behavior.
Yolmeh, Mahmoud; Habibi Najafi, Mohammad B; Salehi, Fakhreddin
2014-01-01
Annatto is commonly used as a coloring agent in the food industry and has antimicrobial and antioxidant properties. In this study, genetic algorithm-artificial neural network (GA-ANN) and adaptive neuro-fuzzy inference system (ANFIS) models were used to predict the effect of annatto dye on Salmonella enteritidis in mayonnaise. The GA-ANN and ANFIS were fed with 3 inputs of annatto dye concentration (0, 0.1, 0.2 and 0.4%), storage temperature (4 and 25°C) and storage time (1-20 days) for prediction of S. enteritidis population. Both models were trained with experimental data. The results showed that the annatto dye was able to reduce of S. enteritidis and its effect was stronger at 25°C than 4°C. The developed GA-ANN, which included 8 hidden neurons, could predict S. enteritidis population with correlation coefficient of 0.999. The overall agreement between ANFIS predictions and experimental data was also very good (r=0.998). Sensitivity analysis results showed that storage temperature was the most sensitive factor for prediction of S. enteritidis population.
Making Type Inference Practical
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff; Oxhøj, Nicholas; Palsberg, Jens
1992-01-01
We present the implementation of a type inference algorithm for untyped object-oriented programs with inheritance, assignments, and late binding. The algorithm significantly improves our previous one, presented at OOPSLA'91, since it can handle collection classes, such as List, in a useful way. Abo....... Experiments indicate that the implementation type checks as much as 100 lines pr. second. This results in a mature product, on which a number of tools can be based, for example a safety tool, an image compression tool, a code optimization tool, and an annotation tool. This may make type inference for object...
Guven, Gamze; Lohmann, Ebba; Bras, Jose; Gibbs, J. Raphael; Gurvit, Hakan; Bilgic, Basar; Hanagasi, Hasmet; Rizzu, Patrizia; Heutink, Peter; Emre, Murat; Erginel-Unaltuna, Nihan; Just, Walter; Hardy, John; Singleton, Andrew; Guerreiro, Rita
2016-01-01
‘Microtubule-associated protein tau’ (MAPT), ‘granulin’ (GRN) and ‘chromosome 9 open reading frame72’ (C9ORF72) gene mutations are the major known genetic causes of frontotemporal dementia (FTD). Recent studies suggest that mutations in these genes may also be associated with other forms of dementia. Therefore we investigated whether MAPT, GRN and C9ORF72 gene mutations are major contributors to dementia in a random, unselected Turkish cohort of dementia patients. A combination of whole-exome sequencing, Sanger sequencing and fragment analysis/Southern blot was performed in order to identify pathogenic mutations and novel variants in these genes as well as other FTD-related genes such as the ‘charged multivesicular body protein 2B’ (CHMP2B), the ‘FUS RNA binding protein’ (FUS), the ‘TAR DNA binding protein’ (TARDBP), the ‘sequestosome1’ (SQSTM1), and the ‘valosin containing protein’ (VCP). We determined one pathogenic MAPT mutation (c.1906C>T, p.P636L) and one novel missense variant (c.38A>G, p.D13G). In GRN we identified a probably pathogenic TGAG deletion in the splice donor site of exon 6. Three patients were found to carry the GGGGCC expansions in the non-coding region of the C9ORF72 gene. In summary, a complete screening for mutations in MAPT, GRN and C9ORF72 genes revealed a frequency of 5.4% of pathogenic mutations in a random cohort of 93 Turkish index patients with dementia. PMID:27632209
Du, Xianming; Rao, Malireddi R K Subba; Chen, Xue Qin; Wu, Wei; Mahalingam, Sundarasamy; Balasundaram, David
2006-01-01
Grn1p from fission yeast and GNL3L from human cells, two putative GTPases from the novel HSR1_MMR1 GTP-binding protein subfamily with circularly permuted G-motifs play a critical role in maintaining normal cell growth. Deletion of Grn1 resulted in a severe growth defect, a marked reduction in mature rRNA species with a concomitant accumulation of the 35S pre-rRNA transcript, and failure to export the ribosomal protein Rpl25a from the nucleolus. Deleting any of the Grn1p G-domain motifs resulted in a null phenotype and nuclear/nucleolar localization consistent with the lack of nucleolar export of preribosomes accompanied by a distortion of nucleolar structure. Heterologous expression of GNL3L in a Deltagrn1 mutant restored processing of 35S pre-rRNA, nuclear export of Rpl25a and cell growth to wild-type levels. Genetic complementation in yeast and siRNA knockdown in HeLa cells confirmed the homologous proteins Grn1p and GNL3L are required for growth. Failure of two similar HSR1_MMR1 putative nucleolar GTPases, Nucleostemin (NS), or the dose-dependent response of breast tumor autoantigen NGP-1, to rescue deltagrn1 implied the highly specific roles of Grn1p or GNL3L in nucleolar events. Our analysis uncovers an important role for Grn1p/GNL3L within this unique group of nucleolar GTPases.
King, Gary; Rosen, Ori; Tanner, Martin A.
2004-09-01
This collection of essays brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half-decade has witnessed an explosion of research in ecological inference--the process of trying to infer individual behavior from aggregate data. Although uncertainties and information lost in aggregation make ecological inference one of the most problematic types of research to rely on, these inferences are required in many academic fields, as well as by legislatures and the Courts in redistricting, by business in marketing research, and by governments in policy analysis.
Bessonov, Kyrylo; Van Steen, Kristel
2016-12-01
Gene regulatory network (GRN) inference is an active area of research that facilitates understanding the complex interplays between biological molecules. We propose a novel framework to create such GRNs, based on Conditional Inference Forests (CIFs) as proposed by Strobl et al. Our framework consists of using ensembles of Conditional Inference Trees (CITs) and selecting an appropriate aggregation scheme for variant selection prior to network construction. We show on synthetic microarray data that taking the original implementation of CIFs with conditional permutation scheme (CIFcond ) may lead to improved performance compared to Breiman's implementation of Random Forests (RF). Among all newly introduced CIF-based methods and five network scenarios obtained from the DREAM4 challenge, CIFcond performed best. Networks derived from well-tuned CIFs, obtained by simply averaging P-values over tree ensembles (CIFmean ) are particularly attractive, because they combine adequate performance with computational efficiency. Moreover, thresholds for variable selection are based on significance levels for P-values and, hence, do not need to be tuned. From a practical point of view, our extensive simulations show the potential advantages of CIFmean -based methods. Although more work is needed to improve on speed, especially when fully exploiting the advantages of CITs in the context of heterogeneous and correlated data, we have shown that CIF methodology can be flexibly inserted in a framework to infer biological interactions. Notably, we confirmed biologically relevant interaction between IL2RA and FOXP1, linked to the IL-2 signaling pathway and to type 1 diabetes.
An Inference Language for Imaging
DEFF Research Database (Denmark)
Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen
2014-01-01
We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framework...
Statistical Inference in Graphical Models
2008-06-17
Probabilistic Network Library ( PNL ). While not fully mature, PNL does provide the most commonly-used algorithms for inference and learning with the efficiency...of C++, and also offers interfaces for calling the library from MATLAB and R 1361. Notably, both BNT and PNL provide learning and inference algorithms...mature and has been used for research purposes for several years, it is written in MATLAB and thus is not suitable to be used in real-time settings. PNL
Energy Technology Data Exchange (ETDEWEB)
Hanson, K.M.; Cunningham, G.S.
1996-04-01
The authors are developing a computer application, called the Bayes Inference Engine, to provide the means to make inferences about models of physical reality within a Bayesian framework. The construction of complex nonlinear models is achieved by a fully object-oriented design. The models are represented by a data-flow diagram that may be manipulated by the analyst through a graphical programming environment. Maximum a posteriori solutions are achieved using a general, gradient-based optimization algorithm. The application incorporates a new technique of estimating and visualizing the uncertainties in specific aspects of the model.
Directory of Open Access Journals (Sweden)
Richard Shoemaker
2014-04-01
Full Text Available Establishing causality has been a problem throughout history of philosophy of science. This paper discusses the philosophy of causal inference along the different school of thoughts and methods: Rationalism, Empiricism, Inductive method, Hypothetical deductive method with pros and cons. The article it starting from the Problem of Hume, also close to the positions of Russell, Carnap, Popper and Kuhn to better understand the modern interpretation and implications of causal inference in epidemiological research.
Sassi, Celeste; Guerreiro, Rita; Gibbs, Raphael; Ding, Jinhui; Lupton, Michelle K.; Troakes, Claire; Al-Sarraj, Safa; Niblock, Michael; Gallo, Jean-Marc; Adnan, Jihad; Killick, Richard; Brown, Kristelle S.; Medway, Christopher; Lord, Jenny; Turton, James; Bras, Jose; Morgan, Kevin; Powell, John F.; Singleton, Andrew; Hardy, John
2014-01-01
The overlapping clinical and neuropathologic features between late-onset apparently sporadic Alzheimer's disease (LOAD), familial Alzheimer's disease (FAD), and other neurodegenerative dementias (frontotemporal dementia, corticobasal degeneration, progressive supranuclear palsy, and Creutzfeldt-Jakob disease) raise the question of whether shared genetic risk factors may explain the similar phenotype among these disparate disorders. To investigate this intriguing hypothesis, we analyzed rare coding variability in 6 Mendelian dementia genes (APP, PSEN1, PSEN2, GRN, MAPT, and PRNP), in 141 LOAD patients and 179 elderly controls, neuropathologically proven, from the UK. In our cohort, 14 LOAD cases (10%) and 11 controls (6%) carry at least 1 rare variant in the genes studied. We report a novel variant in PSEN1 (p.I168T) and a rare variant in PSEN2 (p.A237V), absent in controls and both likely pathogenic. Our findings support previous studies, suggesting that (1) rare coding variability in PSEN1 and PSEN2 may influence the susceptibility for LOAD and (2) GRN, MAPT, and PRNP are not major contributors to LOAD. Thus, genetic screening is pivotal for the clinical differential diagnosis of these neurodegenerative dementias. PMID:25104557
An Inference Language for Imaging
DEFF Research Database (Denmark)
Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen
2014-01-01
We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framewor......-accelerated primitives specializes iLang to the spatial data-structures that arise in imaging applications. We illustrate the framework through a challenging application: spatio-temporal tomographic reconstruction with compressive sensing....
DEFF Research Database (Denmark)
Ødegård, Jørgen; Meuwissen, Theo HE; Heringstad, Bjørg
2010-01-01
Background In the genetic analysis of binary traits with one observation per animal, animal threshold models frequently give biased heritability estimates. In some cases, this problem can be circumvented by fitting sire- or sire-dam models. However, these models are not appropriate in cases where...... individual records exist on parents. Therefore, the aim of our study was to develop a new Gibbs sampling algorithm for a proper estimation of genetic (co)variance components within an animal threshold model framework. Methods In the proposed algorithm, individuals are classified as either "informative......" or "non-informative" with respect to genetic (co)variance components. The "non-informative" individuals are characterized by their Mendelian sampling deviations (deviance from the mid-parent mean) being completely confounded with a single residual on the underlying liability scale. For threshold models...
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Lifted Inference for Relational Continuous Models
Choi, Jaesik; Hill, David J
2012-01-01
Relational Continuous Models (RCMs) represent joint probability densities over attributes of objects, when the attributes have continuous domains. With relational representations, they can model joint probability distributions over large numbers of variables compactly in a natural way. This paper presents a new exact lifted inference algorithm for RCMs, thus it scales up to large models of real world applications. The algorithm applies to Relational Pairwise Models which are (relational) products of potentials of arity 2. Our algorithm is unique in two ways. First, it substantially improves the efficiency of lifted inference with variables of continuous domains. When a relational model has Gaussian potentials, it takes only linear-time compared to cubic time of previous methods. Second, it is the first exact inference algorithm which handles RCMs in a lifted way. The algorithm is illustrated over an example from econometrics. Experimental results show that our algorithm outperforms both a groundlevel inferenc...
An Inference Language for Imaging
DEFF Research Database (Denmark)
Pedemonte, Stefano; Catana, Ciprian; Van Leemput, Koen
2014-01-01
We introduce iLang, a language and software framework for probabilistic inference. The iLang framework enables the definition of directed and undirected probabilistic graphical models and the automated synthesis of high performance inference algorithms for imaging applications. The iLang framework...... is composed of a set of language primitives and of an inference engine based on a message-passing system that integrates cutting-edge computational tools, including proximal algorithms and high performance Hamiltonian Markov Chain Monte Carlo techniques. A set of domain-specific highly optimized GPU......-accelerated primitives specializes iLang to the spatial data-structures that arise in imaging applications. We illustrate the framework through a challenging application: spatio-temporal tomographic reconstruction with compressive sensing....
Definitive Consensus for Distributed Data Inference
2011-01-01
Inference from data is of key importance in many applications of informatics. The current trend in performing such a task of inference from data is to utilise machine learning algorithms. Moreover, in many applications that it is either required or is preferable to infer from the data in a distributed manner. Many practical difficulties arise from the fact that in many distributed applications we avert from transferring data or parts of it due to cost...
Ghaedi, M; Ghaedi, A M; Abdi, F; Roosta, M; Vafaei, A; Asghari, A
2013-10-01
In the present study, activated carbon (AC) simply derived from Pistacia khinjuk and characterized using different techniques such as SEM and BET analysis. This new adsorbent was used for methylene blue (MB) adsorption. Fitting the experimental equilibrium data to various isotherm models shows the suitability and applicability of the Langmuir model. The adsorption mechanism and rate of processes was investigated by analyzing time dependency data to conventional kinetic models and it was found that adsorption follow the pseudo-second-order kinetic model. Principle component analysis (PCA) has been used for preprocessing of input data and genetic algorithm optimization have been used for prediction of adsorption of methylene blue using activated carbon derived from P. khinjuk. In our laboratory various activated carbon as sole adsorbent or loaded with various nanoparticles was used for removal of many pollutants (Ghaedi et al., 2012). These results indicate that the small amount of proposed adsorbent (1.0g) is applicable for successful removal of MB (RE>98%) in short time (45min) with high adsorption capacity (48-185mgg(-1)).
Directory of Open Access Journals (Sweden)
Panos G Kalatzis
Full Text Available Bacterial infections are a serious problem in aquaculture since they can result in massive mortalities in farmed fish and invertebrates. Vibriosis is one of the most common diseases in marine aquaculture hatcheries and its causative agents are bacteria of the genus Vibrio mostly entering larval rearing water through live feeds, such as Artemia and rotifers. The pathogenic Vibrio alginolyticus strain V1, isolated during a vibriosis outbreak in cultured seabream, Sparus aurata, was used as host to isolate and characterize the two novel bacteriophages φSt2 and φGrn1 for phage therapy application. In vitro cell lysis experiments were performed against the bacterial host V. alginolyticus strain V1 but also against 12 presumptive Vibrio strains originating from live prey Artemia salina cultures indicating the strong lytic efficacy of the 2 phages. In vivo administration of the phage cocktail, φSt2 and φGrn1, at MOI = 100 directly on live prey A. salina cultures, led to a 93% decrease of presumptive Vibrio population after 4 h of treatment. Current study suggests that administration of φSt2 and φGrn1 to live preys could selectively reduce Vibrio load in fish hatcheries. Innovative and environmental friendly solutions against bacterial diseases are more than necessary and phage therapy is one of them.
Kalatzis, Panos G; Bastías, Roberto; Kokkari, Constantina; Katharios, Pantelis
2016-01-01
Bacterial infections are a serious problem in aquaculture since they can result in massive mortalities in farmed fish and invertebrates. Vibriosis is one of the most common diseases in marine aquaculture hatcheries and its causative agents are bacteria of the genus Vibrio mostly entering larval rearing water through live feeds, such as Artemia and rotifers. The pathogenic Vibrio alginolyticus strain V1, isolated during a vibriosis outbreak in cultured seabream, Sparus aurata, was used as host to isolate and characterize the two novel bacteriophages φSt2 and φGrn1 for phage therapy application. In vitro cell lysis experiments were performed against the bacterial host V. alginolyticus strain V1 but also against 12 presumptive Vibrio strains originating from live prey Artemia salina cultures indicating the strong lytic efficacy of the 2 phages. In vivo administration of the phage cocktail, φSt2 and φGrn1, at MOI = 100 directly on live prey A. salina cultures, led to a 93% decrease of presumptive Vibrio population after 4 h of treatment. Current study suggests that administration of φSt2 and φGrn1 to live preys could selectively reduce Vibrio load in fish hatcheries. Innovative and environmental friendly solutions against bacterial diseases are more than necessary and phage therapy is one of them.
Kalatzis, Panos G.; Bastías, Roberto; Kokkari, Constantina; Katharios, Pantelis
2016-01-01
Bacterial infections are a serious problem in aquaculture since they can result in massive mortalities in farmed fish and invertebrates. Vibriosis is one of the most common diseases in marine aquaculture hatcheries and its causative agents are bacteria of the genus Vibrio mostly entering larval rearing water through live feeds, such as Artemia and rotifers. The pathogenic Vibrio alginolyticus strain V1, isolated during a vibriosis outbreak in cultured seabream, Sparus aurata, was used as host to isolate and characterize the two novel bacteriophages φSt2 and φGrn1 for phage therapy application. In vitro cell lysis experiments were performed against the bacterial host V. alginolyticus strain V1 but also against 12 presumptive Vibrio strains originating from live prey Artemia salina cultures indicating the strong lytic efficacy of the 2 phages. In vivo administration of the phage cocktail, φSt2 and φGrn1, at MOI = 100 directly on live prey A. salina cultures, led to a 93% decrease of presumptive Vibrio population after 4 h of treatment. Current study suggests that administration of φSt2 and φGrn1 to live preys could selectively reduce Vibrio load in fish hatcheries. Innovative and environmental friendly solutions against bacterial diseases are more than necessary and phage therapy is one of them. PMID:26950336
Automatic Inference of DATR Theories
Barg, P
1996-01-01
This paper presents an approach for the automatic acquisition of linguistic knowledge from unstructured data. The acquired knowledge is represented in the lexical knowledge representation language DATR. A set of transformation rules that establish inheritance relationships and a default-inference algorithm make up the basis components of the system. Since the overall approach is not restricted to a special domain, the heuristic inference strategy uses criteria to evaluate the quality of a DATR theory, where different domains may require different criteria. The system is applied to the linguistic learning task of German noun inflection.
Constraint Processing in Lifted Probabilistic Inference
Kisynski, Jacek
2012-01-01
First-order probabilistic models combine representational power of first-order logic with graphical models. There is an ongoing effort to design lifted inference algorithms for first-order probabilistic models. We analyze lifted inference from the perspective of constraint processing and, through this viewpoint, we analyze and compare existing approaches and expose their advantages and limitations. Our theoretical results show that the wrong choice of constraint processing method can lead to exponential increase in computational complexity. Our empirical tests confirm the importance of constraint processing in lifted inference. This is the first theoretical and empirical study of constraint processing in lifted inference.
Type Inference for Session Types in the Pi-Calculus
DEFF Research Database (Denmark)
Graversen, Eva Fajstrup; Harbo, Jacob Buchreitz; Huttel, Hans
2014-01-01
In this paper we present a direct algorithm for session type inference for the π-calculus. Type inference for session types has previously been achieved by either imposing limitations and restriction on the π-calculus, or by reducing the type inference problem to that for linear types. Our approach...
Type Inference for Session Types in the Pi-Calculus
DEFF Research Database (Denmark)
Huttel, Hans; Graversen, Eva Fajstrup; Wahl, Sebastian
2014-01-01
In this paper we present a direct algorithm for session type inference for the π-calculus. Type inference for session types has previously been achieved by either imposing limitations and restriction on the π-calculus, or by reducing the type inference problem to that for linear types. Our approa...
Gui, Shupeng; Rice, Andrew P; Chen, Rui; Wu, Liang; Liu, Ji; Miao, Hongyu
2017-01-31
Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes like human. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality if a network size increases to 100 or higher. Here we present a novel scalable algorithm for identifying genome-wide gene regulatory network (GRN) structures, and we have verified the algorithm performances by extensive simulation studies based on the DREAM challenge benchmark data. The highlight of our method is that its superior performance does not degenerate even for a network size on the order of 10(4), and is thus readily applicable to large-scale complex networks. Such a breakthrough is achieved by considering both prior biological knowledge and multiple topological properties (i.e., sparsity and hub gene structure) of complex networks in the regularized formulation. We also validate and illustrate the application of our algorithm in practice using the time-course gene expression data from a study on human respiratory epithelial cells in response to influenza A virus (IAV) infection, as well as the CHIP-seq data from ENCODE on transcription factor (TF) and target gene interactions. An interesting finding, owing to the proposed algorithm, is that the biggest hub structures (e.g., top ten) in the GRN all center at some transcription factors in the context of epithelial cell infection by IAV. The proposed algorithm is the first scalable method for large complex network structure identification. The GRN structure identified by our algorithm could reveal possible biological links and help researchers to choose which gene functions to investigate in a biological event. The algorithm described in this article is implemented in MATLAB (Ⓡ) , and the source code is
Khoshbin, Fatemeh; Bonakdari, Hossein; Hamed Ashraf Talesh, Seyed; Ebtehaj, Isa; Zaji, Amir Hossein; Azimi, Hamed
2016-06-01
In the present article, the adaptive neuro-fuzzy inference system (ANFIS) is employed to model the discharge coefficient in rectangular sharp-crested side weirs. The genetic algorithm (GA) is used for the optimum selection of membership functions, while the singular value decomposition (SVD) method helps in computing the linear parameters of the ANFIS results section (GA/SVD-ANFIS). The effect of each dimensionless parameter on discharge coefficient prediction is examined in five different models to conduct sensitivity analysis by applying the above-mentioned dimensionless parameters. Two different sets of experimental data are utilized to examine the models and obtain the best model. The study results indicate that the model designed through GA/SVD-ANFIS predicts the discharge coefficient with a good level of accuracy (mean absolute percentage error = 3.362 and root mean square error = 0.027). Moreover, comparing this method with existing equations and the multi-layer perceptron-artificial neural network (MLP-ANN) indicates that the GA/SVD-ANFIS method has superior performance in simulating the discharge coefficient of side weirs.
Lower complexity bounds for lifted inference
DEFF Research Database (Denmark)
Jaeger, Manfred
2015-01-01
instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...
Protein inference: A protein quantification perspective.
He, Zengyou; Huang, Ting; Liu, Xiaoqing; Zhu, Peijun; Teng, Ben; Deng, Shengchun
2016-08-01
In mass spectrometry-based shotgun proteomics, protein quantification and protein identification are two major computational problems. To quantify the protein abundance, a list of proteins must be firstly inferred from the raw data. Then the relative or absolute protein abundance is estimated with quantification methods, such as spectral counting. Until now, most researchers have been dealing with these two processes separately. In fact, the protein inference problem can be regarded as a special protein quantification problem in the sense that truly present proteins are those proteins whose abundance values are not zero. Some recent published papers have conceptually discussed this possibility. However, there is still a lack of rigorous experimental studies to test this hypothesis. In this paper, we investigate the feasibility of using protein quantification methods to solve the protein inference problem. Protein inference methods aim to determine whether each candidate protein is present in the sample or not. Protein quantification methods estimate the abundance value of each inferred protein. Naturally, the abundance value of an absent protein should be zero. Thus, we argue that the protein inference problem can be viewed as a special protein quantification problem in which one protein is considered to be present if its abundance is not zero. Based on this idea, our paper tries to use three simple protein quantification methods to solve the protein inference problem effectively. The experimental results on six data sets show that these three methods are competitive with previous protein inference algorithms. This demonstrates that it is plausible to model the protein inference problem as a special protein quantification task, which opens the door of devising more effective protein inference algorithms from a quantification perspective. The source codes of our methods are available at: http://code.google.com/p/protein-inference/.
Inferring AS Relationships from BGP Attributes
Giotsas, Vasileios
2011-01-01
Business relationships between autonomous systems (AS) are crucial for Internet routing. Existing algorithms used heuristics to infer AS relationships from AS topology data. In this paper we propose a different approach to infer AS relationships from more informative data sources, namely the BGP Community and Local Preference attributes. These data contain rich information on AS routing policies and therefore closely reflect AS relationships. We accumulate the BGP data from RouteViews, RIPE RIS and route servers in August 2010 and February 2011. We infer the AS relationships for 39% of links that are visible in our BGP data. They cover the majority of links among the Tier-1 and Tier-2 ASes. The BGP data also allow us to discover special relationship types, namely hybrid relationship, partial-transit relationship, indirect peering relationship and backup links. Finally we evaluate and analyse the problems of the existing inference algorithms.
An inference engine for embedded diagnostic systems
Fox, Barry R.; Brewster, Larry T.
1987-01-01
The implementation of an inference engine for embedded diagnostic systems is described. The system consists of two distinct parts. The first is an off-line compiler which accepts a propositional logical statement of the relationship between facts and conclusions and produces data structures required by the on-line inference engine. The second part consists of the inference engine and interface routines which accept assertions of fact and return the conclusions which necessarily follow. Given a set of assertions, it will generate exactly the conclusions which logically follow. At the same time, it will detect any inconsistencies which may propagate from an inconsistent set of assertions or a poorly formulated set of rules. The memory requirements are fixed and the worst case execution times are bounded at compile time. The data structures and inference algorithms are very simple and well understood. The data structures and algorithms are described in detail. The system has been implemented on Lisp, Pascal, and Modula-2.
Bayesian Inference Methods for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand
2013-01-01
This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...
Bayesian Inference for Radio Observations
Lochner, Michelle; Zwart, Jonathan T L; Smirnov, Oleg; Bassett, Bruce A; Oozeer, Nadeem; Kunz, Martin
2015-01-01
(Abridged) New telescopes like the Square Kilometre Array (SKA) will push into a new sensitivity regime and expose systematics, such as direction-dependent effects, that could previously be ignored. Current methods for handling such systematics rely on alternating best estimates of instrumental calibration and models of the underlying sky, which can lead to inaccurate uncertainty estimates and biased results because such methods ignore any correlations between parameters. These deconvolution algorithms produce a single image that is assumed to be a true representation of the sky, when in fact it is just one realisation of an infinite ensemble of images compatible with the noise in the data. In contrast, here we report a Bayesian formalism that simultaneously infers both systematics and science. Our technique, Bayesian Inference for Radio Observations (BIRO), determines all parameters directly from the raw data, bypassing image-making entirely, by sampling from the joint posterior probability distribution. Thi...
Energy Technology Data Exchange (ETDEWEB)
Chertkov, Michael [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahn, Sungsoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of); Shin, Jinwoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)
2017-05-25
Computing partition function is the most important statistical inference task arising in applications of Graphical Models (GM). Since it is computationally intractable, approximate methods have been used to resolve the issue in practice, where meanfield (MF) and belief propagation (BP) are arguably the most popular and successful approaches of a variational type. In this paper, we propose two new variational schemes, coined Gauged-MF (G-MF) and Gauged-BP (G-BP), improving MF and BP, respectively. Both provide lower bounds for the partition function by utilizing the so-called gauge transformation which modifies factors of GM while keeping the partition function invariant. Moreover, we prove that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case. Our extensive experiments, on complete GMs of relatively small size and on large GM (up-to 300 variables) confirm that the newly proposed algorithms outperform and generalize MF and BP.
Generalized Collective Inference with Symmetric Clique Potentials
Gupta, Rahul; Dewan, Ajit A
2009-01-01
Collective graphical models exploit inter-instance associative dependence to output more accurate labelings. However existing models support very limited kind of associativity which restricts accuracy gains. This paper makes two major contributions. First, we propose a general collective inference framework that biases data instances to agree on a set of {\\em properties} of their labelings. Agreement is encouraged through symmetric clique potentials. We show that rich properties leads to bigger gains, and present a systematic inference procedure for a large class of such properties. The procedure performs message passing on the cluster graph, where property-aware messages are computed with cluster specific algorithms. This provides an inference-only solution for domain adaptation. Our experiments on bibliographic information extraction illustrate significant test error reduction over unseen domains. Our second major contribution consists of algorithms for computing outgoing messages from clique clusters with ...
Inferring comprehensible business/ICT alignment rules
Cumps, B.; Martens, D.; De Backer, M.; Haesen, R.; Viaene, S.; Dedene, G.; Baesens, B.; Snoeck, M.
2009-01-01
We inferred business rules for business/ICT alignment by applying a novel rule induction algorithm on a data set containing rich alignment information polled from 641 organisations in 7 European countries. The alignment rule set was created using AntMiner+, a rule induction technique with a reputati
Artificial Hydrocarbon Networks Fuzzy Inference System
Directory of Open Access Journals (Sweden)
Hiram Ponce
2013-01-01
Full Text Available This paper presents a novel fuzzy inference model based on artificial hydrocarbon networks, a computational algorithm for modeling problems based on chemical hydrocarbon compounds. In particular, the proposed fuzzy-molecular inference model (FIM-model uses molecular units of information to partition the output space in the defuzzification step. Moreover, these molecules are linguistic units that can be partially understandable due to the organized structure of the topology and metadata parameters involved in artificial hydrocarbon networks. In addition, a position controller for a direct current (DC motor was implemented using the proposed FIM-model in type-1 and type-2 fuzzy inference systems. Experimental results demonstrate that the fuzzy-molecular inference model can be applied as an alternative of type-2 Mamdani’s fuzzy control systems because the set of molecular units can deal with dynamic uncertainties mostly present in real-world control applications.
DEFF Research Database (Denmark)
Andersen, Jesper
2009-01-01
Collateral evolution the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....
Bayesian inference for OPC modeling
Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.
2016-03-01
The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.
The NIFTY way of Bayesian signal inference
Energy Technology Data Exchange (ETDEWEB)
Selig, Marco, E-mail: mselig@mpa-Garching.mpg.de [Max Planck Institut für Astrophysik, Karl-Schwarzschild-Straße 1, D-85748 Garching, Germany, and Ludwig-Maximilians-Universität München, Geschwister-Scholl-Platz 1, D-80539 München (Germany)
2014-12-05
We introduce NIFTY, 'Numerical Information Field Theory', a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTY as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D{sup 3}PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.
Energy Technology Data Exchange (ETDEWEB)
Petrov, S.
1996-10-01
Languages with a solvable implication problem but without complete and consistent systems of inference rules (`poor` languages) are considered. The problem of existence of finite complete and consistent inference rule system for a ``poor`` language is stated independently of the language or rules syntax. Several properties of the problem arc proved. An application of results to the language of join dependencies is given.
Variational Probabilistic Inference and the QMR-DT Network
Jaakkola, T S; 10.1613/jair.583
2011-01-01
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the `Quick Medical Reference' (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.
Nagao, Makoto
1990-01-01
Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of """"knowledge"""" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig
Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach
Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.
An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drugs creation purposes. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and small number of observations. In this way, there is a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is specially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study about the sensibility of entropy methods regarding the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensibility. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
Probability and Statistical Inference
Prosper, Harrison B.
2006-01-01
These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.
Emmert-Streib, Frank; Glazko, Galina V; Altay, Gökmen; de Matos Simoes, Ricardo
2012-01-01
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques.Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
Nor, Igor; Charlat, Sylvain; Engelstadter, Jan; Reuter, Max; Duron, Olivier; Sagot, Marie-France
2010-01-01
We address in this paper a new computational biology problem that aims at understanding a mechanism that could potentially be used to genetically manipulate natural insect populations infected by inherited, intra-cellular parasitic bacteria. In this problem, that we denote by \\textsc{Mod/Resc Parsimony Inference}, we are given a boolean matrix and the goal is to find two other boolean matrices with a minimum number of columns such that an appropriately defined operation on these matrices gives back the input. We show that this is formally equivalent to the \\textsc{Bipartite Biclique Edge Cover} problem and derive some complexity results for our problem using this equivalence. We provide a new, fixed-parameter tractability approach for solving both that slightly improves upon a previously published algorithm for the \\textsc{Bipartite Biclique Edge Cover}. Finally, we present experimental results where we applied some of our techniques to a real-life data set.
Bayesian Inference with Optimal Maps
Moselhy, Tarek A El
2011-01-01
We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selectio...
Inferring Boolean network states from partial information
2013-01-01
Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data. PMID:24006954
Computationally efficient Bayesian inference for inverse problems.
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
Inference of Isoforms from Short Sequence Reads
Feng, Jianxing; Li, Wei; Jiang, Tao
Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and cost ineffective. The emerging RNA-Seq technology provides a possible effective method to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known and infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer is able to calculate the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method and a 60 times faster speed. Moreover, our tests on both simulated and real reads show that it achieves a good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS and PAS information, especially for isoforms whose expression levels are significantly high.
Knuth, Kevin H
2010-01-01
We present a foundation for inference that unites and significantly extends the approaches of Kolmogorov and Cox. Our approach is based on quantifying finite lattices of logical statements in a way that satisfies general lattice symmetries. With other applications in mind, our derivations assume minimal symmetries, relying on neither complementarity nor continuity or differentiability. Each relevant symmetry corresponds to an axiom of quantification, and these axioms are used to derive a unique set of rules governing quantification of the lattice. These rules form the familiar probability calculus. We also derive a unique quantification of divergence and information. Taken together these results form a simple and clear foundation for the quantification of inference.
Mathematical inference and control of molecular networks from perturbation experiments
Mohammed-Rasheed, Mohammed
One of the main challenges facing biologists and mathematicians in the post genomic era is to understand the behavior of molecular networks and harness this understanding into an educated intervention of the cell. The cell maintains its function via an elaborate network of interconnecting positive and negative feedback loops of genes, RNA and proteins that send different signals to a large number of pathways and molecules. These structures are referred to as genetic regulatory networks (GRNs) or molecular networks. GRNs can be viewed as dynamical systems with inherent properties and mechanisms, such as steady-state equilibriums and stability, that determine the behavior of the cell. The biological relevance of the mathematical concepts are important as they may predict the differentiation of a stem cell, the maintenance of a normal cell, the development of cancer and its aberrant behavior, and the design of drugs and response to therapy. Uncovering the underlying GRN structure from gene/protein expression data, e.g., microarrays or perturbation experiments, is called inference or reverse engineering of the molecular network. Because of the high cost and time consuming nature of biological experiments, the number of available measurements or experiments is very small compared to the number of molecules (genes, RNA and proteins). In addition, the observations are noisy, where the noise is due to the measurements imperfections as well as the inherent stochasticity of genetic expression levels. Intra-cellular activities and extra-cellular environmental attributes are also another source of variability. Thus, the inference of GRNs is, in general, an under-determined problem with a highly noisy set of observations. The ultimate goal of GRN inference and analysis is to be able to intervene within the network, in order to force it away from undesirable cellular states and into desirable ones. However, it remains a major challenge to design optimal intervention strategies
Safety Analysis versus Type Inference with Partial Types
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff; Palsberg, Jens
1992-01-01
Safety analysis is an algorithm for determining if a term in an untyped lambda calculus with constants is safe, i.e., if it does not cause an error during evaluation. This ambition is also shared by algorithms for type inference. Safety analysis and type inference are based on rather different...... perspectives, however. Safety analysis is global in that it can only analyze a complete program. In contrast, type inference is local in that it can analyze pieces of a program in isolation. In this paper we prove that safety analysis is sound, relative to both a strict and a lazy operational semantics. We...... also prove that safety analysis accepts strictly more safe lambda terms than does type inference for simple types. The latter result demonstrates that global program analysis can be more precise than local ones....
Benchmarking Spike Rate Inference in Population Calcium Imaging.
Theis, Lucas; Berens, Philipp; Froudarakis, Emmanouil; Reimer, Jacob; Román Rosón, Miroslav; Baden, Tom; Euler, Thomas; Tolias, Andreas S; Bethge, Matthias
2016-05-01
A fundamental challenge in calcium imaging has been to infer spike rates of neurons from the measured noisy fluorescence traces. We systematically evaluate different spike inference algorithms on a large benchmark dataset (>100,000 spikes) recorded from varying neural tissue (V1 and retina) using different calcium indicators (OGB-1 and GCaMP6). In addition, we introduce a new algorithm based on supervised learning in flexible probabilistic models and find that it performs better than other published techniques. Importantly, it outperforms other algorithms even when applied to entirely new datasets for which no simultaneously recorded data is available. Future data acquired in new experimental conditions can be used to further improve the spike prediction accuracy and generalization performance of the model. Finally, we show that comparing algorithms on artificial data is not informative about performance on real data, suggesting that benchmarking different methods with real-world datasets may greatly facilitate future algorithmic developments in neuroscience.
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
On efficient Bayesian inference for models with stochastic volatility
Griffin, Jim E.; Sakaria, Dhirendra Kumar
2016-01-01
An efficient method for Bayesian inference in stochastic volatility models uses a linear state space representation to define a Gibbs sampler in which the volatilities are jointly updated. This method involves the choice of an offset parameter and we illustrate how its choice can have an important effect on the posterior inference. A Metropolis-Hastings algorithm is developed to robustify this approach to choice of the offset parameter. The method is illustrated on simulated data with known p...
Adaptive Cluster Expansion for Inferring Boltzmann Machines with Noisy Data
Cocco, Simona
2011-01-01
We introduce a procedure to infer the interactions among a set of binary variables, based on their sampled frequencies and pairwise correlations. The algorithm builds the clusters of variables contributing most to the entropy of the inferred Ising model, and rejects the small contributions due to the sampling noise. Our procedure successfully recovers benchmark Ising models even at criticality and in the low temperature phase, and is applied to neurobiological data.
Inferring interaction partners from protein sequences
Bitbol, Anne-Florence; Colwell, Lucy J; Wingreen, Ned S
2016-01-01
Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a pri...
Inferring Phylogenetic Networks from Gene Order Data
Directory of Open Access Journals (Sweden)
Alexey Anatolievich Morozov
2013-01-01
Full Text Available Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary, sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm. Binary encoding can also be useful, but only when the methods mentioned above cannot be used.
An algebra-based method for inferring gene regulatory networks.
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
2014-03-26
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the
Validation of Gene Regulatory Network Inference Based on Controllability
Directory of Open Access Journals (Sweden)
Edward eDougherty
2013-12-01
Full Text Available There are two distinct issues regarding network validation: (1 Does an inferred network provide good predictions relative to experimental data? (2 Does a network inference algorithm applied within a certain network model framework yield networks that are accurate relative to some criterion of goodness? The first issue concerns scientific validation and the second concerns algorithm validation. In this paper we consider inferential validation relative to controllability; that is, if an inference procedure is applied to synthetic data generated from a gene regulatory network and an intervention procedure is designed on the inferred network, how well does it perform on the true network? The reasoning behind such a criterion is that, if our purpose is to use gene regulatory networks to design therapeutic intervention strategies, then we are not concerned with network fidelity, per se, but only with our ability to design effective interventions based on the inferred network. We will consider the problem from the perspectives of stationary control, which involves designing a control policy to be applied over time based on the current state of the network, with the decision procedure itself being time independent. {The objective of a control policy is to optimally reduce the total steady-state probability mass of the undesirable states (phenotypes, which is equivalent to optimally increasing the total steady-state mass of the desirable states. Based on this criterion we compare several proposed network inference procedures. We will see that inference procedure psi may perform poorer than inference procedure xi relative to inferring the full network structure but perform better than xi relative to controllability. Hence, when one is aiming at a specific application, it may be wise to use an objective-based measure of inference validity.
Inferring Pedigree Graphs from Genetic Distances
Tamura, Takeyuki; Ito, Hiro
In this paper, we study a problem of inferring blood relationships which satisfy a given matrix of genetic distances between all pairs of n nodes. Blood relationships are represented by our proposed graph class, which is called a pedigree graph. A pedigree graph is a directed acyclic graph in which the maximum indegree is at most two. We show that the number of pedigree graphs which satisfy the condition of given genetic distances may be exponential, but they can be represented by one directed acyclic graph with n nodes. Moreover, an O(n3) time algorithm which solves the problem is also given. Although phylogenetic trees and phylogenetic networks are similar data structures to pedigree graphs, it seems that inferring methods for phylogenetic trees and networks cannot be applied to infer pedigree graphs since nodes of phylogenetic trees and networks represent species whereas nodes of pedigree graphs represent individuals. We also show an O(n2) time algorithm which detects a contradiction between a given pedigreee graph and distance matrix of genetic distances.
HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR
Energy Technology Data Exchange (ETDEWEB)
Schneider, Michael D.; Dawson, William A. [Lawrence Livermore National Laboratory, Livermore, CA 94551 (United States); Hogg, David W. [Center for Cosmology and Particle Physics, New York University, New York, NY 10003 (United States); Marshall, Philip J.; Bard, Deborah J. [SLAC National Accelerator Laboratory, Menlo Park, CA 94025 (United States); Meyers, Joshua [Kavli Institute for Particle Astrophysics and Cosmology, Stanford University, 452 Lomita Mall, Stanford, CA 94035 (United States); Lang, Dustin, E-mail: schneider42@llnl.gov [Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213 (United States)
2015-07-01
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.
Causal inference in econometrics
Kreinovich, Vladik; Sriboonchitta, Songsak
2016-01-01
This book is devoted to the analysis of causal inference which is one of the most difficult tasks in data analysis: when two phenomena are observed to be related, it is often difficult to decide whether one of them causally influences the other one, or whether these two phenomena have a common cause. This analysis is the main focus of this volume. To get a good understanding of the causal inference, it is important to have models of economic phenomena which are as accurate as possible. Because of this need, this volume also contains papers that use non-traditional economic models, such as fuzzy models and models obtained by using neural networks and data mining techniques. It also contains papers that apply different econometric models to analyze real-life economic dependencies.
Directory of Open Access Journals (Sweden)
João Paulo Monteiro
2001-12-01
Full Text Available Russell's The Problems of Philosophy tries to establish a new theory of induction, at the same time that Hume is there accused of an irrational/ scepticism about induction". But a careful analysis of the theory of knowledge explicitly acknowledged by Hume reveals that, contrary to the standard interpretation in the XXth century, possibly influenced by Russell, Hume deals exclusively with causal inference (which he never classifies as "causal induction", although now we are entitled to do so, never with inductive inference in general, mainly generalizations about sensible qualities of objects ( whether, e.g., "all crows are black" or not is not among Hume's concerns. Russell's theories are thus only false alternatives to Hume's, in (1912 or in his (1948.
Stochastic processes inference theory
Rao, Malempati M
2014-01-01
This is the revised and enlarged 2nd edition of the authors’ original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on Ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.
Energy Technology Data Exchange (ETDEWEB)
KENNETH M. HANSON; JANE M. BOOKER
2000-09-08
The authors an uncertainty analysis of data taken using the Rossi technique, in which the horizontal oscilloscope sweep is driven sinusoidally in time ,while the vertical axis follows the signal amplitude. The analysis is done within a Bayesian framework. Complete inferences are obtained by tilting the Markov chain Monte Carlo technique, which produces random samples from the posterior probability distribution expressed in terms of the parameters.
Inferring Microbial Fitness Landscapes
2016-02-25
experiments on evolving microbial populations. Although these experiments have produced examples of remarkable phenomena – e.g. the emergence of mutator...what specific mutations, avian influenza viruses will adapt to novel human hosts; or how readily infectious bacteria will escape antibiotics or the...infer from data the determinants of microbial evolution with sufficient resolution that we can quantify 1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND
Continuous Integrated Invariant Inference Project
National Aeronautics and Space Administration — The proposed project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an...
Personalized recommendation via unbalance full-connectivity inference
Ma, Wenping; Ren, Chen; Wu, Yue; Wang, Shanfeng; Feng, Xiang
2017-10-01
Recommender systems play an important role to help us to find useful information. They are widely used by most e-commerce web sites to push the potential items to individual user according to purchase history. Network-based recommendation algorithms are popular and effective in recommendation, which use two types of elements to represent users and items respectively. In this paper, based on consistence-based inference (CBI) algorithm, we propose a novel network-based algorithm, in which users and items are recognized with no difference. The proposed algorithm also uses information diffusion to find the relationship between users and items. Different from traditional network-based recommendation algorithms, information diffusion initializes from users and items, respectively. Experiments show that the proposed algorithm is effective compared with traditional network-based recommendation algorithms.
Probabilistic Inferences in Bayesian Networks
Ding, Jianguo
2010-01-01
This chapter summarizes the popular inferences methods in Bayesian networks. The results demonstrates that the evidence can propagated across the Bayesian networks by any links, whatever it is forward or backward or intercausal style. The belief updating of Bayesian networks can be obtained by various available inference techniques. Theoretically, exact inferences in Bayesian networks is feasible and manageable. However, the computing and inference is NP-hard. That means, in applications, in ...
Towards Stratification Learning through Homology Inference
Bendich, Paul; Wang, Bei
2010-01-01
A topological approach to stratification learning is developed for point cloud data drawn from a stratified space. Given such data, our objective is to infer which points belong to the same strata. First we define a multi-scale notion of a stratified space, giving a stratification for each radius level. We then use methods derived from kernel and cokernel persistent homology to cluster the data points into different strata, and we prove a result which guarantees the correctness of our clustering, given certain topological conditions; some geometric intuition for these topological conditions is also provided. Our correctness result is then given a probabilistic flavor: we give bounds on the minimum number of sample points required to infer, with probability, which points belong to the same strata. Finally, we give an explicit algorithm for the clustering, prove its correctness, and apply it to some simulated data.
Multimodel inference and adaptive management
Rehme, S.E.; Powell, L.A.; Allen, C.R.
2011-01-01
Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
On-Line Real Time Realization and Application of Adaptive Fuzzy Inference Neural Network
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
In this paper,a modeling algorithm developed by transferring the adaptive fuzzy inference neural network into an on-line real time algorithm,combining the algorithm with conventional system identification method and applying them to separate identification of nonlinear mu Iti-variable systems is introduced and discussed.
Inferring Trust Based on Similarity with TILLIT
Tavakolifard, Mozhgan; Herrmann, Peter; Knapskog, Svein J.
A network of people having established trust relations and a model for propagation of related trust scores are fundamental building blocks in many of today’s most successful e-commerce and recommendation systems. However, the web of trust is often too sparse to predict trust values between non-familiar people with high accuracy. Trust inferences are transitive associations among users in the context of an underlying social network and may provide additional information to alleviate the consequences of the sparsity and possible cold-start problems. Such approaches are helpful, provided that a complete trust path exists between the two users. An alternative approach to the problem is advocated in this paper. Based on collaborative filtering one can exploit the like-mindedness resp. similarity of individuals to infer trust to yet unknown parties which increases the trust relations in the web. For instance, if one knows that with respect to a specific property, two parties are trusted alike by a large number of different trusters, one can assume that they are similar. Thus, if one has a certain degree of trust to the one party, one can safely assume a very similar trustworthiness of the other one. In an attempt to provide high quality recommendations and proper initial trust values even when no complete trust propagation path or user profile exists, we propose TILLIT — a model based on combination of trust inferences and user similarity. The similarity is derived from the structure of the trust graph and users’ trust behavior as opposed to other collaborative-filtering based approaches which use ratings of items or user’s profile. We describe an algorithm realizing the approach based on a combination of trust inferences and user similarity, and validate the algorithm using a real large-scale data-set.
Classical methods for interpreting objective function minimization as intelligent inference
Energy Technology Data Exchange (ETDEWEB)
Golden, R.M. [Univ. of Texas, Dallas, TX (United States)
1996-12-31
Most recognition algorithms and neural networks can be formally viewed as seeking a minimum value of an appropriate objective function during either classification or learning phases. The goal of this paper is to argue that in order to show a recognition algorithm is making intelligent inferences, it is not sufficient to show that the recognition algorithm is computing (or trying to compute) the global minimum of some objective function. One must explicitly define a {open_quotes}relational system{close_quotes} for the recognition algorithm or neural network which identifies the: (i) sample space, (ii) the relevant sigmafield of events generated by the sample space, and (iii) the {open_quotes}relation{close_quotes} for that relational system. Only when such a {open_quotes}relational system{close_quotes} is properly defined, is it possible to formally establish the sense in which computing the global minimum of an objective function is an intelligent, inference.
Nanotechnology and statistical inference
Vesely, Sara; Vesely, Leonardo; Vesely, Alessandro
2017-08-01
We discuss some problems that arise when applying statistical inference to data with the aim of disclosing new func-tionalities. A predictive model analyzes the data taken from experiments on a specific material to assess the likelihood that another product, with similar structure and properties, will exhibit the same functionality. It doesn't have much predictive power if vari-ability occurs as a consequence of a specific, non-linear behavior. We exemplify our discussion on some experiments with biased dice.
Directory of Open Access Journals (Sweden)
Kevin H. Knuth
2012-06-01
Full Text Available We present a simple and clear foundation for finite inference that unites and significantly extends the approaches of Kolmogorov and Cox. Our approach is based on quantifying lattices of logical statements in a way that satisfies general lattice symmetries. With other applications such as measure theory in mind, our derivations assume minimal symmetries, relying on neither negation nor continuity nor differentiability. Each relevant symmetry corresponds to an axiom of quantification, and these axioms are used to derive a unique set of quantifying rules that form the familiar probability calculus. We also derive a unique quantification of divergence, entropy and information.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011This excellently presente
DEFF Research Database (Denmark)
Andersen, Jesper; Lawall, Julia
2010-01-01
A key issue in maintaining Linux device drivers is the need to keep them up to date with respect to evolutions in Linux internal libraries. Currently, there is little tool support for performing and documenting such changes. In this paper we present a tool, spdiff, that identifies common changes...... developers can use it to extract an abstract representation of the set of changes that others have made. Our experiments on recent changes in Linux show that the inferred generic patches are more concise than the corresponding patches found in commits to the Linux source tree while being safe with respect...
Inferring Evolutionary Scenarios for Protein Domain Compositions
Wiedenhoeft, John; Krause, Roland; Eulenstein, Oliver
Essential cellular processes are controlled by functional interactions of protein domains, which can be inferred from their evolutionary histories. Methods to reconstruct these histories are challenged by the complexity of reconstructing macroevolutionary events. In this work we model these events using a novel network-like structure that represents the evolution of domain combinations, called plexus. We describe an algorithm to find a plexus that represents the evolution of a given collection of domain histories as phylogenetic trees with the minimum number of macroevolutionary events, and demonstrate its effectiveness in practice.
Statistical Mechanics of Optimal Convex Inference in High Dimensions
Advani, Madhu; Ganguli, Surya
2016-07-01
A fundamental problem in modern high-dimensional data analysis involves efficiently inferring a set of P unknown model parameters governing the relationship between the inputs and outputs of N noisy measurements. Various methods have been proposed to regress the outputs against the inputs to recover the P parameters. What are fundamental limits on the accuracy of regression, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we optimally combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density α =(N /P )→∞ . However, these classical results are not relevant to modern high-dimensional inference problems, which instead occur at finite α . We employ replica theory to answer these questions for a class of inference algorithms, known in the statistics literature as M-estimators. These algorithms attempt to recover the P model parameters by solving an optimization problem involving minimizing the sum of a loss function that penalizes deviations between the data and model predictions, and a regularizer that leverages prior information about model parameters. Widely cherished algorithms like maximum likelihood (ML) and maximum-a posteriori (MAP) inference arise as special cases of M-estimators. Our analysis uncovers fundamental limits on the inference accuracy of a subclass of M-estimators corresponding to computationally tractable convex optimization problems. These limits generalize classical statistical theorems like the Cramer-Rao bound to the high-dimensional setting with prior information. We further discover the optimal M-estimator for log-concave signal and noise distributions; we demonstrate that it can achieve our high-dimensional limits on inference accuracy, while ML and MAP cannot. Intriguingly, in high dimensions, these optimal algorithms become computationally simpler than
Statistical inferences in phylogeography
DEFF Research Database (Denmark)
Nielsen, Rasmus; Beaumont, Mark A
2009-01-01
In conventional phylogeographic studies, historical demographic processes are elucidated from the geographical distribution of individuals represented on an inferred gene tree. However, the interpretation of gene trees in this context can be difficult as the same demographic/geographical process ...... may also be challenged by computational problems or poor model choice. In this review, we will describe the development of statistical methods in phylogeographic analysis, and discuss some of the challenges facing these methods....... can randomly lead to multiple different genealogies. Likewise, the same gene trees can arise under different demographic models. This problem has led to the emergence of many statistical methods for making phylogeographic inferences. A popular phylogeographic approach based on nested clade analysis...... is challenged by the fact that a certain amount of the interpretation of the data is left to the subjective choices of the user, and it has been argued that the method performs poorly in simulation studies. More rigorous statistical methods based on coalescence theory have been developed. However, these methods...
Moment inference from tomograms
Day-Lewis, F. D.; Chen, Y.; Singha, K.
2007-01-01
Time-lapse geophysical tomography can provide valuable qualitative insights into hydrologic transport phenomena associated with aquifer dynamics, tracer experiments, and engineered remediation. Increasingly, tomograms are used to infer the spatial and/or temporal moments of solute plumes; these moments provide quantitative information about transport processes (e.g., advection, dispersion, and rate-limited mass transfer) and controlling parameters (e.g., permeability, dispersivity, and rate coefficients). The reliability of moments calculated from tomograms is, however, poorly understood because classic approaches to image appraisal (e.g., the model resolution matrix) are not directly applicable to moment inference. Here, we present a semi-analytical approach to construct a moment resolution matrix based on (1) the classic model resolution matrix and (2) image reconstruction from orthogonal moments. Numerical results for radar and electrical-resistivity imaging of solute plumes demonstrate that moment values calculated from tomograms depend strongly on plume location within the tomogram, survey geometry, regularization criteria, and measurement error. Copyright 2007 by the American Geophysical Union.
A Full Bayesian Approach for Boolean Genetic Network Inference
Han, Shengtong; Wong, Raymond K. W.; Lee, Thomas C. M.; Shen, Linghao; Li, Shuo-Yen R.; Fan, Xiaodan
2014-01-01
Boolean networks are a simple but efficient model for describing gene regulatory systems. A number of algorithms have been proposed to infer Boolean networks. However, these methods do not take full consideration of the effects of noise and model uncertainty. In this paper, we propose a full Bayesian approach to infer Boolean genetic networks. Markov chain Monte Carlo algorithms are used to obtain the posterior samples of both the network structure and the related parameters. In addition to regular link addition and removal moves, which can guarantee the irreducibility of the Markov chain for traversing the whole network space, carefully constructed mixture proposals are used to improve the Markov chain Monte Carlo convergence. Both simulations and a real application on cell-cycle data show that our method is more powerful than existing methods for the inference of both the topology and logic relations of the Boolean network from observed data. PMID:25551820
A full bayesian approach for boolean genetic network inference.
Directory of Open Access Journals (Sweden)
Shengtong Han
Full Text Available Boolean networks are a simple but efficient model for describing gene regulatory systems. A number of algorithms have been proposed to infer Boolean networks. However, these methods do not take full consideration of the effects of noise and model uncertainty. In this paper, we propose a full Bayesian approach to infer Boolean genetic networks. Markov chain Monte Carlo algorithms are used to obtain the posterior samples of both the network structure and the related parameters. In addition to regular link addition and removal moves, which can guarantee the irreducibility of the Markov chain for traversing the whole network space, carefully constructed mixture proposals are used to improve the Markov chain Monte Carlo convergence. Both simulations and a real application on cell-cycle data show that our method is more powerful than existing methods for the inference of both the topology and logic relations of the Boolean network from observed data.
HieranoiDB: a database of orthologs inferred by Hieranoid
Kaduk, Mateusz; Riegler, Christian; Lemp, Oliver; Sonnhammer, Erik L. L.
2017-01-01
HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. It infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database HieranoiDB with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees. PMID:27742821
SPEEDY: An Eclipse-based IDE for invariant inference
Directory of Open Access Journals (Sweden)
David R. Cok
2014-04-01
Full Text Available SPEEDY is an Eclipse-based IDE for exploring techniques that assist users in generating correct specifications, particularly including invariant inference algorithms and tools. It integrates with several back-end tools that propose invariants and will incorporate published algorithms for inferring object and loop invariants. Though the architecture is language-neutral, current SPEEDY targets C programs. Building and using SPEEDY has confirmed earlier experience demonstrating the importance of showing and editing specifications in the IDEs that developers customarily use, automating as much of the production and checking of specifications as possible, and showing counterexample information directly in the source code editing environment. As in previous work, automation of specification checking is provided by back-end SMT solvers. However, reducing the effort demanded of software developers using formal methods also requires a GUI design that guides users in writing, reviewing, and correcting specifications and automates specification inference.
Fisher information and statistical inference for phase-type distributions
DEFF Research Database (Denmark)
Bladt, Mogens; Esparza, Luz Judith R; Nielsen, Bo Friis
2011-01-01
This paper is concerned with statistical inference for both continuous and discrete phase-type distributions. We consider maximum likelihood estimation, where traditionally the expectation-maximization (EM) algorithm has been employed. Certain numerical aspects of this method are revised and we p...
Personalized microbial network inference via co-regularized spectral clustering
Imangaliyev, S.; Keijser, B.; Crielaard, W.; Tsivtsivadze, E.
2015-01-01
We use Human Microbiome Project (HMP) cohort (Peterson et al., 2009) to infer personalized oral microbial networks of healthy individuals. To determine clustering of individuals with similar microbial profiles, co-regularized spectral clustering algorithm is applied to the dataset. For each cluster
Personalized microbial network inference via co-regularized spectral clustering
Imangaliyev, S.; Keijser, B.J.; Crielaard, W.; Tsivtsivadze, E.; Zheng, H.; Hu, X.; Berrar, D.; Wang, Y.; Dubitzky, W.; Hao, J.K.; Cho, K.H.; Gilbert, D.
2014-01-01
We use Human Microbiome Project (HMP) cohort [1] to infer personalized oral microbial networks of healthy individuals. To determine clustering of individuals with similar microbial profiles, co-regularized spectral clustering algorithm is applied to the dataset. For each cluster we discovered, we co
Fisher information and statistical inference for phase-type distributions
DEFF Research Database (Denmark)
Bladt, Mogens; Esparza, Luz Judith R; Nielsen, Bo Friis
2011-01-01
This paper is concerned with statistical inference for both continuous and discrete phase-type distributions. We consider maximum likelihood estimation, where traditionally the expectation-maximization (EM) algorithm has been employed. Certain numerical aspects of this method are revised and we...
Personalized microbial network inference via co-regularized spectral clustering
Imangaliyev, S.; Keijser, B.; Crielaard, W.; Tsivtsivadze, E.
2015-01-01
We use Human Microbiome Project (HMP) cohort (Peterson et al., 2009) to infer personalized oral microbial networks of healthy individuals. To determine clustering of individuals with similar microbial profiles, co-regularized spectral clustering algorithm is applied to the dataset. For each cluster
Inference in Probabilistic Logic Programs using Weighted CNF's
Fierens, Daan; Thon, Ingo; Gutmann, Bernd; De Raedt, Luc
2012-01-01
Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. Several classical probabilistic inference tasks (such as MAP and computing marginals) have not yet received a lot of attention for this formalism. The contribution of this paper is that we develop efficient inference algorithms for these tasks. This is based on a conversion of the probabilistic logic program and the query and evidence to a weighted CNF formula. This allows us to reduce the inference tasks to well-studied tasks such as weighted model counting. To solve such tasks, we employ state-of-the-art methods. We consider multiple methods for the conversion of the programs as well as for inference on the weighted CNF. The resulting approach is evaluated experimentally and shown to improve upon the state-of-the-art in probabilistic logic programming.
Institute of Scientific and Technical Information of China (English)
何小娟; 曾建潮
2012-01-01
在Bayesian统计推理理论的基础上,提出一种新的求解柔性车间调度问题的分布估计算法.首先,根据所有工件的工序排列顺序提取进化过程中种群的优良信息,建立一个不断更新的先验分布概率模型,再以相邻工序出现的频率为基础建立条件概率模型；然后,结合两个模型的信息使用Bayesian公式建立一个后验概率模型,该模型综合了进化过程中不断更新的优良信息和相邻工序出现的频率信息,可用以更好地指导产生新群体.仿真结果表明算法具有较好的寻优能力.%A new estimation of distribution algorithm for flexible job-shop scheduling problems based on Bayesian statistical inference theory is proposed. At first, according to the permutation of all operations, the model of priori distribution probability is built extracting the information of superior solutions updating, then the model of conditional probability is also built based on the frequencies of neighbor operations appearing. After then, the model of posterior probability is given by combining the above two models to guide new population generating with Bayesian formula. Such model is characteristic of guiding new population generating well, for it synthesizes the information both of updating knowledge and of neighbor operations appearing frequencies. The simulation results show that the proposed algorithm has the preferable search ability.
TGF-β induced epithelial-mesenchymal transition modeling
Xenitidis, P.; Seimenis, I.; Kakolyris, S.; Adamopoulos, A.
2015-09-01
Epithelial cells may undergo a process called epithelial to mesenchymal transition (EMT). During EMT, cells lose their epithelial characteristics and acquire a migratory ability. Transforming growth factor-beta (TGF-β) signaling is considered to play an important role in EMT by regulating a set of genes through a gene regulatory network (GRN). This work aims at TGF-β induced EMT GRN modeling using publicly available experimental data (gene expression microarray data). The time-series network identification (TSNI) algorithm was used for inferring the EMT GRN. Receiver operating characteristic (ROC) and precision-recall (P-R) curves were constructed and the areas under them were used for evaluating the algorithm performance regarding network inference.
Inferring attitudes from mindwandering.
Critcher, Clayton R; Gilovich, Thomas
2010-09-01
Self-perception theory posits that people understand their own attitudes and preferences much as they understand others', by interpreting the meaning of their behavior in light of the context in which it occurs. Four studies tested whether people also rely on unobservable "behavior," their mindwandering, when making such inferences. It is proposed here that people rely on the content of their mindwandering to decide whether it reflects boredom with an ongoing task or a reverie's irresistible pull. Having the mind wander to positive events, to concurrent as opposed to past activities, and to many events rather than just one tends to be attributed to boredom and therefore leads to perceived dissatisfaction with an ongoing task. Participants appeared to rely spontaneously on the content of their wandering minds as a cue to their attitudes, but not when an alternative cause for their mindwandering was made salient.
Bayesian inference in geomagnetism
Backus, George E.
1988-01-01
The inverse problem in empirical geomagnetic modeling is investigated, with critical examination of recently published studies. Particular attention is given to the use of Bayesian inference (BI) to select the damping parameter lambda in the uniqueness portion of the inverse problem. The mathematical bases of BI and stochastic inversion are explored, with consideration of bound-softening problems and resolution in linear Gaussian BI. The problem of estimating the radial magnetic field B(r) at the earth core-mantle boundary from surface and satellite measurements is then analyzed in detail, with specific attention to the selection of lambda in the studies of Gubbins (1983) and Gubbins and Bloxham (1985). It is argued that the selection method is inappropriate and leads to lambda values much larger than those that would result if a reasonable bound on the heat flow at the CMB were assumed.
Inferring the eccentricity distribution
Hogg, David W; Bovy, Jo
2010-01-01
Standard maximum-likelihood estimators for binary-star and exoplanet eccentricities are biased high, in the sense that the estimated eccentricity tends to be larger than the true eccentricity. As with most non-trivial observables, a simple histogram of estimated eccentricities is not a good estimate of the true eccentricity distribution. Here we develop and test a hierarchical probabilistic method for performing the relevant meta-analysis, that is, inferring the true eccentricity distribution, taking as input the likelihood functions for the individual-star eccentricities, or samplings of the posterior probability distributions for the eccentricities (under a given, uninformative prior). The method is a simple implementation of a hierarchical Bayesian model; it can also be seen as a kind of heteroscedastic deconvolution. It can be applied to any quantity measured with finite precision--other orbital parameters, or indeed any astronomical measurements of any kind, including magnitudes, parallaxes, or photometr...
Inferring deterministic causal relations
Daniusis, Povilas; Mooij, Joris; Zscheischler, Jakob; Steudel, Bastian; Zhang, Kun; Schoelkopf, Bernhard
2012-01-01
We consider two variables that are related to each other by an invertible function. While it has previously been shown that the dependence structure of the noise can provide hints to determine which of the two variables is the cause, we presently show that even in the deterministic (noise-free) case, there are asymmetries that can be exploited for causal inference. Our method is based on the idea that if the function and the probability density of the cause are chosen independently, then the distribution of the effect will, in a certain sense, depend on the function. We provide a theoretical analysis of this method, showing that it also works in the low noise regime, and link it to information geometry. We report strong empirical results on various real-world data sets from different domains.
Combinatorics of distance-based tree inference.
Pardi, Fabio; Gascuel, Olivier
2012-10-01
Several popular methods for phylogenetic inference (or hierarchical clustering) are based on a matrix of pairwise distances between taxa (or any kind of objects): The objective is to construct a tree with branch lengths so that the distances between the leaves in that tree are as close as possible to the input distances. If we hold the structure (topology) of the tree fixed, in some relevant cases (e.g., ordinary least squares) the optimal values for the branch lengths can be expressed using simple combinatorial formulae. Here we define a general form for these formulae and show that they all have two desirable properties: First, the common tree reconstruction approaches (least squares, minimum evolution), when used in combination with these formulae, are guaranteed to infer the correct tree when given enough data (consistency); second, the branch lengths of all the simple (nearest neighbor interchange) rearrangements of a tree can be calculated, optimally, in quadratic time in the size of the tree, thus allowing the efficient application of hill climbing heuristics. The study presented here is a continuation of that by Mihaescu and Pachter on branch length estimation [Mihaescu R, Pachter L (2008) Proc Natl Acad Sci USA 105:13206-13211]. The focus here is on the inference of the tree itself and on providing a basis for novel algorithms to reconstruct trees from distances.
Admissibility of logical inference rules
Rybakov, VV
1997-01-01
The aim of this book is to present the fundamental theoretical results concerning inference rules in deductive formal systems. Primary attention is focused on: admissible or permissible inference rules the derivability of the admissible inference rules the structural completeness of logics the bases for admissible and valid inference rules. There is particular emphasis on propositional non-standard logics (primary, superintuitionistic and modal logics) but general logical consequence relations and classical first-order theories are also considered. The book is basically self-contained and
Rahmati, Vahid; Kirmse, Knut; Marković, Dimitrije; Holthoff, Knut; Kiebel, Stefan J
2016-02-01
Calcium imaging has been used as a promising technique to monitor the dynamic activity of neuronal populations. However, the calcium trace is temporally smeared which restricts the extraction of quantities of interest such as spike trains of individual neurons. To address this issue, spike reconstruction algorithms have been introduced. One limitation of such reconstructions is that the underlying models are not informed about the biophysics of spike and burst generations. Such existing prior knowledge might be useful for constraining the possible solutions of spikes. Here we describe, in a novel Bayesian approach, how principled knowledge about neuronal dynamics can be employed to infer biophysical variables and parameters from fluorescence traces. By using both synthetic and in vitro recorded fluorescence traces, we demonstrate that the new approach is able to reconstruct different repetitive spiking and/or bursting patterns with accurate single spike resolution. Furthermore, we show that the high inference precision of the new approach is preserved even if the fluorescence trace is rather noisy or if the fluorescence transients show slow rise kinetics lasting several hundred milliseconds, and inhomogeneous rise and decay times. In addition, we discuss the use of the new approach for inferring parameter changes, e.g. due to a pharmacological intervention, as well as for inferring complex characteristics of immature neuronal circuits.
Detection of algorithmic trading
Bogoev, Dimitar; Karam, Arzé
2017-10-01
We develop a new approach to reflect the behavior of algorithmic traders. Specifically, we provide an analytical and tractable way to infer patterns of quote volatility and price momentum consistent with different types of strategies employed by algorithmic traders, and we propose two ratios to quantify these patterns. Quote volatility ratio is based on the rate of oscillation of the best ask and best bid quotes over an extremely short period of time; whereas price momentum ratio is based on identifying patterns of rapid upward or downward movement in prices. The two ratios are evaluated across several asset classes. We further run a two-stage Artificial Neural Network experiment on the quote volatility ratio; the first stage is used to detect the quote volatility patterns resulting from algorithmic activity, while the second is used to validate the quality of signal detection provided by our measure.
Universal Darwinism as a process of Bayesian inference
Directory of Open Access Journals (Sweden)
John Oberon Campbell
2016-06-01
Full Text Available Many of the mathematical frameworks describing natural selection are equivalent to Bayes’ Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians. As Bayesian inference can always be cast in terms of (variational free energy minimization, natural selection can be viewed as comprising two components: a generative model of an ‘experiment’ in the external world environment, and the results of that 'experiment' or the 'surprise' entailed by predicted and actual outcomes of the ‘experiment’. Minimization of free energy implies that the implicit measure of 'surprise' experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
Universal Darwinism As a Process of Bayesian Inference.
Campbell, John O
2016-01-01
Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.
Resummed mean-field inference for strongly coupled data
Jacquin, Hugo; Rançon, A.
2016-10-01
We present a resummed mean-field approximation for inferring the parameters of an Ising or a Potts model from empirical, noisy, one- and two-point correlation functions. Based on a resummation of a class of diagrams of the small correlation expansion of the log-likelihood, the method outperforms standard mean-field inference methods, even when they are regularized. The inference is stable with respect to sampling noise, contrarily to previous works based either on the small correlation expansion, on the Bethe free energy, or on the mean-field and Gaussian models. Because it is mostly analytic, its complexity is still very low, requiring an iterative algorithm to solve for N auxiliary variables, that resorts only to matrix inversions and multiplications. We test our algorithm on the Sherrington-Kirkpatrick model submitted to a random external field and large random couplings, and demonstrate that even without regularization, the inference is stable across the whole phase diagram. In addition, the calculation leads to a consistent estimation of the entropy of the data and allows us to sample form the inferred distribution to obtain artificial data that are consistent with the empirical distribution.
Interactive Instruction in Bayesian Inference
DEFF Research Database (Denmark)
Khan, Azam; Breslav, Simon; Hornbæk, Kasper
2017-01-01
An instructional approach is presented to improve human performance in solving Bayesian inference problems. Starting from the original text of the classic Mammography Problem, the textual expression is modified and visualizations are added according to Mayer’s principles of instruction...... that an instructional approach to improving human performance in Bayesian inference is a promising direction....
Causal Inference and Developmental Psychology
Foster, E. Michael
2010-01-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…
Causal Inference and Developmental Psychology
Foster, E. Michael
2010-01-01
Causal inference is of central importance to developmental psychology. Many key questions in the field revolve around improving the lives of children and their families. These include identifying risk factors that if manipulated in some way would foster child development. Such a task inherently involves causal inference: One wants to know whether…
Harik, Georges
2010-01-01
We introduce a framework for representing a variety of interesting problems as inference over the execution of probabilistic model programs. We represent a "solution" to such a problem as a guide program which runs alongside the model program and influences the model program's random choices, leading the model program to sample from a different distribution than from its priors. Ideally the guide program influences the model program to sample from the posteriors given the evidence. We show how the KL- divergence between the true posterior distribution and the distribution induced by the guided model program can be efficiently estimated (up to an additive constant) by sampling multiple executions of the guided model program. In addition, we show how to use the guide program as a proposal distribution in importance sampling to statistically prove lower bounds on the probability of the evidence and on the probability of a hypothesis and the evidence. We can use the quotient of these two bounds as an estimate of ...
Tuning the Learning Rate for Stochastic Variational Inference
Institute of Scientific and Technical Information of China (English)
Xi-Ming Li; Ji-Hong Ouyang∗
2016-01-01
Stochastic variational inference (SVI) can learn topic models with very big corpora. It optimizes the variational objective by using the stochastic natural gradient algorithm with a decreasing learning rate. This rate is crucial for SVI;however, it is often tuned by hand in real applications. To address this, we develop a novel algorithm, which tunes the learning rate of each iteration adaptively. The proposed algorithm uses the Kullback-Leibler (KL) divergence to measure the similarity between the variational distribution with noisy update and that with batch update, and then optimizes the learning rates by minimizing the KL divergence. We apply our algorithm to two representative topic models: latent Dirichlet allocation and hierarchical Dirichlet process. Experimental results indicate that our algorithm performs better and converges faster than commonly used learning rates.
Statistical Inference and String Theory
Heckman, Jonathan J
2013-01-01
In this note we expose some surprising connections between string theory and statistical inference. We consider a large collective of agents sweeping out a family of nearby statistical models for an M-dimensional manifold of statistical fitting parameters. When the agents making nearby inferences align along a d-dimensional grid, we find that the pooled probability that the collective reaches a correct inference is the partition function of a non-linear sigma model in d dimensions. Stability under perturbations to the original inference scheme requires the agents of the collective to distribute along two dimensions. Conformal invariance of the sigma model corresponds to the condition of a stable inference scheme, directly leading to the Einstein field equations for classical gravity. By summing over all possible arrangements of the agents in the collective, we reach a string theory. We also use this perspective to quantify how much an observer can hope to learn about the internal geometry of a superstring com...
Kleinberg, Jon
2006-01-01
Algorithm Design introduces algorithms by looking at the real-world problems that motivate them. The book teaches students a range of design and analysis techniques for problems that arise in computing applications. The text encourages an understanding of the algorithm design process and an appreciation of the role of algorithms in the broader field of computer science.
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
Algorithms and inferences: The challenge of multifactorial diseases
Energy Technology Data Exchange (ETDEWEB)
Elston, R.C. [Case Western Reserve Univ., Cleveland, OH (United States)
1997-02-01
On learning that I was to receive the Allan Award for 1996, I had mixed feelings of elation, dismay, and surprise. Elation, because receiving the Allan Award is an honor bestowed on so few; dismay, because I realized that it would mean I would have to face this audience with something interesting to say; and surprise, because the decision to give me the award must have been made shortly after Kevin Davis, the editor of Nature Genetics, published the following statement regarding a letter I had just coauthored: {open_quotes}We disagree with this approach for three reasons. It is intellectually wrong. . . . It is historically inconsistent. It is technically antediluvian{close_quotes}. The letter I had coauthored concerned the {open_quotes}genetic dissection of complex traits,{close_quotes} a subject that has in recent years received a surge of interest. I felt certain that the Awards Committee would want me to give an address on a topic of current interest, and what could be of more current interest in the area I am known to work in? And why would my views on this topic be of any interest if they are intellectually wrong, historically inconsistent, and technically antediluvian? So what should I talk about? 17 refs., 6 figs., 2 tabs.
An object-oriented cluster search algorithm
Energy Technology Data Exchange (ETDEWEB)
Silin, Dmitry; Patzek, Tad
2003-01-24
In this work we describe two object-oriented cluster search algorithms, which can be applied to a network of an arbitrary structure. First algorithm calculates all connected clusters, whereas the second one finds a path with the minimal number of connections. We estimate the complexity of the algorithm and infer that the number of operations has linear growth with respect to the size of the network.
Inference-based procedural modeling of solids
Biggers, Keith
2011-11-01
As virtual environments become larger and more complex, there is an increasing need for more automated construction algorithms to support the development process. We present an approach for modeling solids by combining prior examples with a simple sketch. Our algorithm uses an inference-based approach to incrementally fit patches together in a consistent fashion to define the boundary of an object. This algorithm samples and extracts surface patches from input models, and develops a Petri net structure that describes the relationship between patches along an imposed parameterization. Then, given a new parameterized line or curve, we use the Petri net to logically fit patches together in a manner consistent with the input model. This allows us to easily construct objects of varying sizes and configurations using arbitrary articulation, repetition, and interchanging of parts. The result of our process is a solid model representation of the constructed object that can be integrated into a simulation-based environment. © 2011 Elsevier Ltd. All rights reserved.
Optimization methods for logical inference
Chandru, Vijay
2011-01-01
Merging logic and mathematics in deductive inference-an innovative, cutting-edge approach. Optimization methods for logical inference? Absolutely, say Vijay Chandru and John Hooker, two major contributors to this rapidly expanding field. And even though ""solving logical inference problems with optimization methods may seem a bit like eating sauerkraut with chopsticks. . . it is the mathematical structure of a problem that determines whether an optimization model can help solve it, not the context in which the problem occurs."" Presenting powerful, proven optimization techniques for logic in
Bootstrapping phylogenies inferred from rearrangement data
Directory of Open Access Journals (Sweden)
Lin Yu
2012-08-01
Full Text Available Abstract Background Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. Results We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Conclusions Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its
Institute of Scientific and Technical Information of China (English)
游友珍; 戴剑勇
2013-01-01
According to the influencing factors of open-pit mine production process and production cost,the mine production cost model is set up with adaptive fuzzy inference sys-tem,then,the parameters of the model are optimized by parallel genetic algorithm,which realize the minimization of production cost of open-pit mine. With the cement raw material mine as an example,it successfully realized the optimization of production cost and techni-cal economic parameters of open-pit mine,which not only effectively reduced the produc-tion cost of mine,but also provided important reference for production cost control of manu-facturing enterprises.%根据露天矿山生产工艺流程及生产成本影响因素，运用自适应模糊推理系统建立矿山生产成本系统模型，应用并行遗传算法优化模型参数，实现露天矿山生产成本最小化。并以水泥原料矿山为例，成功实现了露天矿山生产成本与技术经济参数的优化问题，这不仅有效地降低了矿山生产成本，而且为制造企业生产成本控制提供了重要参考价值。
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... Zie: Summary
On principles of inductive inference
Kostecki, Ryszard Paweł
2011-01-01
We propose an intersubjective epistemic approach to foundations of probability theory and statistical inference, based on relative entropy and category theory, and aimed to bypass the mathematical and conceptual problems of existing foundational approaches.
Joux, Antoine
2009-01-01
Illustrating the power of algorithms, Algorithmic Cryptanalysis describes algorithmic methods with cryptographically relevant examples. Focusing on both private- and public-key cryptographic algorithms, it presents each algorithm either as a textual description, in pseudo-code, or in a C code program.Divided into three parts, the book begins with a short introduction to cryptography and a background chapter on elementary number theory and algebra. It then moves on to algorithms, with each chapter in this section dedicated to a single topic and often illustrated with simple cryptographic applic
Type Inference for Guarded Recursive Data Types
Stuckey, Peter J.; Sulzmann, Martin
2005-01-01
We consider type inference for guarded recursive data types (GRDTs) -- a recent generalization of algebraic data types. We reduce type inference for GRDTs to unification under a mixed prefix. Thus, we obtain efficient type inference. Inference is incomplete because the set of type constraints allowed to appear in the type system is only a subset of those type constraints generated by type inference. Hence, inference only succeeds if the program is sufficiently type annotated. We present refin...
Implementing Deep Inference in Tom
Kahramanogullari, Ozan; Moreau, Pierre-Etienne; Reilles, Antoine
2005-01-01
ISSN 1430-211X; The calculus of structures is a proof theoretical formalism which generalizes sequent calculus with the feature of deep inference: in contrast to sequent calculus, the calculus of structures does not rely on the notion of main connective and, like in term rewriting, it permits the application of the inference rules at any depth inside a formula. Tom is a pattern matching processor that integrates term rewriting facilities into imperative languages. In this paper, relying on th...
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics.. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
Automated weighing by sequential inference in dynamic environments
Martin, A D
2015-01-01
We demonstrate sequential mass inference of a suspended bag of milk powder from simulated measurements of the vertical force component at the pivot while the bag is being filled. We compare the predictions of various sequential inference methods both with and without a physics model to capture the system dynamics. We find that non-augmented and augmented-state unscented Kalman filters (UKFs) in conjunction with a physics model of a pendulum of varying mass and length provide rapid and accurate predictions of the milk powder mass as a function of time. The UKFs outperform the other method tested - a particle filter. Moreover, inference methods which incorporate a physics model outperform equivalent algorithms which do not.
Inferring time derivatives including cell growth rates using Gaussian processes
Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta
2016-12-01
Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.
Abductive inference and delusional belief.
Coltheart, Max; Menzies, Peter; Sutton, John
2010-01-01
Delusional beliefs have sometimes been considered as rational inferences from abnormal experiences. We explore this idea in more detail, making the following points. First, the abnormalities of cognition that initially prompt the entertaining of a delusional belief are not always conscious and since we prefer to restrict the term "experience" to consciousness we refer to "abnormal data" rather than "abnormal experience". Second, we argue that in relation to many delusions (we consider seven) one can clearly identify what the abnormal cognitive data are which prompted the delusion and what the neuropsychological impairment is which is responsible for the occurrence of these data; but one can equally clearly point to cases where this impairment is present but delusion is not. So the impairment is not sufficient for delusion to occur: a second cognitive impairment, one that affects the ability to evaluate beliefs, must also be present. Third (and this is the main thrust of our paper), we consider in detail what the nature of the inference is that leads from the abnormal data to the belief. This is not deductive inference and it is not inference by enumerative induction; it is abductive inference. We offer a Bayesian account of abductive inference and apply it to the explanation of delusional belief.
Active inference, communication and hermeneutics.
Friston, Karl J; Frith, Christopher D
2015-07-01
Hermeneutics refers to interpretation and translation of text (typically ancient scriptures) but also applies to verbal and non-verbal communication. In a psychological setting it nicely frames the problem of inferring the intended content of a communication. In this paper, we offer a solution to the problem of neural hermeneutics based upon active inference. In active inference, action fulfils predictions about how we will behave (e.g., predicting we will speak). Crucially, these predictions can be used to predict both self and others--during speaking and listening respectively. Active inference mandates the suppression of prediction errors by updating an internal model that generates predictions--both at fast timescales (through perceptual inference) and slower timescales (through perceptual learning). If two agents adopt the same model, then--in principle--they can predict each other and minimise their mutual prediction errors. Heuristically, this ensures they are singing from the same hymn sheet. This paper builds upon recent work on active inference and communication to illustrate perceptual learning using simulated birdsongs. Our focus here is the neural hermeneutics implicit in learning, where communication facilitates long-term changes in generative models that are trying to predict each other. In other words, communication induces perceptual learning and enables others to (literally) change our minds and vice versa. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Bayesian Estimation and Inference Using Stochastic Electronics.
Thakur, Chetan Singh; Afshar, Saeed; Wang, Runchun M; Hamilton, Tara J; Tapson, Jonathan; van Schaik, André
2016-01-01
In this paper, we present the implementation of two types of Bayesian inference problems to demonstrate the potential of building probabilistic algorithms in hardware using single set of building blocks with the ability to perform these computations in real time. The first implementation, referred to as the BEAST (Bayesian Estimation and Stochastic Tracker), demonstrates a simple problem where an observer uses an underlying Hidden Markov Model (HMM) to track a target in one dimension. In this implementation, sensors make noisy observations of the target position at discrete time steps. The tracker learns the transition model for target movement, and the observation model for the noisy sensors, and uses these to estimate the target position by solving the Bayesian recursive equation online. We show the tracking performance of the system and demonstrate how it can learn the observation model, the transition model, and the external distractor (noise) probability interfering with the observations. In the second implementation, referred to as the Bayesian INference in DAG (BIND), we show how inference can be performed in a Directed Acyclic Graph (DAG) using stochastic circuits. We show how these building blocks can be easily implemented using simple digital logic gates. An advantage of the stochastic electronic implementation is that it is robust to certain types of noise, which may become an issue in integrated circuit (IC) technology with feature sizes in the order of tens of nanometers due to their low noise margin, the effect of high-energy cosmic rays and the low supply voltage. In our framework, the flipping of random individual bits would not affect the system performance because information is encoded in a bit stream.
Logical inference techniques for loop parallelization
DEFF Research Database (Denmark)
Oancea, Cosmin Eugen; Rauchwerger, Lawrence
2012-01-01
This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates...... the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... ( F(S) => S = {} ). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates F(S) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order...
Network Topology Inference from Spectral Templates
Segarra, Santiago; Mateos, Gonzalo; Ribeiro, Alejandro
2016-01-01
We address the problem of identifying a graph structure from the observation of signals defined on its nodes. Fundamentally, the unknown graph encodes direct relationships between signal elements, which we aim to recover from observable indirect relationships generated by a diffusion process on the graph. The fresh look advocated here permeates benefits from convex optimization and stationarity of graph signals, in order to identify the graph shift operator (a matrix representation of the graph) given only its eigenvectors. These spectral templates can be obtained, e.g., from the sample covariance of independent graph signals diffused on the sought network. The novel idea is to find a graph shift that, while being consistent with the provided spectral information, endows the network with certain desired properties such as sparsity. To that end we develop efficient inference algorithms stemming from provably-tight convex relaxations of natural nonconvex criteria, particularizing the results for two shifts: the...
Inferring Networks of Diffusion and Influence
Gomez-Rodriguez, Manuel; Krause, Andreas
2010-01-01
Information diffusion and virus propagation are fundamental processes talking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably near-optimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a...
Directory of Open Access Journals (Sweden)
Oliver Serang
Full Text Available Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called "causal independence". For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k2 and the space to O(k log(k where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
Mathematical algorithms for approximate reasoning
Murphy, John H.; Chay, Seung C.; Downs, Mary M.
1988-01-01
Most state of the art expert system environments contain a single and often ad hoc strategy for approximate reasoning. Some environments provide facilities to program the approximate reasoning algorithms. However, the next generation of expert systems should have an environment which contain a choice of several mathematical algorithms for approximate reasoning. To meet the need for validatable and verifiable coding, the expert system environment must no longer depend upon ad hoc reasoning techniques but instead must include mathematically rigorous techniques for approximate reasoning. Popular approximate reasoning techniques are reviewed, including: certainty factors, belief measures, Bayesian probabilities, fuzzy logic, and Shafer-Dempster techniques for reasoning. A group of mathematically rigorous algorithms for approximate reasoning are focused on that could form the basis of a next generation expert system environment. These algorithms are based upon the axioms of set theory and probability theory. To separate these algorithms for approximate reasoning various conditions of mutual exclusivity and independence are imposed upon the assertions. Approximate reasoning algorithms presented include: reasoning with statistically independent assertions, reasoning with mutually exclusive assertions, reasoning with assertions that exhibit minimum overlay within the state space, reasoning with assertions that exhibit maximum overlay within the state space (i.e. fuzzy logic), pessimistic reasoning (i.e. worst case analysis), optimistic reasoning (i.e. best case analysis), and reasoning with assertions with absolutely no knowledge of the possible dependency among the assertions. A robust environment for expert system construction should include the two modes of inference: modus ponens and modus tollens. Modus ponens inference is based upon reasoning towards the conclusion in a statement of logical implication, whereas modus tollens inference is based upon reasoning away
Inferring Taxi Status Using GPS Trajectories
Zhu, Yin; Zhang, Liuhang; Santani, Darshan; Xie, Xing; Yang, Qiang
2012-01-01
In this paper, we infer the statuses of a taxi, consisting of occupied, non-occupied and parked, in terms of its GPS trajectory. The status information can enable urban computing for improving a city's transportation systems and land use planning. In our solution, we first identify and extract a set of effective features incorporating the knowledge of a single trajectory, historical trajectories and geographic data like road network. Second, a parking status detection algorithm is devised to find parking places (from a given trajectory), dividing a trajectory into segments (i.e., sub-trajectories). Third, we propose a two-phase inference model to learn the status (occupied or non-occupied) of each point from a taxi segment. This model first uses the identified features to train a local probabilistic classifier and then carries out a Hidden Semi-Markov Model (HSMM) for globally considering long term travel patterns. We evaluated our method with a large-scale real-world trajectory dataset generated by 600 taxis...
Hierarchical Bayesian inference in the visual cortex
Lee, Tai Sing; Mumford, David
2003-07-01
Traditional views of visual processing suggest that early visual neurons in areas V1 and V2 are static spatiotemporal filters that extract local features from a visual scene. The extracted information is then channeled through a feedforward chain of modules in successively higher visual areas for further analysis. Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long-latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system. They lead us to propose a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system. In this framework, the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference along the visual hierarchy. We suggest that the algorithms of particle filtering and Bayesian-belief propagation might model these interactive cortical computations. We review some recent neurophysiological evidences that support the plausibility of these ideas. 2003 Optical Society of America
Inferring biochemical reaction pathways: the case of the gemcitabine pharmacokinetics
Directory of Open Access Journals (Sweden)
Lecca Paola
2012-05-01
Full Text Available Abstract Background The representation of a biochemical system as a network is the precursor of any mathematical model of the processes driving the dynamics of that system. Pharmacokinetics uses mathematical models to describe the interactions between drug, and drug metabolites and targets and through the simulation of these models predicts drug levels and/or dynamic behaviors of drug entities in the body. Therefore, the development of computational techniques for inferring the interaction network of the drug entities and its kinetic parameters from observational data is raising great interest in the scientific community of pharmacologists. In fact, the network inference is a set of mathematical procedures deducing the structure of a model from the experimental data associated to the nodes of the network of interactions. In this paper, we deal with the inference of a pharmacokinetic network from the concentrations of the drug and its metabolites observed at discrete time points. Results The method of network inference presented in this paper is inspired by the theory of time-lagged correlation inference with regard to the deduction of the interaction network, and on a maximum likelihood approach with regard to the estimation of the kinetic parameters of the network. Both network inference and parameter estimation have been designed specifically to identify systems of biotransformations, at the biochemical level, from noisy time-resolved experimental data. We use our inference method to deduce the metabolic pathway of the gemcitabine. The inputs to our inference algorithm are the experimental time series of the concentration of gemcitabine and its metabolites. The output is the set of reactions of the metabolic network of the gemcitabine. Conclusions Time-lagged correlation based inference pairs up to a probabilistic model of parameter inference from metabolites time series allows the identification of the microscopic pharmacokinetics and
Gene tree correction for reconciliation and species tree inference
Directory of Open Access Journals (Sweden)
Swenson Krister M
2012-11-01
Full Text Available Abstract Background Reconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists in “embedding” inferred gene trees into a known species tree, revealing the evolution of the gene family by duplications and losses. When a species tree is not known, a natural algorithmic problem is to infer a species tree from a set of gene trees, such that the corresponding reconciliation minimizes the number of duplications and/or losses. The main drawback of reconciliation is that the inferred evolutionary scenario is strongly dependent on the considered gene trees, as few misplaced leaves may lead to a completely different history, with significantly more duplications and losses. Results In this paper, we take advantage of certain gene trees’ properties in order to preprocess them for reconciliation or species tree inference. We flag certain duplication vertices of a gene tree, the “non-apparent duplication” (NAD vertices, as resulting from the misplacement of leaves. In the case of species tree inference, we develop a polynomial-time heuristic for removing the minimum number of species leading to a set of gene trees that exhibit no NAD vertices with respect to at least one species tree. In the case of reconciliation, we consider the optimization problem of removing the minimum number of leaves or species leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.
Hougardy, Stefan
2016-01-01
Algorithms play an increasingly important role in nearly all fields of mathematics. This book allows readers to develop basic mathematical abilities, in particular those concerning the design and analysis of algorithms as well as their implementation. It presents not only fundamental algorithms like the sieve of Eratosthenes, the Euclidean algorithm, sorting algorithms, algorithms on graphs, and Gaussian elimination, but also discusses elementary data structures, basic graph theory, and numerical questions. In addition, it provides an introduction to programming and demonstrates in detail how to implement algorithms in C++. This textbook is suitable for students who are new to the subject and covers a basic mathematical lecture course, complementing traditional courses on analysis and linear algebra. Both authors have given this "Algorithmic Mathematics" course at the University of Bonn several times in recent years.
Bayesian inference from count data using discrete uniform priors.
Comoglio, Federico; Fracchia, Letizia; Rinaldi, Maurizio
2013-01-01
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing an homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.
Directory of Open Access Journals (Sweden)
Greg Jensen
Full Text Available Transitive inference (the ability to infer that B > D given that B > C and C > D is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1 representing stimulus positions along a unit span using beta distributions, (2 treating positive and negative feedback asymmetrically, and (3 updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models and its computational efficiency (when compared to full Markov decision process implementations suggests that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
Inference-Based Surface Reconstruction of Cluttered Environments
Biggers, K.
2012-08-01
We present an inference-based surface reconstruction algorithm that is capable of identifying objects of interest among a cluttered scene, and reconstructing solid model representations even in the presence of occluded surfaces. Our proposed approach incorporates a predictive modeling framework that uses a set of user-provided models for prior knowledge, and applies this knowledge to the iterative identification and construction process. Our approach uses a local to global construction process guided by rules for fitting high-quality surface patches obtained from these prior models. We demonstrate the application of this algorithm on several example data sets containing heavy clutter and occlusion. © 2012 IEEE.
Abrams, D.; Williams, C.
1999-01-01
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases for which all know classical algorithms require exponential time.
Tel, G.
1993-01-01
We define the notion of total algorithms for networks of processes. A total algorithm enforces that a "decision" is taken by a subset of the processes, and that participation of all processes is required to reach this decision. Total algorithms are an important building block in the design of distri
Modelling, simulation and inference for multivariate time series of counts
Veraart, Almut E. D.
2016-01-01
This article presents a new continuous-time modelling framework for multivariate time series of counts which have an infinitely divisible marginal distribution. The model is based on a mixed moving average process driven by L\\'{e}vy noise - called a trawl process - where the serial correlation and the cross-sectional dependence are modelled independently of each other. Such processes can exhibit short or long memory. We derive a stochastic simulation algorithm and a statistical inference meth...
Visual recognition and inference using dynamic overcomplete sparse learning.
Murray, Joseph F; Kreutz-Delgado, Kenneth
2007-09-01
We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.
Reinforcement and inference in cross-situational word learning
Directory of Open Access Journals (Sweden)
Paulo F.C. Tilles
2013-11-01
Full Text Available Cross-situational word learning is based on the notion that a learner can determine the referent of a word by finding something in common across many observed uses of that word. Here we propose an adaptive learning algorithm that contains a parameter that controls the strength of the reinforcement applied to associations between concurrent words and referents, and a parameter that regulates inference, which includes built-in biases, such as mutual exclusivity, and information of past learning events. By adjusting these parameters so that the model predictions agree with data from representative experiments on cross-situational word learning, we were able to explain the learning strategies adopted by the participants of those experiments in terms of a trade-off between reinforcement and inference. These strategies can vary wildly depending on the conditions of the experiments. For instance, for fast mapping experiments (i.e., the correct referent could, in principle, be inferred in a single observation inference is prevalent, whereas for segregated contextual diversity experiments (i.e., the referents are separated in groups and are exhibited with members of their groups only reinforcement is predominant. Other experiments are explained with more balanced doses of reinforcement and inference.
Neural model of gene regulatory network: a survey on supportive meta-heuristics.
Biswas, Surama; Acharyya, Sriyankar
2016-06-01
Gene regulatory network (GRN) is produced as a result of regulatory interactions between different genes through their coded proteins in cellular context. Having immense importance in disease detection and drug finding, GRN has been modelled through various mathematical and computational schemes and reported in survey articles. Neural and neuro-fuzzy models have been the focus of attraction in bioinformatics. Predominant use of meta-heuristic algorithms in training neural models has proved its excellence. Considering these facts, this paper is organized to survey neural modelling schemes of GRN and the efficacy of meta-heuristic algorithms towards parameter learning (i.e. weighting connections) within the model. This survey paper renders two different structure-related approaches to infer GRN which are global structure approach and substructure approach. It also describes two neural modelling schemes, such as artificial neural network/recurrent neural network based modelling and neuro-fuzzy modelling. The meta-heuristic algorithms applied so far to learn the structure and parameters of neutrally modelled GRN have been reviewed here.
Inferring interaction partners from protein sequences
Bitbol, Anne-Florence; Dwyer, Robert S.; Colwell, Lucy J.; Wingreen, Ned S.
2016-01-01
Specific protein−protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm’s performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data. PMID:27663738
Locative inferences in medical texts.
Mayer, P S; Bailey, G H; Mayer, R J; Hillis, A; Dvoracek, J E
1987-06-01
Medical research relies on epidemiological studies conducted on a large set of clinical records that have been collected from physicians recording individual patient observations. These clinical records are recorded for the purpose of individual care of the patient with little consideration for their use by a biostatistician interested in studying a disease over a large population. Natural language processing of clinical records for epidemiological studies must deal with temporal, locative, and conceptual issues. This makes text understanding and data extraction of clinical records an excellent area for applied research. While much has been done in making temporal or conceptual inferences in medical texts, parallel work in locative inferences has not been done. This paper examines the locative inferences as well as the integration of temporal, locative, and conceptual issues in the clinical record understanding domain by presenting an application that utilizes two key concepts in its parsing strategy--a knowledge-based parsing strategy and a minimal lexicon.
Sick, the spectroscopic inference crank
Casey, Andrew R
2016-01-01
There exists an inordinate amount of spectral data in both public and private astronomical archives which remain severely under-utilised. The lack of reliable open-source tools for analysing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this Article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick can be used to provide a nearest-neighbour estimate of model parameters, a numerically optimised point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalise on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-di...
Eight challenges in phylodynamic inference
Directory of Open Access Journals (Sweden)
Simon D.W. Frost
2015-03-01
Full Text Available The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data.
Perception, illusions and Bayesian inference.
Nour, Matthew M; Nour, Joseph M
2015-01-01
Descriptive psychopathology makes a distinction between veridical perception and illusory perception. In both cases a perception is tied to a sensory stimulus, but in illusions the perception is of a false object. This article re-examines this distinction in light of new work in theoretical and computational neurobiology, which views all perception as a form of Bayesian statistical inference that combines sensory signals with prior expectations. Bayesian perceptual inference can solve the 'inverse optics' problem of veridical perception and provides a biologically plausible account of a number of illusory phenomena, suggesting that veridical and illusory perceptions are generated by precisely the same inferential mechanisms.
Object-Oriented Type Inference
DEFF Research Database (Denmark)
Schwartzbach, Michael Ignatieff; Palsberg, Jens
1991-01-01
We present a new approach to inferring types in untyped object-oriented programs with inheritance, assignments, and late binding. It guarantees that all messages are understood, annotates the program with type information, allows polymorphic methods, and can be used as the basis of an op-timizing......We present a new approach to inferring types in untyped object-oriented programs with inheritance, assignments, and late binding. It guarantees that all messages are understood, annotates the program with type information, allows polymorphic methods, and can be used as the basis of an op...
Directory of Open Access Journals (Sweden)
Sandra Mara Guse Scós Venske
2011-04-01
Full Text Available Num processo de inferência, busca-se encontrar uma resposta genéricabaseando-se na análise de uma amostra de fatos. A inferência gramatical visa obter uma gramática para uma determinada linguagem baseada em exemplos de cadeias que pertencem ou não à linguagem analisada. Neste trabalho propõe-se um algoritmo para o uso deinferência em gramáticas livres de contexto baseadas em uma cadeia exemplo não pertencente à linguagem. A técnica evolutiva de algoritmos genéticos foi aplicada no processo com o objetivo de auxiliar na criação das regras de produção para as gramáticas,atendendo às restrições impostas pela cadeia exemplo. Uma aplicação do algoritmo de inferência está relacionada a linguagens que possuem padrões específicos pré-definidos, como é o caso de documentos XML no contexto de esquemas. A eficiência do algoritmo proposto é mostrada através de pequenos testes onde são obtidas gramáticas geneticamentegeradas.Inference process try to find a generic answer based on a sample of facts. This process aims to achieve a grammar for a particular language based in string samples that belong or not belong to thespecific language. In this work we propose an algorithm for context-free grammars inference based in only one sample string that not belongs to the language. The genetic algorithm evolutive technical was applied in order to assist the generation of productionrules for grammars. This process must to validate the sample string restrictions. Inference algorithm proposed can be applied in computer languages that have specific pre-defined standards, like schemas for XML documents. Suitability of the proposed algorithm is shown by small experiments where grammars are genetically obtained and generated.
Network inference via adaptive optimal design
Directory of Open Access Journals (Sweden)
Stigter Johannes D
2012-09-01
Full Text Available Abstract Background Current research in network reverse engineering for genetic or metabolic networks very often does not include a proper experimental and/or input design. In this paper we address this issue in more detail and suggest a method that includes an iterative design of experiments based, on the most recent data that become available. The presented approach allows a reliable reconstruction of the network and addresses an important issue, i.e., the analysis and the propagation of uncertainties as they exist in both the data and in our own knowledge. These two types of uncertainties have their immediate ramifications for the uncertainties in the parameter estimates and, hence, are taken into account from the very beginning of our experimental design. Findings The method is demonstrated for two small networks that include a genetic network for mRNA synthesis and degradation and an oscillatory network describing a molecular network underlying adenosine 3’-5’ cyclic monophosphate (cAMP as observed in populations of Dyctyostelium cells. In both cases a substantial reduction in parameter uncertainty was observed. Extension to larger scale networks is possible but needs a more rigorous parameter estimation algorithm that includes sparsity as a constraint in the optimization procedure. Conclusion We conclude that a careful experiment design very often (but not always pays off in terms of reliability in the inferred network topology. For large scale networks a better parameter estimation algorithm is required that includes sparsity as an additional constraint. These algorithms are available in the literature and can also be used in an adaptive optimal design setting as demonstrated in this paper.
A grammar inference approach for predicting kinase specific phosphorylation sites.
Datta, Sutapa; Mukhopadhyay, Subhasis
2015-01-01
Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael
2009-01-01
Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....
Bayesian inference for Hawkes processes
DEFF Research Database (Denmark)
Rasmussen, Jakob Gulddahl
The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional...
Bayesian inference for Hawkes processes
DEFF Research Database (Denmark)
Rasmussen, Jakob Gulddahl
2013-01-01
The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional...
Bayesian inference for Hawkes processes
DEFF Research Database (Denmark)
Rasmussen, Jakob Gulddahl
The Hawkes process is a practically and theoretically important class of point processes, but parameter-estimation for such a process can pose various problems. In this paper we explore and compare two approaches to Bayesian inference. The first approach is based on the so-called conditional...
On principles of inductive inference
Kostecki, Ryszard Paweł
2011-01-01
We discuss the mathematical and conceptual problems of main approaches to foundations of probability theory and statistical inference and propose new foundational approach, aimed to improve the mathematical structure of the theory and to bypass the old conceptual problems. In particular, we introduce the intersubjective interpretation of probability, which is designed to deal with the troubles of `subjective' and `objective' bayesian interpretations.
Regular inference as vertex coloring
Costa Florêncio, C.; Verwer, S.
2012-01-01
This paper is concerned with the problem of supervised learning of deterministic finite state automata, in the technical sense of identification in the limit from complete data, by finding a minimal DFA consistent with the data (regular inference). We solve this problem by translating it in its enti
Type inference for COBOL systems
Deursen, A. van; Moonen, L.M.F.
1998-01-01
Types are a good starting point for various software reengineering tasks. Unfortunately, programs requiring reengineering most desperately are written in languages without an adequate type system (such as COBOL). To solve this problem, we propose a method of automated type inference for these lang
Regular inference as vertex coloring
Costa Florêncio, C.; Verwer, S.
2012-01-01
This paper is concerned with the problem of supervised learning of deterministic finite state automata, in the technical sense of identification in the limit from complete data, by finding a minimal DFA consistent with the data (regular inference). We solve this problem by translating it in its
Statistical inference on variance components
Verdooren, L.R.
1988-01-01
In several sciences but especially in animal and plant breeding, the general mixed model with fixed and random effects plays a great role. Statistical inference on variance components means tests of hypotheses about variance components, constructing confidence intervals for them, estimating them,
Covering, Packing and Logical Inference
1993-10-01
of Operations Research 43 (1993). [34] *Hooker, J. N., Generalized resolution for 0-1 linear inequalities, Annals of Mathematics and A 16 271-286. [35...Hooker, J. N. and C. Fedjki, Branch-and-cut solution of inference prob- lems in propositional logic, Annals of Mathematics and AI 1 (1990) 123-140. [40
Mathematical Programming and Logical Inference
1990-12-01
solution of inference problems in propositional logic, to appear in Annals of Mathematics and Al. (271 Howard, R. A., and J. E. Matheson, Influence...1981). (281 Jeroslow, R., and J. Wang, Solving propositional satisfiability problems, to appear in Annals of Mathematics and Al. [29] Nilsson, N. J
An Introduction to Causal Inference
2009-11-02
legitimize causal inference, has removed causation from its natural habitat, and distorted its face beyond recognition. This exclusivist attitude is...In contrast, when the mediation problem is approached from an exclusivist potential-outcome viewpoint, void of the structural guidance of Eq. (28
DEFF Research Database (Denmark)
Picchini, Umberto; Forman, Julie Lyng
2016-01-01
In recent years, dynamical modelling has been provided with a range of breakthrough methods to perform exact Bayesian inference. However, it is often computationally unfeasible to apply exact statistical methodologies in the context of large data sets and complex models. This paper considers...... a nonlinear stochastic differential equation model observed with correlated measurement errors and an application to protein folding modelling. An approximate Bayesian computation (ABC)-MCMC algorithm is suggested to allow inference for model parameters within reasonable time constraints. The ABC algorithm...... applications. A simulation study is conducted to compare our strategy with exact Bayesian inference, the latter resulting two orders of magnitude slower than ABC-MCMC for the considered set-up. Finally, the ABC algorithm is applied to a large size protein data. The suggested methodology is fairly general...
Directory of Open Access Journals (Sweden)
Ji Wei
2010-10-01
Full Text Available Abstract Background Microarray data discretization is a basic preprocess for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary and no systematic comparison of different discretization has been conducted, in the context of gene regulatory network inference from time series gene expression data. Results In this study, we propose a new discretization method "bikmeans", and compare its performance with four other widely-used discretization methods using different datasets, modeling algorithms and number of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. Bikmeans method always gave high total accuracies. Conclusions Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significant better total accuracies than other methods.
Finding Non-overlapping Clusters for Generalized Inference Over Graphical Models
Vats, Divyanshu
2011-01-01
Graphical models compactly capture stochastic dependencies amongst a collection of random variables using a graph. Inference over graphical models corresponds to finding marginal probability distributions given joint probability distributions. Several inference algorithms rely on iterative message passing between nodes. Although these algorithms can be generalized so that the message passing occurs between clusters of nodes, there are limited frameworks for finding such clusters. Moreover, current frameworks rely on finding clusters that are overlapping. This increases the computational complexity of finding clusters since the edges over a graph with overlapping clusters must be chosen carefully to avoid inconsistencies in the marginal distribution computations. In this paper, we propose a framework for finding clusters in a graph for generalized inference so that the clusters are \\emph{non-overlapping}. Given an undirected graph, we first derive a linear time algorithm for constructing a block-tree, a tree-s...
MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics
Feroz, F; Bridges, M
2008-01-01
We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focussing on the extension of the vanilla $\\Lambda$CDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software, which is fully parallelized using MPI and includes an inte...
Lawrence, C.; Lin, L.; Lisiecki, L. E.; Khider, D.
2014-12-01
The broad goal of this presentation is to demonstrate the utility of probabilistic generative models to capture investigators' knowledge of geological processes and proxy data to draw statistical inferences about unobserved paleoclimatological events. We illustrate how this approach forces investigators to be explicit about their assumptions, and about how probability theory yields results that are a mathematical consequence of these assumptions and the data. We illustrate these ideas with the HMM-Match model that infers common times of sediment deposition in two records and the uncertainty in these inferences in the form of confidence bands. HMM-Match models the sedimentation processes that led to proxy data measured in marine sediment cores. This Bayesian model has three components: 1) a generative probabilistic model that proceeds from the underlying geophysical and geochemical events, specifically the sedimentation events to the generation the proxy data Sedimentation ---> Proxy Data ; 2) a recursive algorithm that reverses the logic of the model to yield inference about the unobserved sedimentation events and the associated alignment of the records based on proxy data Proxy Data ---> Sedimentation (Alignment) ; 3) an expectation maximization algorithm for estimating two unknown parameters. We applied HMM-Match to align 35 Late Pleistocene records to a global benthic d18Ostack and found that the mean width of 95% confidence intervals varies between 3-23 kyr depending on the resolution and noisiness of the core's d18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. Results from this algorithm will allow researchers to examine the robustness of their conclusions with respect to alignment uncertainty. Figure 1 shows the confidence bands for one low resolution record.
Position and orientation inference via on-board triangulation.
Advani, Madhu; Weile, Daniel S
2017-01-01
This work proposes a new approach to determine the spatial location and orientation of an object using measurements performed on the object itself. The on-board triangulation algorithm we outline could be implemented in lieu of, or in addition to, well-known alternatives such as Global Positioning System (GPS) or standard triangulation, since both of these correspond to significantly different geometric pictures and necessitate different hardware and algorithms. We motivate the theory by describing situations in which on-board triangulation would be useful and even preferable to standard methods. The on-board triangulation algorithm we outline involves utilizing dumb beacons which broadcast omnidirectional single frequency radio waves, and smart antenna arrays on the object itself to infer the direction of the beacon signals, which may be used for onboard calculation of the position and orientation of the object. Numerical examples demonstrate the utility of the method and its noise tolerance.
An application of Bayesian inference for solar-like pulsators
Benomar, O.
2008-12-01
As the amount of data collected by space-borne asteroseismic instruments (such as CoRoT and Kepler) increases drastically, it will be useful to have automated processes to extract a maximum of information from these data. The use of a Bayesian approach could be very help- ful for this goal. Only a few attempts have been made in this way (e.g. Brewer et al. 2007). We propose to use Markov Chain Monte Carlo simulations (MCMC) with Metropolis-Hasting (MH) based algorithms to infer the main stellar oscillation parameters from the power spec- trum, in the case of solar-like pulsators. Given a number of modes to be fitted, the algorithm is able to give the best set of parameters (frequency, linewidth, amplitude, rotational split- ting) corresponding to a chosen input model. We illustrate this algorithm with one of the first CoRoT targets: HD 49933.
Bayesian inference on EMRI signals using low frequency approximations
Ali, Asad; Meyer, Renate; Röver, Christian; 10.1088/0264-9381/29/14/145014
2013-01-01
Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation meth...
Ha, Jeongmok; Jeong, Hong
2016-07-01
This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.
Inferring connectivity in networked dynamical systems: Challenges using Granger causality
Lusch, Bethany; Maia, Pedro D.; Kutz, J. Nathan
2016-09-01
Determining the interactions and causal relationships between nodes in an unknown networked dynamical system from measurement data alone is a challenging, contemporary task across the physical, biological, and engineering sciences. Statistical methods, such as the increasingly popular Granger causality, are being broadly applied for data-driven discovery of connectivity in fields from economics to neuroscience. A common version of the algorithm is called pairwise-conditional Granger causality, which we systematically test on data generated from a nonlinear model with known causal network structure. Specifically, we simulate networked systems of Kuramoto oscillators and use the Multivariate Granger Causality Toolbox to discover the underlying coupling structure of the system. We compare the inferred results to the original connectivity for a wide range of parameters such as initial conditions, connection strengths, community structures, and natural frequencies. Our results show a significant systematic disparity between the original and inferred network, unless the true structure is extremely sparse or dense. Specifically, the inferred networks have significant discrepancies in the number of edges and the eigenvalues of the connectivity matrix, demonstrating that they typically generate dynamics which are inconsistent with the ground truth. We provide a detailed account of the dynamics for the Erdős-Rényi network model due to its importance in random graph theory and network science. We conclude that Granger causal methods for inferring network structure are highly suspect and should always be checked against a ground truth model. The results also advocate the need to perform such comparisons with any network inference method since the inferred connectivity results appear to have very little to do with the ground truth system.
Spontaneous evaluative inferences and their relationship to spontaneous trait inferences.
Schneid, Erica D; Carlston, Donal E; Skowronski, John J
2015-05-01
Three experiments are reported that explore affectively based spontaneous evaluative impressions (SEIs) of stimulus persons. Experiments 1 and 2 used modified versions of the savings in relearning paradigm (Carlston & Skowronski, 1994) to confirm the occurrence of SEIs, indicating that they are equivalent whether participants are instructed to form trait impressions, evaluative impressions, or neither. These experiments also show that SEIs occur independently of explicit recall for the trait implications of the stimuli. Experiment 3 provides a single dissociation test to distinguish SEIs from spontaneous trait inferences (STIs), showing that disrupting cognitive processing interferes with a trait-based prediction task that presumably reflects STIs, but not with an affectively based social approach task that presumably reflects SEIs. Implications of these findings for the potential independence of spontaneous trait and evaluative inferences, as well as limitations and important steps for future study are discussed. (c) 2015 APA, all rights reserved).
Inferring Directed Road Networks from GPS Traces by Track Alignment
Directory of Open Access Journals (Sweden)
Xingzhe Xie
2015-11-01
Full Text Available This paper proposes a method to infer road networks from GPS traces. These networks include intersections between roads, the connectivity between the intersections and the possible traffic directions between directly-connected intersections. These intersections are localized by detecting and clustering turning points, which are locations where the moving direction changes on GPS traces. We infer the structure of road networks by segmenting all of the GPS traces to identify these intersections. We can then form both a connectivity matrix of the intersections and a small representative GPS track for each road segment. The road segment between each pair of directly-connected intersections is represented using a series of geographical locations, which are averaged from all of the tracks on this road segment by aligning them using the dynamic time warping (DTW algorithm. Our contribution is two-fold. First, we detect potential intersections by clustering the turning points on the GPS traces. Second, we infer the geometry of the road segments between intersections by aligning GPS tracks point by point using a “stretch and then compress” strategy based on the DTW algorithm. This approach not only allows road estimation by averaging the aligned tracks, but also a deeper statistical analysis based on the individual track’s time alignment, for example the variance of speed along a road segment.
Halo detection via large-scale Bayesian inference
Merson, Alexander I.; Jasche, Jens; Abdalla, Filipe B.; Lahav, Ofer; Wandelt, Benjamin; Jones, D. Heath; Colless, Matthew
2016-08-01
We present a proof-of-concept of a novel and fully Bayesian methodology designed to detect haloes of different masses in cosmological observations subject to noise and systematic uncertainties. Our methodology combines the previously published Bayesian large-scale structure inference algorithm, HAmiltonian Density Estimation and Sampling algorithm (HADES), and a Bayesian chain rule (the Blackwell-Rao estimator), which we use to connect the inferred density field to the properties of dark matter haloes. To demonstrate the capability of our approach, we construct a realistic galaxy mock catalogue emulating the wide-area 6-degree Field Galaxy Survey, which has a median redshift of approximately 0.05. Application of HADES to the catalogue provides us with accurately inferred three-dimensional density fields and corresponding quantification of uncertainties inherent to any cosmological observation. We then use a cosmological simulation to relate the amplitude of the density field to the probability of detecting a halo with mass above a specified threshold. With this information, we can sum over the HADES density field realisations to construct maps of detection probabilities and demonstrate the validity of this approach within our mock scenario. We find that the probability of successful detection of haloes in the mock catalogue increases as a function of the signal to noise of the local galaxy observations. Our proposed methodology can easily be extended to account for more complex scientific questions and is a promising novel tool to analyse the cosmic large-scale structure in observations.
Smail, Linda
2016-06-01
The basic task of any probabilistic inference system in Bayesian networks is computing the posterior probability distribution for a subset or subsets of random variables, given values or evidence for some other variables from the same Bayesian network. Many methods and algorithms have been developed to exact and approximate inference in Bayesian networks. This work compares two exact inference methods in Bayesian networks-Lauritzen-Spiegelhalter and the successive restrictions algorithm-from the perspective of computational efficiency. The two methods were applied for comparison to a Chest Clinic Bayesian Network. Results indicate that the successive restrictions algorithm shows more computational efficiency than the Lauritzen-Spiegelhalter algorithm.
Statistical inference on residual life
Jeong, Jong-Hyeon
2014-01-01
This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
Probability biases as Bayesian inference
Directory of Open Access Journals (Sweden)
Andre; C. R. Martins
2006-11-01
Full Text Available In this article, I will show how several observed biases in human probabilistic reasoning can be partially explained as good heuristics for making inferences in an environment where probabilities have uncertainties associated to them. Previous results show that the weight functions and the observed violations of coalescing and stochastic dominance can be understood from a Bayesian point of view. We will review those results and see that Bayesian methods should also be used as part of the explanation behind other known biases. That means that, although the observed errors are still errors under the be understood as adaptations to the solution of real life problems. Heuristics that allow fast evaluations and mimic a Bayesian inference would be an evolutionary advantage, since they would give us an efficient way of making decisions. %XX In that sense, it should be no surprise that humans reason with % probability as it has been observed.
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Network Inference from Grouped Data
Zhao, Yunpeng
2016-01-01
In medical research, economics, and the social sciences data frequently appear as subsets of a set of objects. Over the past century a number of descriptive statistics have been developed to construct network structure from such data. However, these measures lack a generating mechanism that links the inferred network structure to the observed groups. To address this issue, we propose a model-based approach called the Hub Model which assumes that every observed group has a leader and that the leader has brought together the other members of the group. The performance of Hub Models is demonstrated by simulation studies. We apply this model to infer the relationships among Senators serving in the 110th United States Congress, the characters in a famous 18th century Chinese novel, and the distribution of flora in North America.
Bayesian inference with ecological applications
Link, William A
2009-01-01
This text is written to provide a mathematically sound but accessible and engaging introduction to Bayesian inference specifically for environmental scientists, ecologists and wildlife biologists. It emphasizes the power and usefulness of Bayesian methods in an ecological context. The advent of fast personal computers and easily available software has simplified the use of Bayesian and hierarchical models . One obstacle remains for ecologists and wildlife biologists, namely the near absence of Bayesian texts written specifically for them. The book includes many relevant examples, is supported by software and examples on a companion website and will become an essential grounding in this approach for students and research ecologists. Engagingly written text specifically designed to demystify a complex subject Examples drawn from ecology and wildlife research An essential grounding for graduate and research ecologists in the increasingly prevalent Bayesian approach to inference Companion website with analyt...
Inferring Centrality from Network Snapshots
Shao, Haibin; Mesbahi, Mehran; Li, Dewei; Xi, Yugeng
2017-01-01
The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagation rate of influence exerted on consensus networks and the Kirchhoff index of the underlying graph. Moreover, the tempo centrality also encodes the disturbance rejection of nodes in a consensus network. Our findings provide an approach to infer the performance of a consensus network from its temporal data. PMID:28098166
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Bayesian inference on proportional elections.
Directory of Open Access Journals (Sweden)
Gabriel Hideki Vatanabe Brunello
Full Text Available Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee the candidate with the most percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied on proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to answer the probability that a given party will have representation on the chamber of deputies was developed. Inferences were made on a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied on data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.
Causal inference based on counterfactuals
Directory of Open Access Journals (Sweden)
Höfler M
2005-09-01
Full Text Available Abstract Background The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion This paper provides an overview on the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences pose several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept.
Applied statistical inference with MINITAB
Lesik, Sally
2009-01-01
Through clear, step-by-step mathematical calculations, Applied Statistical Inference with MINITAB enables students to gain a solid understanding of how to apply statistical techniques using a statistical software program. It focuses on the concepts of confidence intervals, hypothesis testing, validating model assumptions, and power analysis.Illustrates the techniques and methods using MINITABAfter introducing some common terminology, the author explains how to create simple graphs using MINITAB and how to calculate descriptive statistics using both traditional hand computations and MINITAB. Sh
Security Inference from Noisy Data
2008-04-08
Junk Mail Samples (JMS)” later) is collected from Hotmail using a different method. JMS is collected from email in inboxes that is reported as spam (or...The data consist of side channel traces from attackers: spam email messages received by Hotmail, one of the largest Web mail services. The basic...similar content and determining the senders of these email messages, one can infer the composition of the botnet. This approach can analyze botnets re
Optimal Inference in Cointegrated Systems
1988-01-01
This paper studies the properties of maximum likelihood estimates of co-integrated systems. Alternative formulations of such models are considered including a new triangular system error correction mechanism. It is shown that full system maximum likelihood brings the problem of inference within the family that is covered by the locally asymptotically mixed normal asymptotic theory provided that all unit roots in the system have been eliminated by specification and data transformation. This re...
Inferring Centrality from Network Snapshots
Haibin Shao; Mehran Mesbahi; Dewei Li; Yugeng Xi
2017-01-01
The topology and dynamics of a complex network shape its functionality. However, the topologies of many large-scale networks are either unavailable or incomplete. Without the explicit knowledge of network topology, we show how the data generated from the network dynamics can be utilised to infer the tempo centrality, which is proposed to quantify the influence of nodes in a consensus network. We show that the tempo centrality can be used to construct an accurate estimate of both the propagati...
On Quantum Statistical Inference, II
Barndorff-Nielsen, O. E.; Gill, R. D.; Jupp, P.E.
2003-01-01
Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, theoretical developments in the theory of quantum measurements have brought the basic mathematical framework for the probability calculations much closer to that of classical probability theory. The present paper reviews this field and proposes and inte...
An introduction to causal inference.
Pearl, Judea
2010-02-26
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
A generalized characterization of algorithmic probability
T.F. Sterkenburg (Tom)
2017-01-01
textabstractAn a priori semimeasure (also known as “algorithmic probability” or “the Solomonoff prior” in the context of inductive inference) is defined as the transformation, by a given universal monotone Turing machine, of the uniform measure on the infinite strings. It is shown in this paper that
El: A Program for Ecological Inference
King, Gary
2004-01-01
The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., “ecological”) data to infer discrete individual-level relationships of interest when individual- level data are not avai...
EI: A Program for Ecological Inference
Gary King
2004-01-01
The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997). Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not ava...
Hromkovic, Juraj
2009-01-01
Explores the science of computing. This book starts with the development of computer science, algorithms and programming, and then explains and shows how to exploit the concepts of infinity, computability, computational complexity, nondeterminism and randomness.
Computational methods for Gene Orthology inference
Kristensen, David M.; Wolf, Yuri I.; Mushegian, Arcady R.
2011-01-01
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple ‘tree-like’ mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods. PMID:21690100
Inferring Ancestral Recombination Graphs from Bacterial Genomic Data.
Vaughan, Timothy G; Welch, David; Drummond, Alexei J; Biggs, Patrick J; George, Tessy; French, Nigel P
2017-02-01
Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter. Copyright © 2017 Vaughan et al.
Inference of asynchronous Boolean network from biological pathways.
Das, Haimabati; Layek, Ritwik Kumar
2015-01-01
Gene regulation is a complex process with multiple levels of interactions. In order to describe this complex dynamical system with tractable parameterization, the choice of the dynamical system model is of paramount importance. The right abstraction of the modeling scheme can reduce the complexity in the inference and intervention design, both computationally and experimentally. This article proposes an asynchronous Boolean network framework to capture the transcriptional regulation as well as the protein-protein interactions in a genetic regulatory system. The inference of asynchronous Boolean network from biological pathways information and experimental evidence are explained using an algorithm. The suitability of this paradigm for the variability of several reaction rates is also discussed. This methodology and model selection open up new research challenges in understanding gene-protein interactive system in a coherent way and can be beneficial for designing effective therapeutic intervention strategy.
Sign Inference for Dynamic Signed Networks via Dictionary Learning
Directory of Open Access Journals (Sweden)
Yi Cen
2013-01-01
Full Text Available Mobile online social network (mOSN is a burgeoning research area. However, most existing works referring to mOSNs deal with static network structures and simply encode whether relationships among entities exist or not. In contrast, relationships in signed mOSNs can be positive or negative and may be changed with time and locations. Applying certain global characteristics of social balance, in this paper, we aim to infer the unknown relationships in dynamic signed mOSNs and formulate this sign inference problem as a low-rank matrix estimation problem. Specifically, motivated by the Singular Value Thresholding (SVT algorithm, a compact dictionary is selected from the observed dataset. Based on this compact dictionary, the relationships in the dynamic signed mOSNs are estimated via solving the formulated problem. Furthermore, the estimation accuracy is improved by employing a dictionary self-updating mechanism.
Evolution in Mind: Evolutionary Dynamics, Cognitive Processes, and Bayesian Inference.
Suchow, Jordan W; Bourgin, David D; Griffiths, Thomas L
2017-07-01
Evolutionary theory describes the dynamics of population change in settings affected by reproduction, selection, mutation, and drift. In the context of human cognition, evolutionary theory is most often invoked to explain the origins of capacities such as language, metacognition, and spatial reasoning, framing them as functional adaptations to an ancestral environment. However, evolutionary theory is useful for understanding the mind in a second way: as a mathematical framework for describing evolving populations of thoughts, ideas, and memories within a single mind. In fact, deep correspondences exist between the mathematics of evolution and of learning, with perhaps the deepest being an equivalence between certain evolutionary dynamics and Bayesian inference. This equivalence permits reinterpretation of evolutionary processes as algorithms for Bayesian inference and has relevance for understanding diverse cognitive capacities, including memory and creativity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Evaluating Data Assimilation Algorithms
Law, K J H
2011-01-01
Data assimilation refers to methodologies for the incorporation of noisy observations of a physical system into an underlying model in order to infer the properties of the state of the system (and/or parameters). The model itself is typically subject to uncertainties, in the input data and in the physical laws themselves. This leads naturally to a Bayesian formulation in which the posterior probability distribution of the system state, given the observations, plays a central conceptual role. The aim of this paper is to use this Bayesian posterior probability distribution as a gold standard against which to evaluate various commonly used data assimilation algorithms. A key aspect of geophysical data assimilation is the high dimensionality of the computational model. With this in mind, yet with the goal of allowing an explicit and accurate computation of the posterior distribution in order to facilitate our evaluation, we study the 2D Navier-Stokes equations in a periodic geometry. We compute the posterior prob...
Evidence and Inference in Educational Assessment.
1995-02-01
Educational assessment concerns inference about students’ knowledge, skills, and accomplishments. Because data are never so comprehensive and...techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective. (AN)
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
Logical inference techniques for loop parallelization
Oancea, Cosmin E.
2012-01-01
This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop\\'s memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.
Towards Inferring Protein Interactions: Challenges and Solutions
Directory of Open Access Journals (Sweden)
Ji Xiang
2006-01-01
Full Text Available Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques only uncover a small portion of any interactome. Furthermore, these data often have a very high false rate. By conceptualizing the interactions at domain level, we provide a more abstract representation of interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume that domain interactions are independent on each other for the convenience of computational modeling. A new framework to predict protein interactions is proposed in this paper, where no assumption is made about domain interactions. Protein interactions may be the result of multiple domain interactions which are dependent on each other. A conjunctive norm form representation is used to capture the relationships between protein interactions and domain interactions. The problem of interaction inference is then modeled as a constraint satisfiability problem and solved via linear programing. Experimental results on a combined yeast data set have demonstrated the robustness and the accuracy of the proposed algorithm. Moreover, we also map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
Attention as a Bayesian inference process
Chikkerur, Sharat; Serre, Thomas; Tan, Cheston; Poggio, Tomaso
2011-03-01
David Marr famously defined vision as "knowing what is where by seeing". In the framework described here, attention is the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that performs well in recognition tasks and that predicts some of the main properties of attention at the level of psychophysics and physiology. We propose an algorithmic implementation a Bayesian network that can be mapped into the basic functional anatomy of attention involving the ventral stream and the dorsal stream. This description integrates bottom-up, feature-based as well as spatial (context based) attentional mechanisms. We show that the Bayesian model predicts well human eye fixations (considered as a proxy for shifts of attention) in natural scenes, and can improve accuracy in object recognition tasks involving cluttered real world images. In both cases, we found that the proposed model can predict human performance better than existing bottom-up and top-down computational models.
Directory of Open Access Journals (Sweden)
Frank Emmert-Streib
2013-02-01
Full Text Available The inference of gene regulatory networks gained within recent years a considerable interest in the biology and biomedical community. The purpose of this paper is to investigate the influence that environmental conditions can exhibit on the inference performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I observational gene expression data: normal environmental condition, (II interventional gene expression data: growth in rich media, (III interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Perceptual inference and autistic traits
DEFF Research Database (Denmark)
Skewes, Joshua; Jegindø, Else-Marie Elmholdt; Gebauer, Line
2015-01-01
Autistic people are better at perceiving details. Major theories explain this in terms of bottom-up sensory mechanisms, or in terms of top-down cognitive biases. Recently, it has become possible to link these theories within a common framework. This framework assumes that perception is implicit...... neural inference, combining sensory evidence with prior perceptual knowledge. Within this framework, perceptual differences may occur because of enhanced precision in how sensory evidence is represented, or because sensory evidence is weighted much higher than prior perceptual knowledge...
Logical inferences in discourse analysis
Institute of Scientific and Technical Information of China (English)
刘峰廷
2014-01-01
Cohesion and coherence are two important characteristics of discourses. Halliday and Hasan have pointed out that cohesion is the basis of coherence and coherence is the premise of forming discourse. The commonly used cohesive devices are: preference, ellipsis, substitution, etc. Discourse coherence is mainly manifested in sentences and paragraphs. However, in real discourse analysis environment, traditional methods on cohesion and coherence are not enough. This article talks about the conception of discourse analysis at the beginning. Then, we list some of the traditional cohesive devices and its uses. Following that, we make corpus analysis. Finally, we explore and find a new device in textual analysis:discourse logical inferences.
SICK: THE SPECTROSCOPIC INFERENCE CRANK
Energy Technology Data Exchange (ETDEWEB)
Casey, Andrew R., E-mail: arc@ast.cam.ac.uk [Institute of Astronomy, University of Cambridge, Madingley Road, Cambdridge, CB3 0HA (United Kingdom)
2016-03-15
There exists an inordinate amount of spectral data in both public and private astronomical archives that remain severely under-utilized. The lack of reliable open-source tools for analyzing large volumes of spectra contributes to this situation, which is poised to worsen as large surveys successively release orders of magnitude more spectra. In this article I introduce sick, the spectroscopic inference crank, a flexible and fast Bayesian tool for inferring astrophysical parameters from spectra. sick is agnostic to the wavelength coverage, resolving power, or general data format, allowing any user to easily construct a generative model for their data, regardless of its source. sick can be used to provide a nearest-neighbor estimate of model parameters, a numerically optimized point estimate, or full Markov Chain Monte Carlo sampling of the posterior probability distributions. This generality empowers any astronomer to capitalize on the plethora of published synthetic and observed spectra, and make precise inferences for a host of astrophysical (and nuisance) quantities. Model intensities can be reliably approximated from existing grids of synthetic or observed spectra using linear multi-dimensional interpolation, or a Cannon-based model. Additional phenomena that transform the data (e.g., redshift, rotational broadening, continuum, spectral resolution) are incorporated as free parameters and can be marginalized away. Outlier pixels (e.g., cosmic rays or poorly modeled regimes) can be treated with a Gaussian mixture model, and a noise model is included to account for systematically underestimated variance. Combining these phenomena into a scalar-justified, quantitative model permits precise inferences with credible uncertainties on noisy data. I describe the common model features, the implementation details, and the default behavior, which is balanced to be suitable for most astronomical applications. Using a forward model on low-resolution, high signal
Universum Inference and Corpus Homogeneity
Vogel, Carl; Lynch, Gerard; Janssen, Jerom
Universum Inference is re-interpreted for assessment of corpus homogeneity in computational stylometry. Recent stylometric research quantifies strength of characterization within dramatic works by assessing the homogeneity of corpora associated with dramatic personas. A methodological advance is suggested to mitigate the potential for the assessment of homogeneity to be achieved by chance. Baseline comparison analysis is constructed for contributions to debates by nonfictional participants: the corpus analyzed consists of transcripts of US Presidential and Vice-Presidential debates from the 2000 election cycle. The corpus is also analyzed in translation to Italian, Spanish and Portuguese. Adding randomized categories makes assessments of homogeneity more conservative.
Inferring Network Structure from Cascades
Ghonge, Sushrut
2016-01-01
Many physical, biological and social phenomena can be described by cascades taking place on a network. Often, the activity can be empirically observed, but not the underlying network of interactions. In this paper we solve the dynamics of general cascade processes. We then offer three topological inversion methods to infer the structure of any directed network given a set of cascade arrival times. Our forward and inverse formulas hold for a very general class of models where the activation probability of a node is a generic function of its degree and the number of its active neighbors. We report high success rates for synthetic and real networks, for 5 different cascade models.
Bayesian inference for inverse problems occurring in uncertainty analysis
Fu, Shuai; Celeux, Gilles; Bousquet, Nicolas; Couplet, Mathieu
2012-01-01
The inverse problem considered here is to estimate the distribution of a non-observed random variable $X$ from some noisy observed data $Y$ linked to $X$ through a time-consuming physical model $H$. Bayesian inference is considered to take into account prior expert knowledge on $X$ in a small sample size setting. A Metropolis-Hastings within Gibbs algorithm is proposed to compute the posterior distribution of the parameters of $X$ through a data augmentation process. Since calls to $H$ are qu...
Statistical Inference for Big Data Problems in Molecular Biophysics
Energy Technology Data Exchange (ETDEWEB)
Ramanathan, Arvind [ORNL; Savol, Andrej [University of Pittsburgh School of Medicine, Pittsburgh PA; Burger, Virginia [University of Pittsburgh School of Medicine, Pittsburgh PA; Quinn, Shannon [University of Pittsburgh School of Medicine, Pittsburgh PA; Agarwal, Pratul K [ORNL; Chennubhotla, Chakra [University of Pittsburgh School of Medicine, Pittsburgh PA
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technologi- cal and algorithmic improvements in computation have brought molecular simu- lations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mech- anistic basis of cellular homeostasis.
Dopamine, Affordance and Active Inference
Friston, Karl J.; Shiner, Tamara; FitzGerald, Thomas; Galea, Joseph M.; Adams, Rick; Brown, Harriet; Dolan, Raymond J.; Moran, Rosalyn; Stephan, Klaas Enno; Bestmann, Sven
2012-01-01
The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal) cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions) about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal) model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors) to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level. PMID:22241972
Dopamine, affordance and active inference.
Directory of Open Access Journals (Sweden)
Karl J Friston
2012-01-01
Full Text Available The role of dopamine in behaviour and decision-making is often cast in terms of reinforcement learning and optimal decision theory. Here, we present an alternative view that frames the physiology of dopamine in terms of Bayes-optimal behaviour. In this account, dopamine controls the precision or salience of (external or internal cues that engender action. In other words, dopamine balances bottom-up sensory information and top-down prior beliefs when making hierarchical inferences (predictions about cues that have affordance. In this paper, we focus on the consequences of changing tonic levels of dopamine firing using simulations of cued sequential movements. Crucially, the predictions driving movements are based upon a hierarchical generative model that infers the context in which movements are made. This means that we can confuse agents by changing the context (order in which cues are presented. These simulations provide a (Bayes-optimal model of contextual uncertainty and set switching that can be quantified in terms of behavioural and electrophysiological responses. Furthermore, one can simulate dopaminergic lesions (by changing the precision of prediction errors to produce pathological behaviours that are reminiscent of those seen in neurological disorders such as Parkinson's disease. We use these simulations to demonstrate how a single functional role for dopamine at the synaptic level can manifest in different ways at the behavioural level.
Hu, T C
2002-01-01
Newly enlarged, updated second edition of a valuable text presents algorithms for shortest paths, maximum flows, dynamic programming and backtracking. Also discusses binary trees, heuristic and near optimums, matrix multiplication, and NP-complete problems. 153 black-and-white illus. 23 tables.Newly enlarged, updated second edition of a valuable, widely used text presents algorithms for shortest paths, maximum flows, dynamic programming and backtracking. Also discussed are binary trees, heuristic and near optimums, matrix multiplication, and NP-complete problems. New to this edition: Chapter 9
Exploration of the Adaptive Neuro - Fuzzy Inference System Architecture and its Applications
Directory of Open Access Journals (Sweden)
Okereke Eze Aru
2016-09-01
Full Text Available In this paper we exhibited an architecture and essential learning process basic in fuzzy inference system and adaptive neuro fuzzy inference system which is a hybrid network implemented in framework of adaptive network. In genuine figuring environment, soft computing techniques including neural network, fuzzy logic algorithms have been generally used to infer a real choice utilizing given input or output information traits, ANFIS can build mapping taking into account both human learning and hybrid algorithms. This study includes investigation of ANFIS methodology. ANFIS procedure is utilized to display nonlinear functions, to control a standout amongst the most essential parameters of the impelling machine and anticipate a turbulent time arrangement, all yielding more viable, quicker result.
Inference of gene regulatory networks with the strong-inhibition Boolean model
Energy Technology Data Exchange (ETDEWEB)
Xia Qinzhi; Liu Lulu; Ye Weiming; Hu Gang, E-mail: ganghu@bnu.edu.cn [Department of Physics, Beijing Normal University, Beijing 100875 (China)
2011-08-15
The inference of gene regulatory networks (GRNs) is an important topic in biology. In this paper, a logic-based algorithm that infers the strong-inhibition Boolean genetic regulatory networks (where regulation by any single repressor can definitely suppress the expression of the gene regulated) from time series is discussed. By properly ordering various computation steps, we derive for the first time explicit formulae for the probabilities at which different interactions can be inferred given a certain number of data. With the formulae, we can predict the precision of reconstructions of regulation networks when the data are insufficient. Numerical simulations coincide well with the analytical results. The method and results are expected to be applicable to a wide range of general dynamic networks, where logic algorithms play essential roles in the network dynamics and the probabilities of various logics can be estimated well.
Model reductions for inference: generality of pairwise, binary, and planar factor graphs.
Eaton, Frederik; Ghahramani, Zoubin
2013-05-01
We offer a solution to the problem of efficiently translating algorithms between different types of discrete statistical model. We investigate the expressive power of three classes of model-those with binary variables, with pairwise factors, and with planar topology-as well as their four intersections. We formalize a notion of "simple reduction" for the problem of inferring marginal probabilities and consider whether it is possible to "simply reduce" marginal inference from general discrete factor graphs to factor graphs in each of these seven subclasses. We characterize the reducibility of each class, showing in particular that the class of binary pairwise factor graphs is able to simply reduce only positive models. We also exhibit a continuous "spectral reduction" based on polynomial interpolation, which overcomes this limitation. Experiments assess the performance of standard approximate inference algorithms on the outputs of our reductions.
Ostapov, Yuriy
2012-01-01
Algorithms of inference in a computer system oriented to input and semantic processing of text information are presented. Such inference is necessary for logical questions when the direct comparison of objects from a question and database can not give a result. The following classes of problems are considered: a check of hypotheses for persons and non-typical actions, the determination of persons and circumstances for non-typical actions, planning actions, the determination of event cause and state of persons. To form an answer both deduction and plausible reasoning are used. As a knowledge domain under consideration is social behavior of persons, plausible reasoning is based on laws of social psychology. Proposed algorithms of inference and plausible reasoning can be realized in computer systems closely connected with text processing (criminology, operation of business, medicine, document systems).
Partial Optimality by Pruning for MAP-Inference with General Graphical Models.
Swoboda, Paul; Shekhovtsov, Alexander; Kappes, Jorg Hendrik; Schnorr, Christoph; Savchynskyy, Bogdan
2016-07-01
We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data
2015-07-01
Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data Guy Van den Broeck∗ and Karthika Mohan∗ and Arthur Choi and Adnan...We propose a family of efficient algorithms for learning the parameters of a Bayesian network from incomplete data. Our approach is based on recent...algorithms like EM (which require inference). 1 INTRODUCTION When learning the parameters of a Bayesian network from data with missing values, the
Data Provenance Inference in Logic Programming: Reducing Effort of Instance-driven Debugging
Huq, Mohammad Rezwanul; Mileo, Alessandra; Wombacher, Andreas
2013-01-01
Data provenance allows scientists in different domains validating their models and algorithms to find out anomalies and unexpected behaviors. In previous works, we described on-the-fly interpretation of (Python) scripts to build workflow provenance graph automatically and then infer fine-grained pro
Nuclear Forensic Inferences Using Iterative Multidimensional Statistics
Energy Technology Data Exchange (ETDEWEB)
Robel, M; Kristo, M J; Heller, M A
2009-06-09
Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the
Global and Local Distortion Inference During Embedded Zerotree Wavelet Decompression
Huber, A. Kris; Budge, Scott E.
1996-01-01
This paper presents algorithms for inferring global and spatially local estimates of the squared-error distortion measures for the Embedded Zerotree Wavelet (EZW) image compression algorithm. All distortion estimates are obtained at the decoder without significantly compromising EZW's rate-distortion performance. Two methods are given for propagating distortion estimates from the wavelet domain to the spatial domain, thus giving individual estimates of distortion for each pixel of the decompressed image. These local distortion estimates seem to provide only slight improvement in the statistical characterization of EZW compression error relative to the global measure, unless actual squared errors are propagated. However, they provide qualitative information about the asymptotic nature of the error that may be helpful in wavelet filter selection for low bit rate applications.
Bayesian Inference for Radio Observations - Going beyond deconvolution
Lochner, Michelle; Kunz, Martin; Natarajan, Iniyan; Oozeer, Nadeem; Smirnov, Oleg; Zwart, Jon
2015-01-01
Radio interferometers suffer from the problem of missing information in their data, due to the gaps between the antennas. This results in artifacts, such as bright rings around sources, in the images obtained. Multiple deconvolution algorithms have been proposed to solve this problem and produce cleaner radio images. However, these algorithms are unable to correctly estimate uncertainties in derived scientific parameters or to always include the effects of instrumental errors. We propose an alternative technique called Bayesian Inference for Radio Observations (BIRO) which uses a Bayesian statistical framework to determine the scientific parameters and instrumental errors simultaneously directly from the raw data, without making an image. We use a simple simulation of Westerbork Synthesis Radio Telescope data including pointing errors and beam parameters as instrumental effects, to demonstrate the use of BIRO.
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
Restrictions for the causal inferences in an interferometric system
Rossi, R.
2017-07-01
Causal discovery algorithms allow for the inference of causal structures from probabilistic relations of random variables. A natural field for the application of this tool is quantum mechanics, where a long-standing debate about the role of causality in the theory has flourished since its early days. In this paper, a causal discovery algorithm is applied in the search for causal models to describe a quantum version of Wheeler's delayed-choice experiment. The outputs explicitly show the restrictions for the introduction of classical concepts in this system. The exclusion of models with two hidden variables is one of them. A consequence of such a constraint is the impossibility to construct a causal model that avoids superluminal causation and assumes an objective view of the wave and particle properties simultaneously.
Directory of Open Access Journals (Sweden)
Anna Bourmistrova
2011-02-01
Full Text Available The autodriver algorithm is an intelligent method to eliminate the need of steering by a driver on a well-defined road. The proposed method performs best on a four-wheel steering (4WS vehicle, though it is also applicable to two-wheel-steering (TWS vehicles. The algorithm is based on coinciding the actual vehicle center of rotation and road center of curvature, by adjusting the kinematic center of rotation. The road center of curvature is assumed prior information for a given road, while the dynamic center of rotation is the output of dynamic equations of motion of the vehicle using steering angle and velocity measurements as inputs. We use kinematic condition of steering to set the steering angles in such a way that the kinematic center of rotation of the vehicle sits at a desired point. At low speeds the ideal and actual paths of the vehicle are very close. With increase of forward speed the road and tire characteristics, along with the motion dynamics of the vehicle cause the vehicle to turn about time-varying points. By adjusting the steering angles, our algorithm controls the dynamic turning center of the vehicle so that it coincides with the road curvature center, hence keeping the vehicle on a given road autonomously. The position and orientation errors are used as feedback signals in a closed loop control to adjust the steering angles. The application of the presented autodriver algorithm demonstrates reliable performance under different driving conditions.
Inference of neuronal network spike dynamics and topology from calcium imaging data.
Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof
2013-01-01
Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence ("spike trains") from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.
Inference of neuronal network spike dynamics and topology from calcium imaging data
Directory of Open Access Journals (Sweden)
Henry eLütcke
2013-12-01
Full Text Available Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP occurrence ('spike trains' from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.
Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger
2017-01-01
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods. PMID:28166542
Spontaneous Trait Inferences on Social Media
Utz, Sonja
2016-01-01
The present research investigates whether spontaneous trait inferences occur under conditions characteristic of social media and networking sites: nonextreme, ostensibly self-generated content, simultaneous presentation of multiple cues, and self-paced browsing. We used an established measure of trait inferences (false recognition paradigm) and a direct assessment of impressions. Without being asked to do so, participants spontaneously formed impressions of people whose status updates they saw. Our results suggest that trait inferences occurred from nonextreme self-generated content, which is commonly found in social media updates (Experiment 1) and when nine status updates from different people were presented in parallel (Experiment 2). Although inferences did occur during free browsing, the results suggest that participants did not necessarily associate the traits with the corresponding status update authors (Experiment 3). Overall, the findings suggest that spontaneous trait inferences occur on social media. We discuss implications for online communication and research on spontaneous trait inferences. PMID:28123646
Causal inference in public health.
Glass, Thomas A; Goodman, Steven N; Hernán, Miguel A; Samet, Jonathan M
2013-01-01
Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action's consequences rather than the less precise notion of a risk factor's causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world.
Statistical inference for financial engineering
Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki
2014-01-01
This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.
Polynomial Regressions and Nonsense Inference
Directory of Open Access Journals (Sweden)
Daniel Ventosa-Santaulària
2013-11-01
Full Text Available Polynomial specifications are widely used, not only in applied economics, but also in epidemiology, physics, political analysis and psychology, just to mention a few examples. In many cases, the data employed to estimate such specifications are time series that may exhibit stochastic nonstationary behavior. We extend Phillips’ results (Phillips, P. Understanding spurious regressions in econometrics. J. Econom. 1986, 33, 311–340. by proving that an inference drawn from polynomial specifications, under stochastic nonstationarity, is misleading unless the variables cointegrate. We use a generalized polynomial specification as a vehicle to study its asymptotic and finite-sample properties. Our results, therefore, lead to a call to be cautious whenever practitioners estimate polynomial regressions.
Relevance-driven Pragmatic Inferences
Institute of Scientific and Technical Information of China (English)
王瑞彪
2013-01-01
Relevance theory, an inferential approach to pragmatics, claims that the hearer is expected to pick out the input of op-timal relevance from a mass of alternative inputs produced by the speaker in order to interpret the speaker ’s intentions. The de-gree of the relevance of an input can be assessed in terms of cognitive effects and the processing effort. The input of optimal rele-vance is the one yielding the greatest positive cognitive effect and requiring the least processing effort. This paper attempts to as-sess the degrees of the relevance of a mass of alternative inputs produced by an imaginary speaker from the perspective of her cor-responding hearer in terms of cognitive effects and the processing effort with a view to justifying the feasibility of the principle of relevance in pragmatic inferences.
A unified framework for haplotype inference in nuclear families.
Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong
2012-07-01
Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/∼anastas/tds.
A tutorial on time-evolving dynamical Bayesian inference
Stankovski, Tomislav; Duggento, Andrea; McClintock, Peter V. E.; Stefanovska, Aneta
2014-12-01
In view of the current availability and variety of measured data, there is an increasing demand for powerful signal processing tools that can cope successfully with the associated problems that often arise when data are being analysed. In practice many of the data-generating systems are not only time-variable, but also influenced by neighbouring systems and subject to random fluctuations (noise) from their environments. To encompass problems of this kind, we present a tutorial about the dynamical Bayesian inference of time-evolving coupled systems in the presence of noise. It includes the necessary theoretical description and the algorithms for its implementation. For general programming purposes, a pseudocode description is also given. Examples based on coupled phase and limit-cycle oscillators illustrate the salient features of phase dynamics inference. State domain inference is illustrated with an example of coupled chaotic oscillators. The applicability of the latter example to secure communications based on the modulation of coupling functions is outlined. MatLab codes for implementation of the method, as well as for the explicit examples, accompany the tutorial.
Hierarchical animal movement models for population-level inference
Hooten, Mevin B.; Buderman, Frances E.; Brost, Brian M.; Hanks, Ephraim M.; Ivans, Jacob S.
2016-01-01
New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for an automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data, and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Chavira, Mark; Darwiche, Adnan
2004-01-01
We describe a system for exact inference with relational Bayesian networks as defined in the publicly available \\primula\\ tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference by evaluating...... and differentiating these circuits in time linear in their size. We report on experimental results showing the successful compilation, and efficient inference, on relational Bayesian networks whose {\\primula}--generated propositional instances have thousands of variables, and whose jointrees have clusters...
Inferring Stop-Locations from WiFi.
Directory of Open Access Journals (Sweden)
David Kofoed Wind
Full Text Available Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring 'stop locations' (places of stationarity from GPS or CDR data, or on detection of state (static/active. In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two month period.
Inferring Stop-Locations from WiFi.
Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna; Lehmann, Sune
2016-01-01
Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring 'stop locations' (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two month period.
Bayesian inference from count data using discrete uniform priors.
Directory of Open Access Journals (Sweden)
Federico Comoglio
Full Text Available We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing an homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
Inference Attacks and Control on Database Structures
Directory of Open Access Journals (Sweden)
Muhamed Turkanovic
2015-02-01
Full Text Available Today’s databases store information with sensitivity levels that range from public to highly sensitive, hence ensuring confidentiality can be highly important, but also requires costly control. This paper focuses on the inference problem on different database structures. It presents possible treats on privacy with relation to the inference, and control methods for mitigating these treats. The paper shows that using only access control, without any inference control is inadequate, since these models are unable to protect against indirect data access. Furthermore, it covers new inference problems which rise from the dimensions of new technologies like XML, semantics, etc.
State Sampling Dependence of Hopfield Network Inference
Institute of Scientific and Technical Information of China (English)
黄海平
2012-01-01
The fully connected Hopfield network is inferred based on observed magnetizations and pairwise correlations. We present the system in the glassy phase with low temperature and high memory load. We find that the inference error is very sensitive to the form of state sampling. When a single state is sampled to compute magnetizations and correlations, the inference error is almost indistinguishable irrespective of the sampled state. However, the error can be greatly reduced if the data is collected with state transitions. Our result holds for different disorder samples and accounts for the previously observed large fluctuations of inference error at low temperatures.
Inferring modules from human protein interactome classes
Directory of Open Access Journals (Sweden)
Chaurasia Gautam
2010-07-01
Full Text Available Abstract Background The integration of protein-protein interaction networks derived from high-throughput screening approaches and complementary sources is a key topic in systems biology. Although integration of protein interaction data is conventionally performed, the effects of this procedure on the result of network analyses has not been examined yet. In particular, in order to optimize the fusion of heterogeneous interaction datasets, it is crucial to consider not only their degree of coverage and accuracy, but also their mutual dependencies and additional salient features. Results We examined this issue based on the analysis of modules detected by network clustering methods applied to both integrated and individual (disaggregated data sources, which we call interactome classes. Due to class diversity, we deal with variable dependencies of data features arising from structural specificities and biases, but also from possible overlaps. Since highly connected regions of the human interactome may point to potential protein complexes, we have focused on the concept of modularity, and elucidated the detection power of module extraction algorithms by independent validations based on GO, MIPS and KEGG. From the combination of protein interactions with gene expressions, a confidence scoring scheme has been proposed before proceeding via GO with further classification in permanent and transient modules. Conclusions Disaggregated interactomes are shown to be informative for inferring modularity, thus contributing to perform an effective integrative analysis. Validation of the extracted modules by multiple annotation allows for the assessment of confidence measures assigned to the modules in a protein pathway context. Notably, the proposed multilayer confidence scheme can be used for network calibration by enabling a transition from unweighted to weighted interactomes based on biological evidence.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
DEFF Research Database (Denmark)
Markham, Annette
layered set of accounts to help build our understanding of how individuals relate to their devices, search systems, and social network sites. This work extends critical analyses of the power of algorithms in implicating the social self by offering narrative accounts from multiple perspectives. It also......This paper takes an actor network theory approach to explore some of the ways that algorithms co-construct identity and relational meaning in contemporary use of social media. Based on intensive interviews with participants as well as activity logging and data tracking, the author presents a richly...... contributes an innovative method for blending actor network theory with symbolic interaction to grapple with the complexity of everyday sensemaking practices within networked global information flows....
NIFTY - Numerical Information Field Theory - a versatile Python library for signal inference
Selig, Marco; Junklewitz, Henrik; Oppermann, Niels; Reinecke, Martin; Greiner, Maksim; Pachajoa, Carlos; Enßlin, Torsten A
2013-01-01
NIFTY, "Numerical Information Field Theory", is a software package designed to enable the development of signal inference algorithms that operate regardless of the underlying spatial grid and its resolution. Its object-oriented framework is written in Python, although it accesses libraries written in Cython, C++, and C for efficiency. NIFTY offers a toolkit that abstracts discretized representations of continuous spaces, fields in these spaces, and operators acting on fields into classes. Thereby, the correct normalization of operations on fields is taken care of automatically without concerning the user. This allows for an abstract formulation and programming of inference algorithms, including those derived within information field theory. Thus, NIFTY permits its user to rapidly prototype algorithms in 1D, and then apply the developed code in higher-dimensional settings of real world problems. The set of spaces on which NIFTY operates comprises point sets, n-dimensional regular grids, spherical spaces, their...
Improved orthologous databases to ease protozoan targets inference.
Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R
2015-09-29
Homology inference helps on identifying similarities, as well as differences among organisms, which provides a better insight on how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid on protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM, reciprocal best hits based approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. Such can be later used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) Kegg Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. Such new orthologous databases were used for a regular OrthoSearch run. By confronting "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and protozoan species we were able to detect the following total of orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13
NeuroData; Mishchenko, Y.; AM, Packer; TA, Machado; Yuste, R.; Paninski, L
2015-01-01
Vogelstein JT, Mishchenko Y, Packer AM, Machado TA, Yuste R, Paninski L. Towards Confirming Neural Circuit Inference from Population Calcium Imaging. NIPS Workshop on Connectivity Inference in Neuroimaging, 2009
Directory of Open Access Journals (Sweden)
Xiaodong Cai
Full Text Available Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, the desiderata of efficient and powerful algorithms still remains. In this paper, sparse structural equation models (SEMs are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL, for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL based scheme, and the QTL-directed dependency graph (QDG method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested that include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining may just indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
Ham, J.R.C.; Vonk, R.
2003-01-01
Social perceivers have been shown to draw spontaneous trait inferences (STI's) about the behavior of an actor as well as spontaneous situational inferences (SSI's) about the situation the actor is in. In two studies, we examined inferences about behaviors that allow for both an STI and an SSI. In
Fast Nonnegative Deconvolution for Spike Train Inference From Population Calcium Imaging
Packer, Adam M.; Machado, Timothy A.; Sippy, Tanya; Babadi, Baktash; Yuste, Rafael; Paninski, Liam
2010-01-01
Fluorescent calcium indicators are becoming increasingly popular as a means for observing the spiking activity of large neuronal populations. Unfortunately, extracting the spike train of each neuron from a raw fluorescence movie is a nontrivial problem. This work presents a fast nonnegative deconvolution filter to infer the approximately most likely spike train of each neuron, given the fluorescence observations. This algorithm outperforms optimal linear deconvolution (Wiener filtering) on both simulated and biological data. The performance gains come from restricting the inferred spike trains to be positive (using an interior-point method), unlike the Wiener filter. The algorithm runs in linear time, and is fast enough that even when simultaneously imaging >100 neurons, inference can be performed on the set of all observed traces faster than real time. Performing optimal spatial filtering on the images further refines the inferred spike train estimates. Importantly, all the parameters required to perform the inference can be estimated using only the fluorescence data, obviating the need to perform joint electrophysiological and imaging calibration experiments. PMID:20554834
Validating Inductive Hypotheses by Mode Inference
Institute of Scientific and Technical Information of China (English)
王志坚
1993-01-01
Sme criteria based on mode inference for validating inductive hypotheses are presented in this paper.Mode inference is caried out mechanically,thus such kind of validation can result in low overhead in consistency check and high efficiency in performance.
Causal inference in economics and marketing.
Varian, Hal R
2016-07-05
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual-a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference.
Local and Global Thinking in Statistical Inference
Pratt, Dave; Johnston-Wilder, Peter; Ainley, Janet; Mason, John
2008-01-01
In this reflective paper, we explore students' local and global thinking about informal statistical inference through our observations of 10- to 11-year-olds, challenged to infer the unknown configuration of a virtual die, but able to use the die to generate as much data as they felt necessary. We report how they tended to focus on local changes…
The Reasoning behind Informal Statistical Inference
Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani
2011-01-01
Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…
Forward and backward inference in spatial cognition.
Directory of Open Access Journals (Sweden)
Will D Penny
Full Text Available This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
Fiducial inference - A Neyman-Pearson interpretation
Salome, D; VonderLinden, W; Dose,; Fischer, R; Preuss, R
1999-01-01
Fisher's fiducial argument is a tool for deriving inferences in the form of a probability distribution on the parameter space, not based on Bayes's Theorem. Lindley established that in exceptional situations fiducial inferences coincide with posterior distributions; in the other situations fiducial
Institute of Scientific and Technical Information of China (English)
Bo YANG; Xiaorong CHEN
2009-01-01
A Mechanism-Inferring method of networks exploited from machine learning theory can effectively evaluate the predicting performance of a network model. The existing method for inferring network mechanisms based on a census of subgraph numbers has some drawbacks, especially the need for a runtime increasing strongly with network size and network density. In this paper, an improved method has been proposed by introducing a census algorithm of subgraph concentrations. Network mechanism can be quickly inferred by the new method even though the network has large scale and high density. Therefore, the application perspective of mechanism-inferring method has been extended into the wider fields of large-scale complex networks. By applying the new method to a case of protein interaction network, the authors obtain the same inferring result as the existing method, which approves the effectiveness of the method.
Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc
2012-01-06
An important challenge in system biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 affymetrix arrays, respectively. Using the area under the curve of the receiver operating characteristics and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm for the target genes identification of the TF having a wide range of yet identified target genes but better for TF having only few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm, and particularly suitable to discover target genes of "orphan" TFs.
ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics.
Huang, Ting; Gong, Haipeng; Yang, Can; He, Zengyou
2013-04-01
Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explain the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders still remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm is named as ProteinLasso, which explores an ensemble learning strategy to address the sparsity parameter selection problem in Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms those state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
Active Inference: A Process Theory.
Friston, Karl; FitzGerald, Thomas; Rigoli, Francesco; Schwartenbeck, Philipp; Pezzulo, Giovanni
2017-01-01
This article describes a process theory based on active inference and belief propagation. Starting from the premise that all neuronal processing (and action selection) can be explained by maximizing Bayesian model evidence-or minimizing variational free energy-we ask whether neuronal responses can be described as a gradient descent on variational free energy. Using a standard (Markov decision process) generative model, we derive the neuronal dynamics implicit in this description and reproduce a remarkable range of well-characterized neuronal phenomena. These include repetition suppression, mismatch negativity, violation responses, place-cell activity, phase precession, theta sequences, theta-gamma coupling, evidence accumulation, race-to-bound dynamics, and transfer of dopamine responses. Furthermore, the (approximately Bayes' optimal) behavior prescribed by these dynamics has a degree of face validity, providing a formal explanation for reward seeking, context learning, and epistemic foraging. Technically, the fact that a gradient descent appears to be a valid description of neuronal activity means that variational free energy is a Lyapunov function for neuronal dynamics, which therefore conform to Hamilton's principle of least action.
Redshift data and statistical inference
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random processes is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used giving rise to widespread acceptance of statistically unconfirmed conclusions.
EI: A Program for Ecological Inference
Directory of Open Access Journals (Sweden)
Gary King
2004-09-01
Full Text Available The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997. Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological" data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics, unreliable (racial politics, insufficient (political geography, or infeasible (political history. They are also required in numerous areas of ma jor significance in public policy (e.g., for applying the Voting Rights Act and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.
On the criticality of inferred models
Mastromatteo, Iacopo
2011-01-01
Advanced inference techniques allow one to reconstruct the pattern of interaction from high dimensional data sets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to a phase transition. On one side, we show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher Information) is directly related to the model's susceptibility. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. On the other, this region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time-scales naturally yield models which are close to criticality.
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
DEFF Research Database (Denmark)
Gustavson, Fred G.; Reid, John K.; Wasniewski, Jerzy
2007-01-01
variables, and the speed is usually better than that of the LAPACK algorithm that uses full storage (n2 variables). Included are subroutines for rearranging a matrix whose upper or lower-triangular part is packed by columns to this format and for the inverse rearrangement. Also included is a kernel......We present subroutines for the Cholesky factorization of a positive-definite symmetric matrix and for solving corresponding sets of linear equations. They exploit cache memory by using the block hybrid format proposed by the authors in a companion article. The matrix is packed into n(n + 1)/2 real...
Neurogenetic Algorithm for Solving Combinatorial Engineering Problems
Directory of Open Access Journals (Sweden)
M. Jalali Varnamkhasti
2012-01-01
Full Text Available Diversity of the population in a genetic algorithm plays an important role in impeding premature convergence. This paper proposes an adaptive neurofuzzy inference system genetic algorithm based on sexual selection. In this technique, for choosing the female chromosome during sexual selection, a bilinear allocation lifetime approach is used to label the chromosomes based on their fitness value which will then be used to characterize the diversity of the population. The motivation of this algorithm is to maintain the population diversity throughout the search procedure. To promote diversity, the proposed algorithm combines the concept of gender and age of individuals and the fuzzy logic during the selection of parents. In order to appraise the performance of the techniques used in this study, one of the chemistry problems and some nonlinear functions available in literature is used.
Netter: re-ranking gene network inference predictions using structural network properties.
Ruyssinck, Joeri; Demeester, Piet; Dhaene, Tom; Saeys, Yvan
2016-02-09
Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a certain threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical for a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. We propose a post-processing algorithm which is applicable to any confidence ranking of regulatory interactions obtained from a network inference method which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on both artificially generated data as well as the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E.coli. community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior
Anderson, Eric C; Ng, Thomas C
2016-02-01
We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.
Causal inference in obesity research.
Franks, P W; Atabaki-Pasdar, N
2017-03-01
Obesity is a risk factor for a plethora of severe morbidities and premature death. Most supporting evidence comes from observational studies that are prone to chance, bias and confounding. Even data on the protective effects of weight loss from randomized controlled trials will be susceptible to confounding and bias if treatment assignment cannot be masked, which is usually the case with lifestyle and surgical interventions. Thus, whilst obesity is widely considered the major modifiable risk factor for many chronic diseases, its causes and consequences are often difficult to determine. Addressing this is important, as the prevention and treatment of any disease requires that interventions focus on causal risk factors. Disease prediction, although not dependent on knowing the causes, is nevertheless enhanced by such knowledge. Here, we provide an overview of some of the barriers to causal inference in obesity research and discuss analytical approaches, such as Mendelian randomization, that can help to overcome these obstacles. In a systematic review of the literature in this field, we found: (i) probable causal relationships between adiposity and bone health/disease, cancers (colorectal, lung and kidney cancers), cardiometabolic traits (blood pressure, fasting insulin, inflammatory markers and lipids), uric acid concentrations, coronary heart disease and venous thrombosis (in the presence of pulmonary embolism), (ii) possible causal relationships between adiposity and gray matter volume, depression and common mental disorders, oesophageal cancer, macroalbuminuria, end-stage renal disease, diabetic kidney disease, nuclear cataract and gall stone disease, and (iii) no evidence for causal relationships between adiposity and Alzheimer's disease, pancreatic cancer, venous thrombosis (in the absence of pulmonary embolism), liver function and periodontitis.
Social signals and algorithmic trading of Bitcoin
David Garcia; Frank Schweitzer
2015-01-01
The availability of data on digital traces is growing to unprecedented sizes, but inferring actionable knowledge from large-scale data is far from being trivial. This is especially important for computational finance, where digital traces of human behavior offer a great potential to drive trading strategies. We contribute to this by providing a consistent approach that integrates various datasources in the design of algorithmic traders. This allows us to derive insights into the principles be...
Linguistic Markers of Inference Generation While Reading.
Clinton, Virginia; Carlson, Sarah E; Seipel, Ben
2016-06-01
Words can be informative linguistic markers of psychological constructs. The purpose of this study is to examine associations between word use and the process of making meaningful connections to a text while reading (i.e., inference generation). To achieve this purpose, think-aloud data from third-fifth grade students ([Formula: see text]) reading narrative texts were hand-coded for inferences. These data were also processed with a computer text analysis tool, Linguistic Inquiry and Word Count, for percentages of word use in the following categories: cognitive mechanism words, nonfluencies, and nine types of function words. Findings indicate that cognitive mechanisms were an independent, positive predictor of connections to background knowledge (i.e., elaborative inference generation) and nonfluencies were an independent, negative predictor of connections within the text (i.e., bridging inference generation). Function words did not provide unique variance towards predicting inference generation. These findings are discussed in the context of a cognitive reflection model and the differences between bridging and elaborative inference generation. In addition, potential practical implications for intelligent tutoring systems and computer-based methods of inference identification are presented.
Research on the Algorithm of Avionic Device Fault Diagnosis Based on Fuzzy Expert System
Institute of Scientific and Technical Information of China (English)
无
2007-01-01
Based on the fuzzy expert system fault diagnosis theory, the knowledge base architecture and inference engine algorithm are put forward for avionic device fault diagnosis. The knowledge base is constructed by fault query network, of which the basic element is the test-diagnosis fault unit. Every underlying fault cause's membership degree is calculated using fuzzy product inference algorithm, and the fault answer best selection algorithm is developed, to which the deep knowledge is applied. Using some examples,the proposed algorithm is analyzed for its capability of synthesis diagnosis and its improvement compared to greater membership degree first principle.
Haplotype and missing data inference in nuclear families.
Lin, Shin; Chakravarti, Aravinda; Cutler, David J
2004-08-01
Determining linkage phase from population samples with statistical methods is accurate only within regions of high linkage disequilibrium (LD). Yet, affected individuals in a genetic mapping study, including those involving cases and controls, may share sequences identical-by-descent stretching on the order of 10s to 100s of kilobases, quite possibly over regions of low LD in the population. At the same time, inferring phase from nuclear families may be hampered by missing family members, missing genotypes, and the noninformativity of certain genotype patterns. In this study, we reformulate our previous haplotype reconstruction algorithm, and its associated computer program, to phase parents with information derived from population samples as well as from their offspring. In applications of our algorithm to 100-kb stretches, simulated in accordance to a Wright-Fisher model with typical levels of LD in humans, we find that phase reconstruction for 160 trios with 10% missing data is highly accurate (>90%) over the entire length. Furthermore, our algorithm can estimate allelic status for missing data at high accuracy (>95%). Finally, the input capacity of the program is vast, easily handling thousands of segregating sites in > or = 1000 chromosomes.
Compiling Relational Bayesian Networks for Exact Inference
DEFF Research Database (Denmark)
Jaeger, Manfred; Darwiche, Adnan; Chavira, Mark
2006-01-01
We describe in this paper a system for exact inference with relational Bayesian networks as defined in the publicly available PRIMULA tool. The system is based on compiling propositional instances of relational Bayesian networks into arithmetic circuits and then performing online inference...... by evaluating and differentiating these circuits in time linear in their size. We report on experimental results showing successful compilation and efficient inference on relational Bayesian networks, whose PRIMULA--generated propositional instances have thousands of variables, and whose jointrees have clusters...
Inference and the introductory statistics course
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-10-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its hypothetical probabilistic reasoning process is examined in some depth. We argue that the revolution in the teaching of inference must begin. We also discuss some perplexing issues, problematic areas and some new insights into language conundrums associated with introducing the logic of inference through randomization methods.
Relevance of different prior knowledge sources for inferring gene interaction networks.
Olsen, Catharina; Bontempi, Gianluca; Emmert-Streib, Frank; Quackenbush, John; Haibe-Kains, Benjamin
2014-01-01
When inferring networks from high-throughput genomic data, one of the main challenges is the subsequent validation of these networks. In the best case scenario, the true network is partially known from previous research results published in structured databases or research articles. Traditionally, inferred networks are validated against these known interactions. Whenever the recovery rate is gauged to be high enough, subsequent high scoring but unknown inferred interactions are deemed good candidates for further experimental validation. Therefore such validation framework strongly depends on the quantity and quality of published interactions and presents serious pitfalls: (1) availability of these known interactions for the studied problem might be sparse; (2) quantitatively comparing different inference algorithms is not trivial; and (3) the use of these known interactions for validation prevents their integration in the inference procedure. The latter is particularly relevant as it has recently been showed that integration of priors during network inference significantly improves the quality of inferred networks. To overcome these problems when validating inferred networks, we recently proposed a data-driven validation framework based on single gene knock-down experiments. Using this framework, we were able to demonstrate the benefits of integrating prior knowledge and expression data. In this paper we used this framework to assess the quality of different sources of prior knowledge on their own and in combination with different genomic data sets in colorectal cancer. We observed that most prior sources lead to significant F-scores. Furthermore, their integration with genomic data leads to a significant increase in F-scores, especially for priors extracted from full text PubMed articles, known co-expression modules and genetic interactions. Lastly, we observed that the results are consistent for three different data sets: experimental knock-down data and two
Are Evaluations Inferred Directly From Overt Actions?
Brown, Donald; And Others
1975-01-01
The operation of a covert information processing mechanism was investigated in two experiments of the self-persuasion phenomena; i. e., making an inference about a stimulus on the basis of one's past behavior. (Editor)
Autonomous forward inference via DNA computing
Institute of Scientific and Technical Information of China (English)
Fu Yan; Li Gen; Li Yin; Meng Dazhi
2007-01-01
Recent studies direct the researchers into building DNA computing machines with intelligence, which is measured by three main points: autonomous, programmable and able to learn and adapt. Logical inference plays an important role in programmable information processing or computing. Here we present a new method to perform autonomous molecular forward inference for expert system.A novel repetitive recognition site (RRS) technique is invented to design rule-molecules in knowledge base. The inference engine runs autonomously by digesting the rule-molecule, using a Class ⅡB restriction enzyme PpiⅠ. Concentration model has been built to show the feasibility of the inference process under ideal chemical reaction conditions. Moreover, we extend to implement a triggering communication between molecular automata, as a further application of the RRS technique in our model.
Bayesian Cosmological inference beyond statistical isotropy
Souradeep, Tarun; Das, Santanu; Wandelt, Benjamin
2016-10-01
With advent of rich data sets, computationally challenge of inference in cosmology has relied on stochastic sampling method. First, I review the widely used MCMC approach used to infer cosmological parameters and present a adaptive improved implementation SCoPE developed by our group. Next, I present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method with a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. The general, principled, approach to a Bayesian inference of the covariance structure in a random field on a sphere presented here has huge potential for application to other many aspects of cosmology and astronomy, as well as, more distant areas of research like geosciences and climate modelling.
Metacognitive inferences from other people's memory performance.
Smith, Robert W; Schwarz, Norbert
2016-09-01
Three studies show that people draw metacognitive inferences about events from how well others remember the event. Given that memory fades over time, detailed accounts of distant events suggest that the event must have been particularly memorable, for example, because it was extreme. Accordingly, participants inferred that a physical assault (Study 1) or a poor restaurant experience (Studies 2-3) were more extreme when they were well remembered one year rather than one week later. These inferences influence behavioral intentions. For example, participants recommended a more severe punishment for a well-remembered distant rather than recent assault (Study 1). These metacognitive inferences are eliminated when people attribute the reporter's good memory to an irrelevant cause (e.g., photographic memory), thus undermining the informational value of memory performance (Study 3). These studies illuminate how people use lay theories of memory to learn from others' memory performance about characteristics of the world. (PsycINFO Database Record
Experimental evidence for circular inference in schizophrenia
Jardri, Renaud; Duverne, Sandrine; Litvinova, Alexandra S.; Denève, Sophie
2017-01-01
Schizophrenia (SCZ) is a complex mental disorder that may result in some combination of hallucinations, delusions and disorganized thinking. Here SCZ patients and healthy controls (CTLs) report their level of confidence on a forced-choice task that manipulated the strength of sensory evidence and prior information. Neither group's responses can be explained by simple Bayesian inference. Rather, individual responses are best captured by a model with different degrees of circular inference. Circular inference refers to a corruption of sensory data by prior information and vice versa, leading us to `see what we expect' (through descending loops), to `expect what we see' (through ascending loops) or both. Ascending loops are stronger for SCZ than CTLs and correlate with the severity of positive symptoms. Descending loops correlate with the severity of negative symptoms. Both loops correlate with disorganized symptoms. The findings suggest that circular inference might mediate the clinical manifestations of SCZ.
Composite likelihood method for inferring local pedigrees
Nielsen, Rasmus
2017-01-01
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit. PMID:28827797
Semi-supervised prediction of gene regulatory networks using machine learning algorithms
Indian Academy of Sciences (India)
Nihir Patel; T L Wang
2015-10-01
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
Inferring Aquifer Transmissivity from River Flow Data
Trichakis, Ioannis; Pistocchi, Alberto
2016-04-01
Daily streamflow data is the measurable result of many different hydrological processes within a basin; therefore, it includes information about all these processes. In this work, recession analysis applied to a pan-European dataset of measured streamflow was used to estimate hydrogeological parameters of the aquifers that contribute to the stream flow. Under the assumption that base-flow in times of no precipitation is mainly due to groundwater, we estimated parameters of European shallow aquifers connected with the stream network, and identified on the basis of the 1:1,500,000 scale Hydrogeological map of Europe. To this end, Master recession curves (MRCs) were constructed based on the RECESS model of the USGS for 1601 stream gauge stations across Europe. The process consists of three stages. Firstly, the model analyses the stream flow time-series. Then, it uses regression to calculate the recession index. Finally, it infers characteristics of the aquifer from the recession index. During time-series analysis, the model identifies those segments, where the number of successive recession days is above a certain threshold. The reason for this pre-processing lies in the necessity for an adequate number of points when performing regression at a later stage. The recession index derives from the semi-logarithmic plot of stream flow over time, and the post processing involves the calculation of geometrical parameters of the watershed through a GIS platform. The program scans the full stream flow dataset of all the stations. For each station, it identifies the segments with continuous recession that exceed a predefined number of days. When the algorithm finds all the segments of a certain station, it analyses them and calculates the best linear fit between time and the logarithm of flow. The algorithm repeats this procedure for the full number of segments, thus it calculates many different values of recession index for each station. After the program has found all the
Operation of the Bayes Inference Engine
Energy Technology Data Exchange (ETDEWEB)
Hanson, K.M.; Cunningham, G.S.
1998-07-27
The authors have developed a computer application, called the Bayes Inference Engine, to enable one to make inferences about models of a physical object from radiographs taken of it. In the BIE calculational models are represented by a data-flow diagram that can be manipulated by the analyst in a graphical-programming environment. The authors demonstrate the operation of the BIE in terms of examples of two-dimensional tomographic reconstruction including uncertainty estimation.
Causal inference in economics and marketing
Varian, Hal R.
2016-01-01
This is an elementary introduction to causal inference in economics written for readers familiar with machine learning methods. The critical step in any causal analysis is estimating the counterfactual—a prediction of what would have happened in the absence of the treatment. The powerful techniques used in machine learning may be useful for developing better estimates of the counterfactual, potentially improving causal inference. PMID:27382144
Inferring action structure and causal relationships in continuous sequences of human action.
Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare
2015-02-01
In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries.
Andrade, Daniel
2012-01-01
We present a new method to propagate lower bounds on conditional probability distributions in conventional Bayesian networks. Our method guarantees to provide outer approximations of the exact lower bounds. A key advantage is that we can use any available algorithms and tools for Bayesian networks in order to represent and infer lower bounds. This new method yields results that are provable exact for trees with binary variables, and results which are competitive to existing approximations in credal networks for all other network structures. Our method is not limited to a specific kind of network structure. Basically, it is also not restricted to a specific kind of inference, but we restrict our analysis to prognostic inference in this article. The computational complexity is superior to that of other existing approaches.
Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian Inference
Regier, Jeffrey; Giordano, Ryan; Thomas, Rollin; Schlegel, David; McAuliffe, Jon; Prabhat,
2016-01-01
Celeste is a procedure for inferring astronomical catalogs that attains state-of-the-art scientific results. To date, Celeste has been scaled to at most hundreds of megabytes of astronomical images: Bayesian posterior inference is notoriously demanding computationally. In this paper, we report on a scalable, parallel version of Celeste, suitable for learning catalogs from modern large-scale astronomical datasets. Our algorithmic innovations include a fast numerical optimization routine for Bayesian posterior inference and a statistically efficient scheme for decomposing astronomical optimization problems into subproblems. Our scalable implementation is written entirely in Julia, a new high-level dynamic programming language designed for scientific and numerical computing. We use Julia's high-level constructs for shared and distributed memory parallelism, and demonstrate effective load balancing and efficient scaling on up to 8192 Xeon cores on the NERSC Cori supercomputer.
Evolutionary Graph Drawing Algorithms
Institute of Scientific and Technical Information of China (English)
Huang Jing-wei; Wei Wen-fang
2003-01-01
In this paper, graph drawing algorithms based on genetic algorithms are designed for general undirected graphs and directed graphs. As being shown, graph drawing algorithms designed by genetic algorithms have the following advantages: the frames of the algorithms are unified, the method is simple, different algorithms may be attained by designing different objective functions, therefore enhance the reuse of the algorithms. Also, aesthetics or constrains may be added to satisfy different requirements.
Polynomial Chaos Surrogates for Bayesian Inference
Le Maitre, Olivier
2016-01-06
The Bayesian inference is a popular probabilistic method to solve inverse problems, such as the identification of field parameter in a PDE model. The inference rely on the Bayes rule to update the prior density of the sought field, from observations, and derive its posterior distribution. In most cases the posterior distribution has no explicit form and has to be sampled, for instance using a Markov-Chain Monte Carlo method. In practice the prior field parameter is decomposed and truncated (e.g. by means of Karhunen- Lo´eve decomposition) to recast the inference problem into the inference of a finite number of coordinates. Although proved effective in many situations, the Bayesian inference as sketched above faces several difficulties requiring improvements. First, sampling the posterior can be a extremely costly task as it requires multiple resolutions of the PDE model for different values of the field parameter. Second, when the observations are not very much informative, the inferred parameter field can highly depends on its prior which can be somehow arbitrary. These issues have motivated the introduction of reduced modeling or surrogates for the (approximate) determination of the parametrized PDE solution and hyperparameters in the description of the prior field. Our contribution focuses on recent developments in these two directions: the acceleration of the posterior sampling by means of Polynomial Chaos expansions and the efficient treatment of parametrized covariance functions for the prior field. We also discuss the possibility of making such approach adaptive to further improve its efficiency.
Fast and accurate inference of local ancestry in Latino populations
Baran, Yael; Pasaniuc, Bogdan; Sankararaman, Sriram; Torgerson, Dara G.; Gignoux, Christopher; Eng, Celeste; Rodriguez-Cintron, William; Chapela, Rocio; Ford, Jean G.; Avila, Pedro C.; Rodriguez-Santana, Jose; Burchard, Esteban Gonzàlez; Halperin, Eran
2012-01-01
Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos. Availability: http://lamp.icsi.berkeley.edu/lamp/lampld/ Contact: bpasaniu@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22495753
Inferring Group Processes from Computer-Mediated Affective Text Analysis
Energy Technology Data Exchange (ETDEWEB)
Schryver, Jack C [ORNL; Begoli, Edmon [ORNL; Jose, Ajith [Missouri University of Science and Technology; Griffin, Christopher [Pennsylvania State University
2011-02-01
Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
Bayesian large-scale structure inference and cosmic web analysis
Leclercq, Florent
2015-01-01
Surveys of the cosmic large-scale structure carry opportunities for building and testing cosmological theories about the origin and evolution of the Universe. This endeavor requires appropriate data assimilation tools, for establishing the contact between survey catalogs and models of structure formation. In this thesis, we present an innovative statistical approach for the ab initio simultaneous analysis of the formation history and morphology of the cosmic web: the BORG algorithm infers the primordial density fluctuations and produces physical reconstructions of the dark matter distribution that underlies observed galaxies, by assimilating the survey data into a cosmological structure formation model. The method, based on Bayesian probability theory, provides accurate means of uncertainty quantification. We demonstrate the application of BORG to the Sloan Digital Sky Survey data and describe the primordial and late-time large-scale structure in the observed volume. We show how the approach has led to the fi...
Inferring physical properties of galaxies from their emission line spectra
Ucci, Graziano; Gallerani, Simona; Pallottini, Andrea
2016-01-01
We present a new approach based on Supervised Machine Learning (SML) algorithms to infer key physical properties of galaxies (density, metallicity, column density and ionization parameter) from their emission line spectra. We introduce a numerical code (called GAME, GAlaxy Machine learning for Emission lines) implementing this method and test it extensively. GAME delivers excellent predictive performances, especially for estimates of metallicity and column densities. We compare GAME with the most widely used diagnostics (e.g. R$_{23}$, [NII]$\\lambda$6584 / H$\\alpha$ indicators) showing that it provides much better accuracy and wider applicability range. GAME is particularly suitable for use in combination with Integral Field Unit (IFU) spectroscopy, both for rest-frame optical/UV nebular lines and far-infrared/sub-mm lines arising from Photo-Dissociation Regions. Finally, GAME can also be applied to the analysis of synthetic galaxy maps built from numerical simulations.
Inference of Tumor Phylogenies with Improved Somatic Mutation Discovery
Salari, Raheleh
2013-01-01
Next-generation sequencing technologies provide a powerful tool for studying genome evolution during progression of advanced diseases such as cancer. Although many recent studies have employed new sequencing technologies to detect mutations across multiple, genetically related tumors, current methods do not exploit available phylogenetic information to improve the accuracy of their variant calls. Here, we present a novel algorithm that uses somatic single nucleotide variations (SNVs) in multiple, related tissue samples as lineage markers for phylogenetic tree reconstruction. Our method then leverages the inferred phylogeny to improve the accuracy of SNV discovery. Experimental analyses demonstrate that our method achieves up to 32% improvement for somatic SNV calling of multiple related samples over the accuracy of GATK\\'s Unified Genotyper, the state of the art multisample SNV caller. © 2013 Springer-Verlag.
Inference of population history and patterns from molecular data
DEFF Research Database (Denmark)
Tataru, Paula
, the existing mathematical models and computational methods need to be reformulated. I address this from an inference perspective in two areas of bioinformatics. Population genetics studies the influence exerted by various factors on the dynamics of a population's genetic variation. These factors cover...... evolutionary forces, such as mutation and selection, but also changes in population size. The aim in population genetics is to untangle the history of a population from observed genetic variation. This subject is dominated by two dual models, the Wright-Fisher and coalescent. I first introduce a new...... how standard algorithms can be improved by including pattern occurrence in the hidden structure of observed sequences. Such a hidden structure could be the localization and composition of genes within a DNA sequence. The second problem I target is the computational prediction of the pattern...
An Alternating Iterative Method and Its Application in Statistical Inference
Institute of Scientific and Technical Information of China (English)
Ning Zhong SHI; Guo Rong HU; Qing CUI
2008-01-01
This paper studies non-convex programming problems. It is known that, in statistical inference, many constrained estimation problems may be expressed as convex programming problems. However, in many practical problems, the objective functions are not convex. In this paper, we give a definition of a semi-convex objective function and discuss the corresponding non-convex programming problems. A two-step iterative algorithm called the alternating iterative method is proposed for finding solutions for such problems. The method is illustrated by three examples in constrained estimation problems given in Sasabuchi et al. (Biometrika, 72, 465–472 (1983)), Shi N. Z. (J. Multivariate Anal.,50, 282–293 (1994)) and El Barmi H. and Dykstra R. (Ann. Statist., 26, 1878–1893 (1998)).
Maximum caliber inference and the stochastic Ising model
Cafaro, Carlo; Ali, Sean Alan
2016-11-01
We investigate the maximum caliber variational principle as an inference algorithm used to predict dynamical properties of complex nonequilibrium, stationary, statistical systems in the presence of incomplete information. Specifically, we maximize the path entropy over discrete time step trajectories subject to normalization, stationarity, and detailed balance constraints together with a path-dependent dynamical information constraint reflecting a given average global behavior of the complex system. A general expression for the transition probability values associated with the stationary random Markov processes describing the nonequilibrium stationary system is computed. By virtue of our analysis, we uncover that a convenient choice of the dynamical information constraint together with a perturbative asymptotic expansion with respect to its corresponding Lagrange multiplier of the general expression for the transition probability leads to a formal overlap with the well-known Glauber hyperbolic tangent rule for the transition probability for the stochastic Ising model in the limit of very high temperatures of the heat reservoir.
Energy Technology Data Exchange (ETDEWEB)
Fontana, W.
1990-12-13
In this paper complex adaptive systems are defined by a self- referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful to discuss the behavior of a fixed size ensemble of randomly interacting functions. This function gas'', or Turning gas'', is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting in the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.
Inferring topologies of complex networks with hidden variables.
Wu, Xiaoqun; Wang, Weihan; Zheng, Wei Xing
2012-10-01
Network topology plays a crucial role in determining a network's intrinsic dynamics and function, thus understanding and modeling the topology of a complex network will lead to greater knowledge of its evolutionary mechanisms and to a better understanding of its behaviors. In the past few years, topology identification of complex networks has received increasing interest and wide attention. Many approaches have been developed for this purpose, including synchronization-based identification, information-theoretic methods, and intelligent optimization algorithms. However, inferring interaction patterns from observed dynamical time series is still challenging, especially in the absence of knowledge of nodal dynamics and in the presence of system noise. The purpose of this work is to present a simple and efficient approach to inferring the topologies of such complex networks. The proposed approach is called "piecewise partial Granger causality." It measures the cause-effect connections of nonlinear time series influenced by hidden variables. One commonly used testing network, two regular networks with a few additional links, and small-world networks are used to evaluate the performance and illustrate the influence of network parameters on the proposed approach. Application to experimental data further demonstrates the validity and robustness of our method.
Feedback Message Passing for Inference in Gaussian Graphical Models
Liu, Ying; Anandkumar, Animashree; Willsky, Alan S
2011-01-01
While loopy belief propagation (LBP) performs reasonably well for inference in some Gaussian graphical models with cycles, its performance is unsatisfactory for many others. In particular for some models LBP does not converge, and in general when it does converge, the computed variances are incorrect (except for cycle-free graphs for which belief propagation (BP) is non-iterative and exact). In this paper we propose {\\em feedback message passing} (FMP), a message-passing algorithm that makes use of a special set of vertices (called a {\\em feedback vertex set} or {\\em FVS}) whose removal results in a cycle-free graph. In FMP, standard BP is employed several times on the cycle-free subgraph excluding the FVS while a special message-passing scheme is used for the nodes in the FVS. The computational complexity of exact inference is $O(k^2n)$, where $k$ is the number of feedback nodes, and $n$ is the total number of nodes. When the size of the FVS is very large, FMP is intractable. Hence we propose {\\em approximat...
Bayesian inference for generalized linear models for spiking neurons
Directory of Open Access Journals (Sweden)
Sebastian Gerwinn
2010-05-01
Full Text Available Generalized Linear Models (GLMs are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing an Laplace prior together with the posterior mean estimate.
An argument for mechanism-based statistical inference in cancer.
Geman, Donald; Ochs, Michael; Price, Nathan D; Tomasetti, Cristian; Younes, Laurent
2015-05-01
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
Generalized Fiducial Inference for Binary Logistic Item Response Models.
Liu, Yang; Hannig, Jan
2016-06-01
Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to those used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained by the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis was illustrated by the analysis of a set of the Eysenck Personality Questionnaire data.
Personalized microbial network inference via co-regularized spectral clustering.
Imangaliyev, Sultan; Keijser, Bart; Crielaard, Wim; Tsivtsivadze, Evgeni
2015-07-15
We use Human Microbiome Project (HMP) cohort (Peterson et al., 2009) to infer personalized oral microbial networks of healthy individuals. To determine clustering of individuals with similar microbial profiles, co-regularized spectral clustering algorithm is applied to the dataset. For each cluster we discovered, we compute co-occurrence relationships among the microbial species that determine microbial network per cluster of individuals. The results of our study suggest that there are several differences in microbial interactions on personalized network level in healthy oral samples acquired from various niches. Based on the results of co-regularized spectral clustering we discover two groups of individuals with different topology of their microbial interaction network. The results of microbial network inference suggest that niche-wise interactions are different in these two groups. Our study shows that healthy individuals have different microbial clusters according to their oral microbiota. Such personalized microbial networks open a better understanding of the microbial ecology of healthy oral cavities and new possibilities for future targeted medication. The scripts written in scientific Python and in Matlab, which were used for network visualization, are provided for download on the website http://learning-machines.com/. Copyright © 2015 Elsevier Inc. All rights reserved.
Structural influence of gene networks on their inference: analysis of C3NET
Directory of Open Access Journals (Sweden)
Emmert-Streib Frank
2011-06-01
Full Text Available Abstract Background The availability of large-scale high-throughput data possesses considerable challenges toward their functional analysis. For this reason gene network inference methods gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. Results In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide an user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. Conclusions The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy to use software package c3net may contribute to the popularization of such methods. Reviewers This article was reviewed by Lev Klebanov, Joel Bader and Yuriy Gusev.
Mood Inference Machine: Framework to Infer Affective Phenomena in ROODA Virtual Learning Environment
Directory of Open Access Journals (Sweden)
Magalí Teresinha Longhi
2012-02-01
Full Text Available This article presents a mechanism to infer mood states, aiming to provide virtual learning environments (VLEs with a tool able to recognize the student’s motivation. The inference model has as its parameters personality traits, motivational factors obtained through behavioral standards and the affective subjectivity identified in texts made available in the communication functionalities of the VLE. In the inference machine, such variables are treated under probability reasoning, more precisely by Bayesian networks.
Algorithmic Randomness as Foundation of Inductive Reasoning and Artificial Intelligence
Hutter, Marcus
2011-01-01
This article is a brief personal account of the past, present, and future of algorithmic randomness, emphasizing its role in inductive inference and artificial intelligence. It is written for a general audience interested in science and philosophy. Intuitively, randomness is a lack of order or predictability. If randomness is the opposite of determinism, then algorithmic randomness is the opposite of computability. Besides many other things, these concepts have been used to quantify Ockham's razor, solve the induction problem, and define intelligence.
Use of a Novel Grammatical Inference Approach in Classification of Amyloidogenic Hexapeptides
Directory of Open Access Journals (Sweden)
Wojciech Wieczorek
2016-01-01
Full Text Available The present paper is a novel contribution to the field of bioinformatics by using grammatical inference in the analysis of data. We developed an algorithm for generating star-free regular expressions which turned out to be good recommendation tools, as they are characterized by a relatively high correlation coefficient between the observed and predicted binary classifications. The experiments have been performed for three datasets of amyloidogenic hexapeptides, and our results are compared with those obtained using the graph approaches, the current state-of-the-art methods in heuristic automata induction, and the support vector machine. The results showed the superior performance of the new grammatical inference algorithm on fixed-length amyloid datasets.
Approximate inference on planar graphs using loop calculus and belief progagation
Energy Technology Data Exchange (ETDEWEB)
Chertkov, Michael [Los Alamos National Laboratory; Gomez, Vicenc [RADBOUD UNIV; Kappen, Hilbert [RADBOUD UNIV
2009-01-01
We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006b) allows to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.
Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models
Peixoto, Tiago P
2014-01-01
We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear $O(N\\ln^2N)$ complexity, where $N$ is the number of nodes in the network, independent on the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result. PMID:28133490
Directory of Open Access Journals (Sweden)
Yue Fan
2017-01-01
Full Text Available Gene regulatory networks (GRNs play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
The path inference filter: model-based low-latency map matching of probe vehicle data
Hunter, Timothy; Abbeel, Pieter; Bayen, Alexandre
2011-01-01
Traffic congestion has a significant impact around the world. Building reliable and cost effective traffic monitoring systems is a prerequisite to addressing this phenomenon. Historically, traffic estimation has been limited to highways, and has relied on a static, dedicated sensing infrastructure such as loop detectors or cameras. In the case of city roads, this estimation problem is rather involved. This situation can be partly attributed to the lack of effective sensing in an urban setting. In this context, the most promising source of data is the GPS receiver in personal smartphones and commercial fleet vehicles. In this article, we present some algorithms that leverage this trend to produce some streaming data compatible with current state-of-the-art traffic estimation algorithms. These algorithms, which we will refer altogether as the path inference algorithm, have been implemented and deployed inside the Mobile Millennium system at Berkeley.
A symmetry-based method to infer structural brain networks from probabilistic tractography data
Directory of Open Access Journals (Sweden)
Kamal Shadi
2016-11-01
Full Text Available Recent progress in diffusion MRI and tractography algorithms as well as the launch of the Human Connectome Project (HCP have provided brain research with an abundance of structural connectivity data. In this work, we describe and evaluate a method that can infer the structural brain network that interconnects a given set of Regions of Interest (ROIs from probabilistic tractography data. The proposed method, referred to as Minimum Asymmetry Network Inference Algorithm (MANIA, does not determine the connectivity between two ROIs based on an arbitrary connectivity threshold. Instead, we exploit a basic limitation of the tractography process: the observed streamlines from a source to a target do not provide any information about the polarity of the underlying white matter, and so if there are some fibers connecting two voxels (or two ROIs X and Y, tractography should be able in principle to follow this connection in both directions, from X to Y and from Y to X. We leverage this limitation to formulate the network inference process as an optimization problem that minimizes the (appropriately normalized asymmetry of the observed network. We evaluate the proposed method using both the FiberCup dataset and based on a noise model that randomly corrupts the observed connectivity of synthetic networks. As a case-study, we apply MANIA on diffusion MRI data from 28 healthy subjects to infer the structural network between 18 corticolimbic ROIs that are associated with various neuropsychiatric conditions including depression, anxiety and addiction.
Identifying Seizure Onset Zone From the Causal Connectivity Inferred Using Directed Information
Malladi, Rakesh; Kalamangalam, Giridhar; Tandon, Nitin; Aazhang, Behnaam
2016-10-01
In this paper, we developed a model-based and a data-driven estimator for directed information (DI) to infer the causal connectivity graph between electrocorticographic (ECoG) signals recorded from brain and to identify the seizure onset zone (SOZ) in epileptic patients. Directed information, an information theoretic quantity, is a general metric to infer causal connectivity between time-series and is not restricted to a particular class of models unlike the popular metrics based on Granger causality or transfer entropy. The proposed estimators are shown to be almost surely convergent. Causal connectivity between ECoG electrodes in five epileptic patients is inferred using the proposed DI estimators, after validating their performance on simulated data. We then proposed a model-based and a data-driven SOZ identification algorithm to identify SOZ from the causal connectivity inferred using model-based and data-driven DI estimators respectively. The data-driven SOZ identification outperforms the model-based SOZ identification algorithm when benchmarked against visual analysis by neurologist, the current clinical gold standard. The causal connectivity analysis presented here is the first step towards developing novel non-surgical treatments for epilepsy.
Dorazio, R.M.; Johnson, F.A.
2003-01-01
Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
Algorithmic and analytical methods in network biology.
Koyutürk, Mehmet
2010-01-01
During the genomic revolution, algorithmic and analytical methods for organizing, integrating, analyzing, and querying biological sequence data proved invaluable. Today, increasing availability of high-throughput data pertaining to functional states of biomolecules, as well as their interactions, enables genome-scale studies of the cell from a systems perspective. The past decade witnessed significant efforts on the development of computational infrastructure for large-scale modeling and analysis of biological systems, commonly using network models. Such efforts lead to novel insights into the complexity of living systems, through development of sophisticated abstractions, algorithms, and analytical techniques that address a broad range of problems, including the following: (1) inference and reconstruction of complex cellular networks; (2) identification of common and coherent patterns in cellular networks, with a view to understanding the organizing principles and building blocks of cellular signaling, regulation, and metabolism; and (3) characterization of cellular mechanisms that underlie the differences between living systems, in terms of evolutionary diversity, development and differentiation, and complex phenotypes, including human disease. These problems pose significant algorithmic and analytical challenges because of the inherent complexity of the systems being studied; limitations of data in terms of availability, scope, and scale; intractability of resulting computational problems; and limitations of reference models for reliable statistical inference. This article provides a broad overview of existing algorithmic and analytical approaches to these problems, highlights key biological insights provided by these approaches, and outlines emerging opportunities and challenges in computational systems biology.
Multisensory oddity detection as bayesian inference.
Directory of Open Access Journals (Sweden)
Timothy Hospedales
Full Text Available A key goal for the perceptual system is to optimally combine information from all the senses that may be available in order to develop the most accurate and unified picture possible of the outside world. The contemporary theoretical framework of ideal observer maximum likelihood integration (MLI has been highly successful in modelling how the human brain combines information from a variety of different sensory modalities. However, in various recent experiments involving multisensory stimuli of uncertain correspondence, MLI breaks down as a successful model of sensory combination. Within the paradigm of direct stimulus estimation, perceptual models which use Bayesian inference to resolve correspondence have recently been shown to generalize successfully to these cases where MLI fails. This approach has been known variously as model inference, causal inference or structure inference. In this paper, we examine causal uncertainty in another important class of multi-sensory perception paradigm--that of oddity detection and demonstrate how a Bayesian ideal observer also treats oddity detection as a structure inference problem. We validate this approach by showing that it provides an intuitive and quantitative explanation of an important pair of multi-sensory oddity detection experiments--involving cues across and within modalities--for which MLI previously failed dramatically, allowing a novel unifying treatment of within and cross modal multisensory perception. Our successful application of structure inference models to the new 'oddity detection' paradigm, and the resultant unified explanation of across and within modality cases provide further evidence to suggest that structure inference may be a commonly evolved principle for combining perceptual information in the brain.
Multisensory oddity detection as bayesian inference.
Hospedales, Timothy; Vijayakumar, Sethu
2009-01-01
A key goal for the perceptual system is to optimally combine information from all the senses that may be available in order to develop the most accurate and unified picture possible of the outside world. The contemporary theoretical framework of ideal observer maximum likelihood integration (MLI) has been highly successful in modelling how the human brain combines information from a variety of different sensory modalities. However, in various recent experiments involving multisensory stimuli of uncertain correspondence, MLI breaks down as a successful model of sensory combination. Within the paradigm of direct stimulus estimation, perceptual models which use Bayesian inference to resolve correspondence have recently been shown to generalize successfully to these cases where MLI fails. This approach has been known variously as model inference, causal inference or structure inference. In this paper, we examine causal uncertainty in another important class of multi-sensory perception paradigm--that of oddity detection and demonstrate how a Bayesian ideal observer also treats oddity detection as a structure inference problem. We validate this approach by showing that it provides an intuitive and quantitative explanation of an important pair of multi-sensory oddity detection experiments--involving cues across and within modalities--for which MLI previously failed dramatically, allowing a novel unifying treatment of within and cross modal multisensory perception. Our successful application of structure inference models to the new 'oddity detection' paradigm, and the resultant unified explanation of across and within modality cases provide further evidence to suggest that structure inference may be a commonly evolved principle for combining perceptual information in the brain.
A review of integration strategies to support gene regulatory network construction.
Chen, Hailin; VanBuren, Vincent
2012-01-01
Gene regulatory network (GRN) construction is a central task of systems biology. Integration of different data sources to infer and construct GRNs is an important consideration for the success of this effort. In this paper, we will discuss distinctive strategies of data integration for GRN construction. Basically, the process of integration of different data sources is divided into two phases: the first phase is collection of the required data and the second phase is data processing with advanced algorithms to infer the GRNs. In this paper these two phases are called "structural integration" and "analytic integration," respectively. Compared with the nonintegration strategies, the integration strategies perform quite well and have better agreement with the experimental evidence.
A Review of Integration Strategies to Support Gene Regulatory Network Construction
Directory of Open Access Journals (Sweden)
Hailin Chen
2012-01-01
Full Text Available Gene regulatory network (GRN construction is a central task of systems biology. Integration of different data sources to infer and construct GRNs is an important consideration for the success of this effort. In this paper, we will discuss distinctive strategies of data integration for GRN construction. Basically, the process of integration of different data sources is divided into two phases: the first phase is collection of the required data and the second phase is data processing with advanced algorithms to infer the GRNs. In this paper these two phases are called “structural integration” and “analytic integration,” respectively. Compared with the nonintegration strategies, the integration strategies perform quite well and have better agreement with the experimental evidence.
NIFTY - Numerical Information Field Theory. A versatile PYTHON library for signal inference
Selig, M.; Bell, M. R.; Junklewitz, H.; Oppermann, N.; Reinecke, M.; Greiner, M.; Pachajoa, C.; Enßlin, T. A.
2013-06-01
NIFTy (Numerical Information Field Theory) is a software package designed to enable the development of signal inference algorithms that operate regardless of the underlying spatial grid and its resolution. Its object-oriented framework is written in Python, although it accesses libraries written in Cython, C++, and C for efficiency. NIFTy offers a toolkit that abstracts discretized representations of continuous spaces, fields in these spaces, and operators acting on fields into classes. Thereby, the correct normalization of operations on fields is taken care of automatically without concerning the user. This allows for an abstract formulation and programming of inference algorithms, including those derived within information field theory. Thus, NIFTy permits its user to rapidly prototype algorithms in 1D, and then apply the developed code in higher-dimensional settings of real world problems. The set of spaces on which NIFTy operates comprises point sets, n-dimensional regular grids, spherical spaces, their harmonic counterparts, and product spaces constructed as combinations of those. The functionality and diversity of the package is demonstrated by a Wiener filter code example that successfully runs without modification regardless of the space on which the inference problem is defined. NIFTy homepage http://www.mpa-garching.mpg.de/ift/nifty/; Excerpts of this paper are part of the NIFTy source code and documentation.
Inference of gene regulatory networks from genetic perturbations with linear regression model.
Directory of Open Access Journals (Sweden)
Zijian Dong
Full Text Available It is an effective strategy to use both genetic perturbation data and gene expression data to infer regulatory networks that aims to improve the detection accuracy of the regulatory relationships among genes. Based on both types of data, the genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM. In this paper, a linear regression (LR model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI. Comparative evaluations of LRBI with other two algorithms, the Adaptive Lasso (AL-Based and the Sparsity-aware Maximum Likelihood (SML, are also presented. Simulations show that LRBI has significantly better performance than AL-Based, and overperforms SML in terms of power of detection. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.
Directory of Open Access Journals (Sweden)
Yongjun Li
2017-01-01
Full Text Available A publication network contains abundant knowledge about advisor-student relationships. However, these relationship labels are not explicitly shown and need to be identified based on the hidden knowledge. The exploration of such relationships can benefit many interesting applications such as expert finding and research community analysis and has already drawn many scholars’ attention. In this paper, based on the common knowledge that a student usually coauthors his papers with his advisor, we propose an approximate MaxConfidence measure and present an advisor-student relationship identification algorithm based on the proposed measure. Based on the comparison of two authors’ publication list, we first employ the proposed measure to determine the time interval that a potential advising relationship lasts and then infer the likelihood of this potential advising relationship. Our algorithm suggests an advisor for each student based on the inference results. The experiment results show that our algorithm can infer advisor-student relationships efficiently and achieve a better accuracy than the time-constrained probabilistic factor graph (TPFG model without any supervised information. Also, we apply some reasonable restrictions on the dataset to reduce the search space significantly.
Inference of gene-phenotype associations via protein-protein interaction and orthology.
Directory of Open Access Journals (Sweden)
Panwen Wang
Full Text Available One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To achieve this goal, in this study we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. Furthermore, we developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations. The high Area Under the Curve values suggest a significant performance of our method. By applying our method to two human representative diseases, Type 2 Diabetes and Breast Cancer, we demonstrated that our method is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer functions and putative phenotypes of a gene along with the candidate genes of a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.
Amaral, Selene da Rocha; Baccalá, Luiz A; Barbosa, Leonardo S; Caticha, Nestor
2017-06-01
Proper neural connectivity inference has become essential for understanding cognitive processes associated with human brain function. Its efficacy is often hampered by the curse of dimensionality. In the electroencephalogram case, which is a noninvasive electrophysiological monitoring technique to record electrical activity of the brain, a possible way around this is to replace multichannel electrode information with dipole reconstructed data. We use a method based on maximum entropy and the renormalization group to infer the position of the sources, whose success hinges on transmitting information from low- to high-resolution representations of the cortex. The performance of this method compares favorably to other available source inference algorithms, which are ranked here in terms of their performance with respect to directed connectivity inference by using artificially generated dynamic data. We examine some representative scenarios comprising different numbers of dynamically connected dipoles over distinct cortical surface positions and under different sensor noise impairment levels. The overall conclusion is that inverse problem solutions do not affect the correct inference of the direction of the flow of information as long as the equivalent dipole sources are correctly found.
Estimating uncertainty of inference for validation
Energy Technology Data Exchange (ETDEWEB)
Booker, Jane M [Los Alamos National Laboratory; Langenbrunner, James R [Los Alamos National Laboratory; Hemez, Francois M [Los Alamos National Laboratory; Ross, Timothy J [UNM
2010-09-30
We present a validation process based upon the concept that validation is an inference-making activity. This has always been true, but the association has not been as important before as it is now. Previously, theory had been confirmed by more data, and predictions were possible based on data. The process today is to infer from theory to code and from code to prediction, making the role of prediction somewhat automatic, and a machine function. Validation is defined as determining the degree to which a model and code is an accurate representation of experimental test data. Imbedded in validation is the intention to use the computer code to predict. To predict is to accept the conclusion that an observable final state will manifest; therefore, prediction is an inference whose goodness relies on the validity of the code. Quantifying the uncertainty of a prediction amounts to quantifying the uncertainty of validation, and this involves the characterization of uncertainties inherent in theory/models/codes and the corresponding data. An introduction to inference making and its associated uncertainty is provided as a foundation for the validation problem. A mathematical construction for estimating the uncertainty in the validation inference is then presented, including a possibility distribution constructed to represent the inference uncertainty for validation under uncertainty. The estimation of inference uncertainty for validation is illustrated using data and calculations from Inertial Confinement Fusion (ICF). The ICF measurements of neutron yield and ion temperature were obtained for direct-drive inertial fusion capsules at the Omega laser facility. The glass capsules, containing the fusion gas, were systematically selected with the intent of establishing a reproducible baseline of high-yield 10{sup 13}-10{sup 14} neutron output. The deuterium-tritium ratio in these experiments was varied to study its influence upon yield. This paper on validation inference is the
Clustering algorithms for Stokes space modulation format recognition
DEFF Research Database (Denmark)
Boada, Ricard; Borkowski, Robert; Tafur Monroy, Idelfonso
2015-01-01
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely...
BiomeNet: a Bayesian model for inference of metabolic divergence among microbial communities.
Directory of Open Access Journals (Sweden)
Mahdi Shafiei
2014-11-01
Full Text Available Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks, for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks are shared across all groups (microbiome samples, but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems. The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, resistant to dietary interventions, and interfere with human uptake of an antioxidant connected to IBD. Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD
Deep Learning for Population Genetic Inference.
Directory of Open Access Journals (Sweden)
Sara Sheehan
2016-03-01
Full Text Available Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data to the output (e.g., population genetic parameters of interest. We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history. Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Generative Inferences Based on Learned Relations.
Chen, Dawn; Lu, Hongjing; Holyoak, Keith J
2016-11-17
A key property of relational representations is their generativity: From partial descriptions of relations between entities, additional inferences can be drawn about other entities. A major theoretical challenge is to demonstrate how the capacity to make generative inferences could arise as a result of learning relations from non-relational inputs. In the present paper, we show that a bottom-up model of relation learning, initially developed to discriminate between positive and negative examples of comparative relations (e.g., deciding whether a sheep is larger than a rabbit), can be extended to make generative inferences. The model is able to make quasi-deductive transitive inferences (e.g., "If A is larger than B and B is larger than C, then A is larger than C") and to qualitatively account for human responses to generative questions such as "What is an animal that is smaller than a dog?" These results provide evidence that relational models based on bottom-up learning mechanisms are capable of supporting generative inferences.
A statistical learning algorithm for word segmentation
Van Aken, Jerry R
2011-01-01
In natural speech, the speaker does not pause between words, yet a human listener somehow perceives this continuous stream of phonemes as a series of distinct words. The detection of boundaries between spoken words is an instance of a general capability of the human neocortex to remember and to recognize recurring sequences. This paper describes a computer algorithm that is designed to solve the problem of locating word boundaries in blocks of English text from which the spaces have been removed. This problem avoids the complexities of processing speech but requires similar capabilities for detecting recurring sequences. The algorithm that is described in this paper relies entirely on statistical relationships between letters in the input stream to infer the locations of word boundaries. The source code for a C++ version of this algorithm is presented in an appendix.
Algorithm Animation with Galant.
Stallmann, Matthias F
2017-01-01
Although surveys suggest positive student attitudes toward the use of algorithm animations, it is not clear that they improve learning outcomes. The Graph Algorithm Animation Tool, or Galant, challenges and motivates students to engage more deeply with algorithm concepts, without distracting them with programming language details or GUIs. Even though Galant is specifically designed for graph algorithms, it has also been used to animate other algorithms, most notably sorting algorithms.
Using Alien Coins to Test Whether Simple Inference Is Bayesian
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Genetics Home Reference: GRN-related frontotemporal dementia
... Neumann M, Kwong LK, Trojanowski JQ, Lee VM, Grossman M. Clinical, genetic, and pathologic characteristics of patients ... Feldman H, Woltjer R, Miller CA, Wood EM, Grossman M, McCluskey L, Clark CM, Neumann M, Danek ...
Replica inference approach to unsupervised multiscale image segmentation.
Hu, Dandan; Ronhovde, Peter; Nussinov, Zohar
2012-01-01
We apply a replica-inference-based Potts model method to unsupervised image segmentation on multiple scales. This approach was inspired by the statistical mechanics problem of "community detection" and its phase diagram. Specifically, the problem is cast as identifying tightly bound clusters ("communities" or "solutes") against a background or "solvent." Within our multiresolution approach, we compute information-theory-based correlations among multiple solutions ("replicas") of the same graph over a range of resolutions. Significant multiresolution structures are identified by replica correlations manifest by information theory overlaps. We further employ such information theory measures (such as normalized mutual information and variation of information), thermodynamic quantities such as the system entropy and energy, and dynamic measures monitoring the convergence time to viable solutions as metrics for transitions between various solvable and unsolvable phases. Within the solvable phase, transitions between contending solutions (such as those corresponding to segmentations on different scales) may also appear. With the aid of these correlations as well as thermodynamic measures, the phase diagram of the corresponding Potts model is analyzed at both zero and finite temperatures. Optimal parameters corresponding to a sensible unsupervised segmentations appear within the "easy phase" of the Potts model. Our algorithm is fast and shown to be at least as accurate as the best algorithms to date and to be especially suited to the detection of camouflaged images.
Statistical inference based on divergence measures
Pardo, Leandro
2005-01-01
The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach.Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis is on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...
Hierarchical probabilistic inference of cosmic shear
Schneider, Michael D; Marshall, Philip J; Dawson, William A; Meyers, Joshua; Bard, Deborah J; Lang, Dustin
2014-01-01
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distribution specification. We use importance sampling to separate the modeling of small imaging areas from the glo...
On Tidal Inference in the Diurnal Band
Ray, R. D.
2017-01-01
Standard methods of tidal inference should be revised to account for a known resonance that occurs mostly within the K(sub 1) tidal group in the diurnal band. The resonance arises from a free rotational mode of Earth caused by the fluid core. In a set of 110 bottom-pressure tide stations, the amplitude of the P(sub 1) tidal constituent is shown to be suppressed relative to K(sub 1), which is in good agreement with the resonance theory. Standard formulas for the K(sub 1) nodal modulation remain essentially unaffected. Two examples are given of applications of the refined inference methodology: one with monthly tide gauge data and one with satellite altimetry. For some altimeter-constrained tide models, an inferred P(sub 1) constituent is found to be more accurate than a directly determined one.
Parameter inference with estimated covariance matrices
Sellentin, Elena
2015-01-01
When inferring parameters from a Gaussian-distributed data set by computing a likelihood, a covariance matrix is needed that describes the data errors and their correlations. If the covariance matrix is not known a priori, it may be estimated and thereby becomes a random object with some intrinsic uncertainty itself. We show how to infer parameters in the presence of such an estimated covariance matrix, by marginalising over the true covariance matrix, conditioned on its estimated value. This leads to a likelihood function that is no longer Gaussian, but rather an adapted version of a multivariate $t$-distribution, which has the same numerical complexity as the multivariate Gaussian. As expected, marginalisation over the true covariance matrix improves inference when compared with Hartlap et al.'s method, which uses an unbiased estimate of the inverse covariance matrix but still assumes that the likelihood is Gaussian.
Inferring epidemic network topology from surveillance data.
Wan, Xiang; Liu, Jiming; Cheung, William K; Tong, Tiejun
2014-01-01
The transmission of infectious diseases can be affected by many or even hidden factors, making it difficult to accurately predict when and where outbreaks may emerge. One approach at the moment is to develop and deploy surveillance systems in an effort to detect outbreaks as timely as possible. This enables policy makers to modify and implement strategies for the control of the transmission. The accumulated surveillance data including temporal, spatial, clinical, and demographic information, can provide valuable information with which to infer the underlying epidemic networks. Such networks can be quite informative and insightful as they characterize how infectious diseases transmit from one location to another. The aim of this work is to develop a computational model that allows inferences to be made regarding epidemic network topology in heterogeneous populations. We apply our model on the surveillance data from the 2009 H1N1 pandemic in Hong Kong. The inferred epidemic network displays significant effect on the propagation of infectious diseases.
Examples in parametric inference with R
Dixit, Ulhas Jayram
2016-01-01
This book discusses examples in parametric inference with R. Combining basic theory with modern approaches, it presents the latest developments and trends in statistical inference for students who do not have an advanced mathematical and statistical background. The topics discussed in the book are fundamental and common to many fields of statistical inference and thus serve as a point of departure for in-depth study. The book is divided into eight chapters: Chapter 1 provides an overview of topics on sufficiency and completeness, while Chapter 2 briefly discusses unbiased estimation. Chapter 3 focuses on the study of moments and maximum likelihood estimators, and Chapter 4 presents bounds for the variance. In Chapter 5, topics on consistent estimator are discussed. Chapter 6 discusses Bayes, while Chapter 7 studies some more powerful tests. Lastly, Chapter 8 examines unbiased and other tests. Senior undergraduate and graduate students in statistics and mathematics, and those who have taken an introductory cou...
Picturing classical and quantum Bayesian inference
Coecke, Bob
2011-01-01
We introduce a graphical framework for Bayesian inference that is sufficiently general to accommodate not just the standard case but also recent proposals for a theory of quantum Bayesian inference wherein one considers density operators rather than probability distributions as representative of degrees of belief. The diagrammatic framework is stated in the graphical language of symmetric monoidal categories and of compact structures and Frobenius structures therein, in which Bayesian inversion boils down to transposition with respect to an appropriate compact structure. We characterize classical Bayesian inference in terms of a graphical property and demonstrate that our approach eliminates some purely conventional elements that appear in common representations thereof, such as whether degrees of belief are represented by probabilities or entropic quantities. We also introduce a quantum-like calculus wherein the Frobenius structure is noncommutative and show that it can accommodate Leifer's calculus of `cond...
Inferring causality from noisy time series data
DEFF Research Database (Denmark)
Mønster, Dan; Fusaroli, Riccardo; Tylén, Kristian;
2016-01-01
Convergent Cross-Mapping (CCM) has shown high potential to perform causal inference in the absence of models. We assess the strengths and weaknesses of the method by varying coupling strength and noise levels in coupled logistic maps. We find that CCM fails to infer accurate coupling strength...... and even causality direction in synchronized time-series and in the presence of intermediate coupling. We find that the presence of noise deterministically reduces the level of cross-mapping fidelity, while the convergence rate exhibits higher levels of robustness. Finally, we propose that controlled noise...... injections in intermediate-to-strongly coupled systems could enable more accurate causal inferences. Given the inherent noisy nature of real-world systems, our findings enable a more accurate evaluation of CCM applicability and advance suggestions on how to overcome its weaknesses....
Institute of Scientific and Technical Information of China (English)
黎藜
2012-01-01
In the sports goods,it＇s always the male＇s world.It has not only promoted the product,also spread the correct concept of female in the new advertisements ＂I Run my happiness＂ of the Grn-group.Materialization of female in many advertisements has influenced the correct concept of female.At the same time,advertisement,one of the most important mass media,has influenced not only the popular culture but also the social concept.Based on the above reasons,this paper has the following viewpoint that the advertising communication must be an important means to guide the correct concept of female,to promote equality between men and women,and to urge social justice and equality.%体育用品一向是男性的天下,但贵人鸟运动鞋＂我Run我的快乐＂系列广告中采用了女性作为广告主角,不仅仅向受众推介了商品,也传播了健康、正确的女性观念。现代广告作为大众传播的重要媒介,对大众文化的传播、社会观念的形成影响甚大,众多商品广告中女性被物化的现象直接影响着社会女性观念的建构,并在一定程度上误导了性别歧视。因此广告传播者树立起正确的女性观念,引导大众的女性观念,才能有效促进以男女平等为核心的社会公平与平等,促进社会的文明与进步。
Likelihood inference for unions of interacting discs
DEFF Research Database (Denmark)
Møller, Jesper; Helisova, K.
2010-01-01
with respect to a given marked Poisson model (i.e. a Boolean model). We show how edge effects and other complications can be handled by considering a certain conditional likelihood. Our methodology is illustrated by analysing Peter Diggle's heather data set, where we discuss the results of simulation......This is probably the first paper which discusses likelihood inference for a random set using a germ-grain model, where the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point......-based maximum likelihood inference and the effect of specifying different reference Poisson models....
Variational Bayesian Inference of Line Spectra
DEFF Research Database (Denmark)
Badiu, Mihai Alin; Hansen, Thomas Lundgaard; Fleury, Bernard Henri
2017-01-01
In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid; and the coeffici......In this paper, we address the fundamental problem of line spectral estimation in a Bayesian framework. We target model order and parameter estimation via variational inference in a probabilistic model in which the frequencies are continuous-valued, i.e., not restricted to a grid...
Statistical Inference for Partially Observed Diffusion Processes
DEFF Research Database (Denmark)
Jensen, Anders Christian
-dimensional Ornstein-Uhlenbeck where one coordinate is completely unobserved. This model does not have the Markov property and it makes parameter inference more complicated. Next we take a Bayesian approach and introduce some basic Markov chain Monte Carlo methods. In chapter ve and six we describe an Bayesian method...... to perform parameter inference in multivariate diffusion models that may be only partially observed. The methodology is applied to the stochastic FitzHugh-Nagumo model and the two-dimensional Ornstein-Uhlenbeck process. Chapter seven focus on parameter identifiability in the aprtially observed Ornstein...
Inference of gene regulatory networks from time series by Tsallis entropy
Directory of Open Access Journals (Sweden)
de Oliveira Evaldo A
2011-05-01
Full Text Available Abstract Background The inference of gene regulatory networks (GRNs from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information, a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5
Directory of Open Access Journals (Sweden)
Christley Scott
2010-07-01
Full Text Available Abstract Background Stochastic effects can be important for the behavior of processes involving small population numbers, so the study of stochastic models has become an important topic in the burgeoning field of computational systems biology. However analysis techniques for stochastic models have tended to lag behind their deterministic cousins due to the heavier computational demands of the statistical approaches for fitting the models to experimental data. There is a continuing need for more effective and efficient algorithms. In this article we focus on the parameter inference problem for stochastic kinetic models of biochemical reactions given discrete time-course observations of either some or all of the molecular species. Results We propose an algorithm for inference of kinetic rate parameters based upon maximum likelihood using stochastic gradient descent (SGD. We derive a general formula for the gradient of the likelihood function given discrete time-course observations. The formula applies to any explicit functional form of the kinetic rate laws such as mass-action, Michaelis-Menten, etc. Our algorithm estimates the gradient of the likelihood function by reversible jump Markov chain Monte Carlo sampling (RJMCMC, and then gradient descent method is employed to obtain the maximum likelihood estimation of parameter values. Furthermore, we utilize flux balance analysis and show how to automatically construct reversible jump samplers for arbitrary biochemical reaction models. We provide RJMCMC sampling algorithms for both fully observed and partially observed time-course observation data. Our methods are illustrated with two examples: a birth-death model and an auto-regulatory gene network. We find good agreement of the inferred parameters with the actual parameters in both models. Conclusions The SGD method proposed in the paper presents a general framework of inferring parameters for stochastic kinetic models. The method is
Bayesian parameter inference and model selection by population annealing in systems biology.
Murakami, Yohei
2014-01-01
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor.
Directory of Open Access Journals (Sweden)
Wang Woei-Fuh
2008-03-01
Full Text Available Abstract Background With the abundant information produced by microarray technology, various approaches have been proposed to infer transcriptional regulatory networks. However, few approaches have studied subtle and indirect interaction such as genetic compensation, the existence of which is widely recognized although its mechanism has yet to be clarified. Furthermore, when inferring gene networks most models include only observed variables whereas latent factors, such as proteins and mRNA degradation that are not measured by microarrays, do participate in networks in reality. Results Motivated by inferring transcriptional compensation (TC interactions in yeast, a stepwise structural equation modeling algorithm (SSEM is developed. In addition to observed variables, SSEM also incorporates hidden variables to capture interactions (or regulations from latent factors. Simulated gene networks are used to determine with which of six possible model selection criteria (MSC SSEM works best. SSEM with Bayesian information criterion (BIC results in the highest true positive rates, the largest percentage of correctly predicted interactions from all existing interactions, and the highest true negative (non-existing interactions rates. Next, we apply SSEM using real microarray data to infer TC interactions among (1 small groups of genes that are synthetic sick or lethal (SSL to SGS1, and (2 a group of SSL pairs of 51 yeast genes involved in DNA synthesis and repair that are of interest. For (1, SSEM with BIC is shown to outperform three Bayesian network algorithms and a multivariate autoregressive model, checked against the results of qRT-PCR experiments. The predictions for (2 are shown to coincide with several known pathways of Sgs1 and its partners that are involved in DNA replication, recombination and repair. In addition, experimentally testable interactions of Rad27 are predicted. Conclusion SSEM is a useful tool for inferring genetic networks, and the
Inference making ability and the function of inferences in reading comprehension
Directory of Open Access Journals (Sweden)
Salih Özenici
2011-05-01
Full Text Available The aim of this study is to explain the relation between reading comprehension and inference. The main target of reading process is to create a coherent mental representation of the text, therefore it is necessary to recognize relations between different parts of the texts and to relate them to one another. During reading process, to complete the missing information in the text or to add new information is necessary. All these processes require inference making ability beyond the information in the text. When the readers use such active reading strategies as monitoring the comprehension, prediction, inferring and background knowledge, they learn a lot more from the text and understand it better. In reading comprehension, making inference is a constructive thinking process, because it is a cognitive process in order to form the meaning. When reading comprehension models are considered, it can be easily seen that linguistics elements cannot explain these processes by themselves, therefore the ability of thinking and inference making is needed. During reading process, general world knowledge is necessary to form coherent relations between sentences. Information which comes from context of the text will not be adequate to understand the text. In order to overcome this deficiency and to integrate the meanings from different sentences witch each other, it is necessary to make inference. Readers make inference in order to completely understand what the writer means, to interpret the sentences and also to form the combinations and relations between them.
Inference making ability and the function of inferences in reading comprehension
Directory of Open Access Journals (Sweden)
Salih Özenici
2011-05-01
Full Text Available The aim of this study is to explain the relation of reading comprehension and inference. The main target of reading process is to create a coherent mental representation of the text, therefore it is necessary to recognize relations between different parts of the texts and to relate them to one another. During reading process, to complete the missing information in the text or to add new information is necessary. All these processes require inference making ability beyond the information in the text. When the readers use such active reading strategies as monitoring the comprehension, prediction, inferring and background knowledge, they learn a lot more from the text and understand it better. In reading comprehension, making inference is a constructive thinking process, because it is a cognitive process in order to form the meaning. When reading comprehension models are considered, it can be easily seen that linguistics elements cannot explain these processes by themselves, therefore the ability of thinking and inference making is needed. During reading process, general world knowledge is necessary to form coherent relations between sentences. Information which comes from context of the text will not be adequate to understand the text. In order to overcome this deficiency and to integrate the meanings from different sentences witch each other, it is necessary to make inference. Readers make inference in order to completely understand what the writer means, to interpret the sentences and also to form the combinations and relations between them.
Skiena, Steven S
2008-01-01
Explaining designing algorithms, and analyzing their efficacy and efficiency, this book covers combinatorial algorithms technology, stressing design over analysis. It presents instruction on methods for designing and analyzing computer algorithms. It contains the catalog of algorithmic resources, implementations and a bibliography