G.T. Post (Thierry)
2003-01-01
textabstractThis paper discusses statistical inference on the second-order stochastic dominance (SSD) efficiency of a given portfolio relative to all portfolios formed from a set of assets. We derive the asymptotic sampling distribution of the Post test statistic for SSD efficiency. Unfortunately, a
Bayesian statistical inference
Directory of Open Access Journals (Sweden)
Bruno De Finetti
2017-04-01
Full Text Available This work was translated into English and published in the volume: Bruno De Finetti, Induction and Probability, Biblioteca di Statistica, eds. P. Monari, D. Cocchi, Clueb, Bologna, 1993.Bayesian statistical Inference is one of the last fundamental philosophical papers in which we can find the essential De Finetti's approach to the statistical inference.
An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks
Bayer, Christian
2016-01-06
In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and it is intended to provide a starting point for the second phase which is the Monte Carlo EM Algorithm.
Statistical inference from imperfect photon detection
International Nuclear Information System (INIS)
Audenaert, Koenraad M R; Scheel, Stefan
2009-01-01
We consider the statistical properties of photon detection with imperfect detectors that exhibit dark counts and less than unit efficiency, in the context of tomographic reconstruction. In this context, the detectors are used to implement certain positive operator-valued measures (POVMs) that would allow us to reconstruct the quantum state or quantum process under consideration. Here we look at the intermediate step of inferring outcome probabilities from measured outcome frequencies, and show how this inference can be performed in a statistically sound way in the presence of detector imperfections. Merging outcome probabilities for different sets of POVMs into a consistent quantum state picture has been treated elsewhere (Audenaert and Scheel 2009 New J. Phys. 11 023028). Single-photon pulsed measurements as well as continuous wave measurements are covered.
Geometric statistical inference
International Nuclear Information System (INIS)
Periwal, Vipul
1999-01-01
A reparametrization-covariant formulation of the inverse problem of probability is explicitly solved for finite sample sizes. The inferred distribution is explicitly continuous for finite sample size. A geometric solution of the statistical inference problem in higher dimensions is outlined
Introductory statistical inference
Mukhopadhyay, Nitis
2014-01-01
This gracefully organized text reveals the rigorous theory of probability and statistical inference in the style of a tutorial, using worked examples, exercises, figures, tables, and computer simulations to develop and illustrate concepts. Drills and boxed summaries emphasize and reinforce important ideas and special techniques.Beginning with a review of the basic concepts and methods in probability theory, moments, and moment generating functions, the author moves to more intricate topics. Introductory Statistical Inference studies multivariate random variables, exponential families of dist
Statistical Inference at Work: Statistical Process Control as an Example
Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia
2008-01-01
To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…
Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro
2016-01-01
then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve
Rohatgi, Vijay K
2003-01-01
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011This excellently presente
Statistical inference an integrated Bayesianlikelihood approach
Aitkin, Murray
2010-01-01
Filling a gap in current Bayesian theory, Statistical Inference: An Integrated Bayesian/Likelihood Approach presents a unified Bayesian treatment of parameter inference and model comparisons that can be used with simple diffuse prior specifications. This novel approach provides new solutions to difficult model comparison problems and offers direct Bayesian counterparts of frequentist t-tests and other standard statistical methods for hypothesis testing.After an overview of the competing theories of statistical inference, the book introduces the Bayes/likelihood approach used throughout. It pre
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Statistical inference based on divergence measures
Pardo, Leandro
2005-01-01
The idea of using functionals of Information Theory, such as entropies or divergences, in statistical inference is not new. However, in spite of the fact that divergence statistics have become a very good alternative to the classical likelihood ratio test and the Pearson-type statistic in discrete models, many statisticians remain unaware of this powerful approach.Statistical Inference Based on Divergence Measures explores classical problems of statistical inference, such as estimation and hypothesis testing, on the basis of measures of entropy and divergence. The first two chapters form an overview, from a statistical perspective, of the most important measures of entropy and divergence and study their properties. The author then examines the statistical analysis of discrete multivariate data with emphasis is on problems in contingency tables and loglinear models using phi-divergence test statistics as well as minimum phi-divergence estimators. The final chapter looks at testing in general populations, prese...
Parametric statistical inference basic theory and modern approaches
Zacks, Shelemyahu; Tsokos, C P
1981-01-01
Parametric Statistical Inference: Basic Theory and Modern Approaches presents the developments and modern trends in statistical inference to students who do not have advanced mathematical and statistical preparation. The topics discussed in the book are basic and common to many fields of statistical inference and thus serve as a jumping board for in-depth study. The book is organized into eight chapters. Chapter 1 provides an overview of how the theory of statistical inference is presented in subsequent chapters. Chapter 2 briefly discusses statistical distributions and their properties. Chapt
Bayer, Christian
2016-02-20
© 2016 Taylor & Francis Group, LLC. ABSTRACT: In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
Vilanova, Pedro
2016-01-07
In this work, we present an extension of the forward-reverse representation introduced in Simulation of forward-reverse stochastic representations for conditional diffusions , a 2014 paper by Bayer and Schoenmakers to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.
An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks
Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro
2016-01-01
In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem
Inferring Demographic History Using Two-Locus Statistics.
Ragsdale, Aaron P; Gutenkunst, Ryan N
2017-06-01
Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Probability and Statistical Inference
Prosper, Harrison B.
2006-01-01
These lectures introduce key concepts in probability and statistical inference at a level suitable for graduate students in particle physics. Our goal is to paint as vivid a picture as possible of the concepts covered.
On quantum statistical inference
Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.
2003-01-01
Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, developments in the theory of quantum measurements have
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based in this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Data-driven inference for the spatial scan statistic
Directory of Open Access Journals (Sweden)
Duczmal Luiz H
2011-08-01
Full Text Available Abstract Background Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. Results A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based in this inference. Conclusions A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
International Conference on Trends and Perspectives in Linear Statistical Inference
Rosen, Dietrich
2018-01-01
This volume features selected contributions on a variety of topics related to linear statistical inference. The peer-reviewed papers from the International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat 2016) held in Istanbul, Turkey, 22-25 August 2016, cover topics in both theoretical and applied statistics, such as linear models, high-dimensional statistics, computational statistics, the design of experiments, and multivariate analysis. The book is intended for statisticians, Ph.D. students, and professionals who are interested in statistical inference. .
Statistical inference via fiducial methods
Salomé, Diemer
1998-01-01
In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... Zie: Summary
Statistical inference a short course
Panik, Michael J
2012-01-01
A concise, easily accessible introduction to descriptive and inferential techniques Statistical Inference: A Short Course offers a concise presentation of the essentials of basic statistics for readers seeking to acquire a working knowledge of statistical concepts, measures, and procedures. The author conducts tests on the assumption of randomness and normality, provides nonparametric methods when parametric approaches might not work. The book also explores how to determine a confidence interval for a population median while also providing coverage of ratio estimation, randomness, and causal
Nonparametric predictive inference in statistical process control
Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.
2000-01-01
New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities hased on
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
Energy Technology Data Exchange (ETDEWEB)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Reasoning about Informal Statistical Inference: One Statistician's View
Rossman, Allan J.
2008-01-01
This paper identifies key concepts and issues associated with the reasoning of informal statistical inference. I focus on key ideas of inference that I think all students should learn, including at secondary level as well as tertiary. I argue that a fundamental component of inference is to go beyond the data at hand, and I propose that statistical…
Models for probability and statistical inference theory and applications
Stapleton, James H
2007-01-01
This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readersModels for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping.Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...
Statistical inference for financial engineering
Taniguchi, Masanobu; Ogata, Hiroaki; Taniai, Hiroyuki
2014-01-01
This monograph provides the fundamentals of statistical inference for financial engineering and covers some selected methods suitable for analyzing financial time series data. In order to describe the actual financial data, various stochastic processes, e.g. non-Gaussian linear processes, non-linear processes, long-memory processes, locally stationary processes etc. are introduced and their optimal estimation is considered as well. This book also includes several statistical approaches, e.g., discriminant analysis, the empirical likelihood method, control variate method, quantile regression, realized volatility etc., which have been recently developed and are considered to be powerful tools for analyzing the financial data, establishing a new bridge between time series and financial engineering. This book is well suited as a professional reference book on finance, statistics and statistical financial engineering. Readers are expected to have an undergraduate-level knowledge of statistics.
Principles for statistical inference on big spatio-temporal data from climate models
Castruccio, Stefano; Genton, Marc G.
2018-01-01
The vast increase in size of modern spatio-temporal datasets has prompted statisticians working in environmental applications to develop new and efficient methodologies that are still able to achieve inference for nontrivial models within an affordable time. Climate model outputs push the limits of inference for Gaussian processes, as their size can easily be larger than 10 billion data points. Drawing from our experience in a set of previous work, we provide three principles for the statistical analysis of such large datasets that leverage recent methodological and computational advances. These principles emphasize the need of embedding distributed and parallel computing in the inferential process.
Principles for statistical inference on big spatio-temporal data from climate models
Castruccio, Stefano
2018-02-24
The vast increase in size of modern spatio-temporal datasets has prompted statisticians working in environmental applications to develop new and efficient methodologies that are still able to achieve inference for nontrivial models within an affordable time. Climate model outputs push the limits of inference for Gaussian processes, as their size can easily be larger than 10 billion data points. Drawing from our experience in a set of previous work, we provide three principles for the statistical analysis of such large datasets that leverage recent methodological and computational advances. These principles emphasize the need of embedding distributed and parallel computing in the inferential process.
Statistical inference an integrated approach
Migon, Helio S; Louzada, Francisco
2014-01-01
Introduction Information The concept of probability Assessing subjective probabilities An example Linear algebra and probability Notation Outline of the bookElements of Inference Common statistical modelsLikelihood-based functions Bayes theorem Exchangeability Sufficiency and exponential family Parameter elimination Prior Distribution Entirely subjective specification Specification through functional forms Conjugacy with the exponential family Non-informative priors Hierarchical priors Estimation Introduction to decision theoryBayesian point estimation Classical point estimation Empirical Bayes estimation Comparison of estimators Interval estimation Estimation in the Normal model Approximating Methods The general problem of inference Optimization techniquesAsymptotic theory Other analytical approximations Numerical integration methods Simulation methods Hypothesis Testing Introduction Classical hypothesis testingBayesian hypothesis testing Hypothesis testing and confidence intervalsAsymptotic tests Prediction...
Statistical inferences for bearings life using sudden death test
Directory of Open Access Journals (Sweden)
Morariu Cristin-Olimpiu
2017-01-01
Full Text Available In this paper we propose a calculus method for reliability indicators estimation and a complete statistical inferences for three parameters Weibull distribution of bearings life. Using experimental values regarding the durability of bearings tested on stands by the sudden death tests involves a series of particularities of the estimation using maximum likelihood method and statistical inference accomplishment. The paper detailing these features and also provides an example calculation.
Ignorability in Statistical and Probabilistic Inference
DEFF Research Database (Denmark)
Jaeger, Manfred
2005-01-01
When dealing with incomplete data in statistical learning, or incomplete observations in probabilistic inference, one needs to distinguish the fact that a certain event is observed from the fact that the observed event has happened. Since the modeling and computational complexities entailed...
Thermodynamics of statistical inference by cells.
Lang, Alex H; Fisher, Charles K; Mora, Thierry; Mehta, Pankaj
2014-10-03
The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.
On Quantum Statistical Inference, II
Barndorff-Nielsen, O. E.; Gill, R. D.; Jupp, P. E.
2003-01-01
Interest in problems of statistical inference connected to measurements of quantum systems has recently increased substantially, in step with dramatic new developments in experimental techniques for studying small quantum systems. Furthermore, theoretical developments in the theory of quantum measurements have brought the basic mathematical framework for the probability calculations much closer to that of classical probability theory. The present paper reviews this field and proposes and inte...
All of statistics a concise course in statistical inference
Wasserman, Larry
2004-01-01
This book is for people who want to learn probability and statistics quickly It brings together many of the main ideas in modern statistics in one place The book is suitable for students and researchers in statistics, computer science, data mining and machine learning This book covers a much wider range of topics than a typical introductory text on mathematical statistics It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses The reader is assumed to know calculus and a little linear algebra No previous knowledge of probability and statistics is required The text can be used at the advanced undergraduate and graduate level Larry Wasserman is Professor of Statistics at Carnegie Mellon University He is also a member of the Center for Automated Learning and Discovery in the School of Computer Science His research areas include nonparametric inference, asymptotic theory, causality, and applications to astrophysics, bi...
Statistical inference for imperfect maintenance models with missing data
International Nuclear Information System (INIS)
Dijoux, Yann; Fouladirad, Mitra; Nguyen, Dinh Tuan
2016-01-01
The paper considers complex industrial systems with incomplete maintenance history. A corrective maintenance is performed after the occurrence of a failure and its efficiency is assumed to be imperfect. In maintenance analysis, the databases are not necessarily complete. Specifically, the observations are assumed to be window-censored. This situation arises relatively frequently after the purchase of a second-hand unit or in the absence of maintenance record during the burn-in phase. The joint assessment of the wear-out of the system and the maintenance efficiency is investigated under missing data. A review along with extensions of statistical inference procedures from an observation window are proposed in the case of perfect and minimal repair using the renewal and Poisson theories, respectively. Virtual age models are employed to model imperfect repair. In this framework, new estimation procedures are developed. In particular, maximum likelihood estimation methods are derived for the most classical virtual age models. The benefits of the new estimation procedures are highlighted by numerical simulations and an application to a real data set. - Highlights: • New estimation procedures for window-censored observations and imperfect repair. • Extensions of inference methods for perfect and minimal repair with missing data. • Overview of maximum likelihood method with complete and incomplete observations. • Benefits of the new procedures highlighted by simulation studies and real application.
Subjective randomness as statistical inference.
Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B
2018-06-01
Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
Inference and the Introductory Statistics Course
Pfannkuch, Maxine; Regan, Matt; Wild, Chris; Budgett, Stephanie; Forbes, Sharleen; Harraway, John; Parsonage, Ross
2011-01-01
This article sets out some of the rationale and arguments for making major changes to the teaching and learning of statistical inference in introductory courses at our universities by changing from a norm-based, mathematical approach to more conceptually accessible computer-based approaches. The core problem of the inferential argument with its…
Statistical causal inferences and their applications in public health research
Wu, Pan; Chen, Ding-Geng
2016-01-01
This book compiles and presents new developments in statistical causal inference. The accompanying data and computer programs are publicly available so readers may replicate the model development and data analysis presented in each chapter. In this way, methodology is taught so that readers may implement it directly. The book brings together experts engaged in causal inference research to present and discuss recent issues in causal inference methodological development. This is also a timely look at causal inference applied to scenarios that range from clinical trials to mediation and public health research more broadly. In an academic setting, this book will serve as a reference and guide to a course in causal inference at the graduate level (Master's or Doctorate). It is particularly relevant for students pursuing degrees in Statistics, Biostatistics and Computational Biology. Researchers and data analysts in public health and biomedical research will also find this book to be an important reference.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
Efficient algorithms for conditional independence inference
Czech Academy of Sciences Publication Activity Database
Bouckaert, R.; Hemmecke, R.; Lindner, S.; Studený, Milan
2010-01-01
Roč. 11, č. 1 (2010), s. 3453-3479 ISSN 1532-4435 R&D Projects: GA ČR GA201/08/0539; GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : conditional independence inference * linear programming approach Subject RIV: BA - General Mathematics Impact factor: 2.949, year: 2010 http://library.utia.cas.cz/separaty/2010/MTR/studeny-efficient algorithms for conditional independence inference.pdf
Statistical inference on residual life
Jeong, Jong-Hyeon
2014-01-01
This is a monograph on the concept of residual life, which is an alternative summary measure of time-to-event data, or survival data. The mean residual life has been used for many years under the name of life expectancy, so it is a natural concept for summarizing survival or reliability data. It is also more interpretable than the popular hazard function, especially for communications between patients and physicians regarding the efficacy of a new drug in the medical field. This book reviews existing statistical methods to infer the residual life distribution. The review and comparison includes existing inference methods for mean and median, or quantile, residual life analysis through medical data examples. The concept of the residual life is also extended to competing risks analysis. The targeted audience includes biostatisticians, graduate students, and PhD (bio)statisticians. Knowledge in survival analysis at an introductory graduate level is advisable prior to reading this book.
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Statistical inference and visualization in scale-space for spatially dependent images
Vaughan, Amy
2012-03-01
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests for spatially dependent images. The spatial SiZer utilizes a family of kernel estimates of the image and provides not only exploratory data analysis but also statistical inference with spatial correlation taken into account. It is also capable of comparing the observed image with a specific null model being tested by adjusting the statistical inference using an assumed covariance structure. Pixel locations having statistically significant differences between the image and a given null model are highlighted by arrows. The spatial SiZer is compared with the existing independent SiZer via the analysis of simulated data with and without signal on both planar and spherical domains. We apply the spatial SiZer method to the decadal temperature change over some regions of the Earth. © 2011 The Korean Statistical Society.
Statistical Inference and Patterns of Inequality in the Global North
Moran, Timothy Patrick
2006-01-01
Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory
2016-05-12
Distribution Unlimited UU UU UU UU 12-05-2016 15-May-2014 14-Feb-2015 Final Report: Statistical Inference on Memory Structure of Processes and Its Applications ...ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 mathematical statistics ; time series; Markov chains; random...journals: Final Report: Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory Report Title Three areas
GWIS: Genome-Wide Inferred Statistics for Functions of Multiple Phenotypes
Nieuwboer, H.A.; Pool, R.; Dolan, C.V.; Boomsma, D.I.; Nivard, M.G.
2016-01-01
Here we present a method of genome-wide inferred study (GWIS) that provides an approximation of genome-wide association study (GWAS) summary statistics for a variable that is a function of phenotypes for which GWAS summary statistics, phenotypic means, and covariances are available. A GWIS can be
Increased Statistical Efficiency in a Lognormal Mean Model
Directory of Open Access Journals (Sweden)
Grant H. Skrepnek
2014-01-01
Full Text Available Within the context of clinical and other scientific research, a substantial need exists for an accurate determination of the point estimate in a lognormal mean model, given that highly skewed data are often present. As such, logarithmic transformations are often advocated to achieve the assumptions of parametric statistical inference. Despite this, existing approaches that utilize only a sample’s mean and variance may not necessarily yield the most efficient estimator. The current investigation developed and tested an improved efficient point estimator for a lognormal mean by capturing more complete information via the sample’s coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative improvements in efficiency of up to 129.47 percent compared to the usual maximum likelihood estimator and up to 21.33 absolute percentage points above the efficient estimator presented by Shen and colleagues (2006. The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.
Efficient statistical tests to compare Youden index: accounting for contingency correlation.
Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan
2015-04-30
Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
Efficient Bayesian inference for ARFIMA processes
Graves, T.; Gramacy, R. B.; Franzke, C. L. E.; Watkins, N. W.
2015-03-01
Many geophysical quantities, like atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long-range dependence (LRD). LRD means that these quantities experience non-trivial temporal memory, which potentially enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LRD. In this paper we present a modern and systematic approach to the inference of LRD. Rather than Mandelbrot's fractional Gaussian noise, we use the more flexible Autoregressive Fractional Integrated Moving Average (ARFIMA) model which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LRD, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g. short memory effects) can be integrated over in order to focus on long memory parameters, and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data, with favorable comparison to the standard estimators.
Fisher information and statistical inference for phase-type distributions
DEFF Research Database (Denmark)
Bladt, Mogens; Esparza, Luz Judith R; Nielsen, Bo Friis
2011-01-01
This paper is concerned with statistical inference for both continuous and discrete phase-type distributions. We consider maximum likelihood estimation, where traditionally the expectation-maximization (EM) algorithm has been employed. Certain numerical aspects of this method are revised and we...
Saputra, K. V. I.; Cahyadi, L.; Sembiring, U. A.
2018-01-01
Start in this paper, we assess our traditional elementary statistics education and also we introduce elementary statistics with simulation-based inference. To assess our statistical class, we adapt the well-known CAOS (Comprehensive Assessment of Outcomes in Statistics) test that serves as an external measure to assess the student’s basic statistical literacy. This test generally represents as an accepted measure of statistical literacy. We also introduce a new teaching method on elementary statistics class. Different from the traditional elementary statistics course, we will introduce a simulation-based inference method to conduct hypothesis testing. From the literature, it has shown that this new teaching method works very well in increasing student’s understanding of statistics.
Vali Ahmadi, Mohammad; Doostparast, Mahdi; Ahmadi, Jafar
2015-04-01
In manufacturing industries, the lifetime of an item is usually characterised by a random variable X and considered to be satisfactory if X exceeds a given lower lifetime limit L. The probability of a satisfactory item is then ηL := P(X ≥ L), called conforming rate. In industrial companies, however, the lifetime performance index, proposed by Montgomery and denoted by CL, is widely used as a process capability index instead of the conforming rate. Assuming a parametric model for the random variable X, we show that there is a connection between the conforming rate and the lifetime performance index. Consequently, the statistical inferences about ηL and CL are equivalent. Hence, we restrict ourselves to statistical inference for CL based on generalised order statistics, which contains several ordered data models such as usual order statistics, progressively Type-II censored data and records. Various point and interval estimators for the parameter CL are obtained and optimal critical regions for the hypothesis testing problems concerning CL are proposed. Finally, two real data-sets on the lifetimes of insulating fluid and ball bearings, due to Nelson (1982) and Caroni (2002), respectively, and a simulated sample are analysed.
Bayesian Information Criterion as an Alternative way of Statistical Inference
Directory of Open Access Journals (Sweden)
Nadejda Yu. Gubanova
2012-05-01
Full Text Available The article treats Bayesian information criterion as an alternative to traditional methods of statistical inference, based on NHST. The comparison of ANOVA and BIC results for psychological experiment is discussed.
Practical Statistics for LHC Physicists: Bayesian Inference (3/3)
CERN. Geneva
2015-01-01
These lectures cover those principles and practices of statistics that are most relevant for work at the LHC. The first lecture discusses the basic ideas of descriptive statistics, probability and likelihood. The second lecture covers the key ideas in the frequentist approach, including confidence limits, profile likelihoods, p-values, and hypothesis testing. The third lecture covers inference in the Bayesian approach. Throughout, real-world examples will be used to illustrate the practical application of the ideas. No previous knowledge is assumed.
Practical Statistics for LHC Physicists: Frequentist Inference (2/3)
CERN. Geneva
2015-01-01
These lectures cover those principles and practices of statistics that are most relevant for work at the LHC. The first lecture discusses the basic ideas of descriptive statistics, probability and likelihood. The second lecture covers the key ideas in the frequentist approach, including confidence limits, profile likelihoods, p-values, and hypothesis testing. The third lecture covers inference in the Bayesian approach. Throughout, real-world examples will be used to illustrate the practical application of the ideas. No previous knowledge is assumed.
Efficient Exact Inference With Loss Augmented Objective in Structured Learning.
Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert
2016-08-19
Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to special type of a high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
Large scale statistical inference of signaling pathways from RNAi and microarray data
Directory of Open Access Journals (Sweden)
Poustka Annemarie
2007-10-01
Full Text Available Abstract Background The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score networks hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.
Statistical inference for a class of multivariate negative binomial distributions
DEFF Research Database (Denmark)
Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
Energy Technology Data Exchange (ETDEWEB)
Ducasse, Q. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); CEA-Cadarache, DEN/DER/SPRC/LEPh, 13108 Saint Paul lez Durance (France); Jurado, B., E-mail: jurado@cenbg.in2p3.fr [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); Mathieu, L.; Marini, P. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France); Morillon, B. [CEA DAM DIF, 91297 Arpajon (France); Aiche, M.; Tsekhanovich, I. [CENBG, CNRS/IN2P3-Université de Bordeaux, Chemin du Solarium B.P. 120, 33175 Gradignan (France)
2016-08-01
The study of transfer-induced gamma-decay probabilities is very useful for understanding the surrogate-reaction method and, more generally, for constraining statistical-model calculations. One of the main difficulties in the measurement of gamma-decay probabilities is the determination of the gamma-cascade detection efficiency. In Boutoux et al. (2013) [10] we developed the EXtrapolated Efficiency Method (EXEM), a new method to measure this quantity. In this work, we have applied, for the first time, the EXEM to infer the gamma-cascade detection efficiency in the actinide region. In particular, we have considered the {sup 238}U(d,p){sup 239}U and {sup 238}U({sup 3}He,d){sup 239}Np reactions. We have performed Hauser–Feshbach calculations to interpret our results and to verify the hypothesis on which the EXEM is based. The determination of fission and gamma-decay probabilities of {sup 239}Np below the neutron separation energy allowed us to validate the EXEM.
International Nuclear Information System (INIS)
Ducasse, Q.; Jurado, B.; Mathieu, L.; Marini, P.; Morillon, B.; Aiche, M.; Tsekhanovich, I.
2016-01-01
The study of transfer-induced gamma-decay probabilities is very useful for understanding the surrogate-reaction method and, more generally, for constraining statistical-model calculations. One of the main difficulties in the measurement of gamma-decay probabilities is the determination of the gamma-cascade detection efficiency. In Boutoux et al. (2013) [10] we developed the EXtrapolated Efficiency Method (EXEM), a new method to measure this quantity. In this work, we have applied, for the first time, the EXEM to infer the gamma-cascade detection efficiency in the actinide region. In particular, we have considered the "2"3"8U(d,p)"2"3"9U and "2"3"8U("3He,d)"2"3"9Np reactions. We have performed Hauser–Feshbach calculations to interpret our results and to verify the hypothesis on which the EXEM is based. The determination of fission and gamma-decay probabilities of "2"3"9Np below the neutron separation energy allowed us to validate the EXEM.
Statistical Inference for a Class of Multivariate Negative Binomial Distributions
DEFF Research Database (Denmark)
Rubak, Ege H.; Møller, Jesper; McCullagh, Peter
This paper considers statistical inference procedures for a class of models for positively correlated count variables called -permanental random fields, and which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on -permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...
The Heuristic Value of p in Inductive Statistical Inference
Directory of Open Access Journals (Sweden)
Joachim I. Krueger
2017-06-01
Full Text Available Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis Significance Testing ([NH]ST is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p -value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p -value has been subjected to much speculation, analysis, and criticism. We explore how well the p -value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p -value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p -value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
Statistical inference for remote sensing-based estimates of net deforestation
Ronald E. McRoberts; Brian F. Walters
2012-01-01
Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Statistical inference and visualization in scale-space for spatially dependent images
Vaughan, Amy; Jun, Mikyoung; Park, Cheolwoo
2012-01-01
SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical scale-space visualization tool that allows for statistical inferences. In this paper we develop a spatial SiZer for finding significant features and conducting goodness-of-fit tests
Statistical Inference for Porous Materials using Persistent Homology.
Energy Technology Data Exchange (ETDEWEB)
Moon, Chul [Univ. of Georgia, Athens, GA (United States); Heath, Jason E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mitchell, Scott A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-12-01
We propose a porous materials analysis pipeline using persistent homology. We rst compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We t a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistence homology. Thus we provide a capability for making a statistical inference of the uid ow and transport properties of porous materials based on their geometry and connectivity.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.
Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G
2012-10-02
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Statistical Models for Inferring Vegetation Composition from Fossil Pollen
Paciorek, C.; McLachlan, J. S.; Shang, Z.
2011-12-01
Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millenia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millenia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
Inference with constrained hidden Markov models in PRISM
DEFF Research Database (Denmark)
Christiansen, Henning; Have, Christian Theil; Lassen, Ole Torp
2010-01-01
A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. De......_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.......A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference...
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.
Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping
2015-06-07
Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.
Directory of Open Access Journals (Sweden)
Jean-Michel eHupé
2015-02-01
Full Text Available Published studies using functional and structural MRI include many errors in the way data are analyzed and conclusions reported. This was observed when working on a comprehensive review of the neural bases of synesthesia, but these errors are probably endemic to neuroimaging studies. All studies reviewed had based their conclusions using Null Hypothesis Significance Tests (NHST. NHST have yet been criticized since their inception because they are more appropriate for taking decisions related to a Null hypothesis (like in manufacturing than for making inferences about behavioral and neuronal processes. Here I focus on a few key problems of NHST related to brain imaging techniques, and explain why or when we should not rely on significance tests. I also observed that, often, the ill-posed logic of NHST was even not correctly applied, and describe what I identified as common mistakes or at least problematic practices in published papers, in light of what could be considered as the very basics of statistical inference. MRI statistics also involve much more complex issues than standard statistical inference. Analysis pipelines vary a lot between studies, even for those using the same software, and there is no consensus which pipeline is the best. I propose a synthetic view of the logic behind the possible methodological choices, and warn against the usage and interpretation of two statistical methods popular in brain imaging studies, the false discovery rate (FDR procedure and permutation tests. I suggest that current models for the analysis of brain imaging data suffer from serious limitations and call for a revision taking into account the new statistics (confidence intervals logic.
Statistical inference for discrete-time samples from affine stochastic delay differential equations
DEFF Research Database (Denmark)
Küchler, Uwe; Sørensen, Michael
2013-01-01
Statistical inference for discrete time observations of an affine stochastic delay differential equation is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy to calculate in practice. A more general class of prediction-based estimating functions is investigated...
Recent Advances in System Reliability Signatures, Multi-state Systems and Statistical Inference
Frenkel, Ilia
2012-01-01
Recent Advances in System Reliability discusses developments in modern reliability theory such as signatures, multi-state systems and statistical inference. It describes the latest achievements in these fields, and covers the application of these achievements to reliability engineering practice. The chapters cover a wide range of new theoretical subjects and have been written by leading experts in reliability theory and its applications. The topics include: concepts and different definitions of signatures (D-spectra), their properties and applications to reliability of coherent systems and network-type structures; Lz-transform of Markov stochastic process and its application to multi-state system reliability analysis; methods for cost-reliability and cost-availability analysis of multi-state systems; optimal replacement and protection strategy; and statistical inference. Recent Advances in System Reliability presents many examples to illustrate the theoretical results. Real world multi-state systems...
Terminal-Dependent Statistical Inference for the FBSDEs Models
Directory of Open Access Journals (Sweden)
Yunquan Song
2014-01-01
Full Text Available The original stochastic differential equations (OSDEs and forward-backward stochastic differential equations (FBSDEs are often used to model complex dynamic process that arise in financial, ecological, and many other areas. The main difference between OSDEs and FBSDEs is that the latter is designed to depend on a terminal condition, which is a key factor in some financial and ecological circumstances. It is interesting but challenging to estimate FBSDE parameters from noisy data and the terminal condition. However, to the best of our knowledge, the terminal-dependent statistical inference for such a model has not been explored in the existing literature. We proposed a nonparametric terminal control variables estimation method to address this problem. The reason why we use the terminal control variables is that the newly proposed inference procedures inherit the terminal-dependent characteristic. Through this new proposed method, the estimators of the functional coefficients of the FBSDEs model are obtained. The asymptotic properties of the estimators are also discussed. Simulation studies show that the proposed method gives satisfying estimates for the FBSDE parameters from noisy data and the terminal condition. A simulation is performed to test the feasibility of our method.
Statistical inference involving binomial and negative binomial parameters.
García-Pérez, Miguel A; Núñez-Antón, Vicente
2009-05-01
Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.
Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic
Bekker, P.; Kleibergen, F.R.
2001-01-01
The paper considers the K-statistic, Kleibergen’s (2000) adaptation ofthe Anderson-Rubin (AR) statistic in instrumental variables regression.Compared to the AR-statistic this K-statistic shows improvedasymptotic efficiency in terms of degrees of freedom in overidentifiedmodels and yet it shares,
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, Paul A.; Kleibergen, Frank
2001-01-01
The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidenti?ed models and yet it shares,
A model independent safeguard against background mismodeling for statistical inference
Energy Technology Data Exchange (ETDEWEB)
Priel, Nadav; Landsman, Hagar; Manfredini, Alessandro; Budnik, Ranny [Department of Particle Physics and Astrophysics, Weizmann Institute of Science, Herzl St. 234, Rehovot (Israel); Rauch, Ludwig, E-mail: nadav.priel@weizmann.ac.il, E-mail: rauch@mpi-hd.mpg.de, E-mail: hagar.landsman@weizmann.ac.il, E-mail: alessandro.manfredini@weizmann.ac.il, E-mail: ran.budnik@weizmann.ac.il [Teilchen- und Astroteilchenphysik, Max-Planck-Institut für Kernphysik, Saupfercheckweg 1, 69117 Heidelberg (Germany)
2017-05-01
We propose a safeguard procedure for statistical inference that provides universal protection against mismodeling of the background. The method quantifies and incorporates the signal-like residuals of the background model into the likelihood function, using information available in a calibration dataset. This prevents possible false discovery claims that may arise through unknown mismodeling, and corrects the bias in limit setting created by overestimated or underestimated background. We demonstrate how the method removes the bias created by an incomplete background model using three realistic case studies.
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
Goyal, Ravi; De Gruttola, Victor
2018-01-30
Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Statistical inference for stochastic processes
National Research Council Canada - National Science Library
Basawa, Ishwar V; Prakasa Rao, B. L. S
1980-01-01
The aim of this monograph is to attempt to reduce the gap between theory and applications in the area of stochastic modelling, by directing the interest of future researchers to the inference aspects...
Serang, Oliver
2014-01-01
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to and the space to where is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234
Efficient modeling of vector hysteresis using fuzzy inference systems
International Nuclear Information System (INIS)
Adly, A.A.; Abd-El-Hafiz, S.K.
2008-01-01
Vector hysteresis models have always been regarded as important tools to determine which multi-dimensional magnetic field-media interactions may be predicted. In the past, considerable efforts have been focused on mathematical modeling methodologies of vector hysteresis. This paper presents an efficient approach based upon fuzzy inference systems for modeling vector hysteresis. Computational efficiency of the proposed approach stems from the fact that the basic non-local memory Preisach-type hysteresis model is approximated by a local memory model. The proposed computational low-cost methodology can be easily integrated in field calculation packages involving massive multi-dimensional discretizations. Details of the modeling methodology and its experimental testing are presented
Inference of missing data and chemical model parameters using experimental statistics
Casey, Tiernan; Najm, Habib
2017-11-01
A method for determining the joint parameter density of Arrhenius rate expressions through the inference of missing experimental data is presented. This approach proposes noisy hypothetical data sets from target experiments and accepts those which agree with the reported statistics, in the form of nominal parameter values and their associated uncertainties. The data exploration procedure is formalized using Bayesian inference, employing maximum entropy and approximate Bayesian computation methods to arrive at a joint density on data and parameters. The method is demonstrated in the context of reactions in the H2-O2 system for predictive modeling of combustion systems of interest. Work supported by the US DOE BES CSGB. Sandia National Labs is a multimission lab managed and operated by Nat. Technology and Eng'g Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell Intl, for the US DOE NCSA under contract DE-NA-0003525.
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Lanseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael
2009-01-01
Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....
Designs and Methods for Association Studies and Population Size Inference in Statistical Genetics
DEFF Research Database (Denmark)
Waltoft, Berit Lindum
method provides a simple goodness of t test by comparing the observed SFS with the expected SFS under a given model of population size changes. By the use of Monte Carlo estimation the expected time between coalescent events can be estimated and the expected SFS can thereby be evaluated. Using......). The OR is interpreted as the eect of an exposure on the probability of being diseased at the end of follow-up, while the interpretation of the IRR is the eect of an exposure on the probability of becoming diseased. Through a simulation study, the OR from a classical case-control study is shown to be an inconsistent...... the classical chi-square statistics we are able to infer single parameter models. Multiple parameter models, e.g. multiple epochs, are harder to identify. By introducing the inference of population size back in time as an inverse problem, the second procedure applies the theory of smoothing splines to infer...
Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.
Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B
2018-05-01
Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.
Statistical inference using weak chaos and infinite memory
International Nuclear Information System (INIS)
Welling, Max; Chen Yutian
2010-01-01
We describe a class of deterministic weakly chaotic dynamical systems with infinite memory. These 'herding systems' combine learning and inference into one algorithm, where moments or data-items are converted directly into an arbitrarily long sequence of pseudo-samples. This sequence has infinite range correlations and as such is highly structured. We show that its information content, as measured by sub-extensive entropy, can grow as fast as K log T, which is faster than the usual 1/2 K log T for exchangeable sequences generated by random posterior sampling from a Bayesian model. In one dimension we prove that herding sequences are equivalent to Sturmian sequences which have complexity exactly log(T + 1). More generally, we advocate the application of the rich theoretical framework around nonlinear dynamical systems, chaos theory and fractal geometry to statistical learning.
Statistical inference using weak chaos and infinite memory
Energy Technology Data Exchange (ETDEWEB)
Welling, Max; Chen Yutian, E-mail: welling@ics.uci.ed, E-mail: yutian.chen@uci.ed [Donald Bren School of Information and Computer Science, University of California Irvine CA 92697-3425 (United States)
2010-06-01
We describe a class of deterministic weakly chaotic dynamical systems with infinite memory. These 'herding systems' combine learning and inference into one algorithm, where moments or data-items are converted directly into an arbitrarily long sequence of pseudo-samples. This sequence has infinite range correlations and as such is highly structured. We show that its information content, as measured by sub-extensive entropy, can grow as fast as K log T, which is faster than the usual 1/2 K log T for exchangeable sequences generated by random posterior sampling from a Bayesian model. In one dimension we prove that herding sequences are equivalent to Sturmian sequences which have complexity exactly log(T + 1). More generally, we advocate the application of the rich theoretical framework around nonlinear dynamical systems, chaos theory and fractal geometry to statistical learning.
A review of statistical modelling and inference for electrical capacitance tomography
International Nuclear Information System (INIS)
Watzenig, D; Fox, C
2009-01-01
Bayesian inference applied to electrical capacitance tomography, or other inverse problems, provides a framework for quantified model fitting. Estimation of unknown quantities of interest is based on the posterior distribution over the unknown permittivity and unobserved data, conditioned on measured data. Key components in this framework are a prior model requiring a parametrization of the permittivity and a normalizable prior density, the likelihood function that follows from a decomposition of measurements into deterministic and random parts, and numerical simulation of noise-free measurements. Uncertainty in recovered permittivities arises from measurement noise, measurement sensitivities, model inaccuracy, discretization error and a priori uncertainty; each of these sources may be accounted for and in some cases taken advantage of. Estimates or properties of the permittivity can be calculated as summary statistics over the posterior distribution using Markov chain Monte Carlo sampling. Several modified Metropolis–Hastings algorithms are available to speed up this computationally expensive step. The bias in estimates that is induced by the representation of unknowns may be avoided by design of a prior density. The differing purpose of applications means that there is no single 'Bayesian' analysis. Further, differing solutions will use different modelling choices, perhaps influenced by the need for computational efficiency. We solve a reference problem of recovering the unknown shape of a constant permittivity inclusion in an otherwise uniform background. Statistics calculated in the reference problem give accurate estimates of inclusion area, and other properties, when using measured data. The alternatives available for structuring inferential solutions in other applications are clarified by contrasting them against the choice we made in our reference solution. (topical review)
Orhan, A Emin; Ma, Wei Ji
2017-07-26
Animals perform near-optimal probabilistic inference in a wide range of psychophysical tasks. Probabilistic inference requires trial-to-trial representation of the uncertainties associated with task variables and subsequent use of this representation. Previous work has implemented such computations using neural networks with hand-crafted and task-dependent operations. We show that generic neural networks trained with a simple error-based learning rule perform near-optimal probabilistic inference in nine common psychophysical tasks. In a probabilistic categorization task, error-based learning in a generic network simultaneously explains a monkey's learning curve and the evolution of qualitative aspects of its choice behavior. In all tasks, the number of neurons required for a given level of performance grows sublinearly with the input population size, a substantial improvement on previous implementations of probabilistic inference. The trained networks develop a novel sparsity-based probabilistic population code. Our results suggest that probabilistic inference emerges naturally in generic neural networks trained with error-based learning rules.Behavioural tasks often require probability distributions to be inferred about task specific variables. Here, the authors demonstrate that generic neural networks can be trained using a simple error-based learning rule to perform such probabilistic computations efficiently without any need for task specific operations.
Probability, statistics, and computational science.
Beerenwinkel, Niko; Siebourg, Juliane
2012-01-01
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
Hincks, Ian; Granade, Christopher; Cory, David G.
2018-01-01
The analysis of photon count data from the standard nitrogen vacancy (NV) measurement process is treated as a statistical inference problem. This has applications toward gaining better and more rigorous error bars for tasks such as parameter estimation (e.g. magnetometry), tomography, and randomized benchmarking. We start by providing a summary of the standard phenomenological model of the NV optical process in terms of Lindblad jump operators. This model is used to derive random variables describing emitted photons during measurement, to which finite visibility, dark counts, and imperfect state preparation are added. NV spin-state measurement is then stated as an abstract statistical inference problem consisting of an underlying biased coin obstructed by three Poisson rates. Relevant frequentist and Bayesian estimators are provided, discussed, and quantitatively compared. We show numerically that the risk of the maximum likelihood estimator is well approximated by the Cramér-Rao bound, for which we provide a simple formula. Of the estimators, we in particular promote the Bayes estimator, owing to its slightly better risk performance, and straightforward error propagation into more complex experiments. This is illustrated on experimental data, where quantum Hamiltonian learning is performed and cross-validated in a fully Bayesian setting, and compared to a more traditional weighted least squares fit.
Statistical Inference on the Canadian Middle Class
Directory of Open Access Journals (Sweden)
Russell Davidson
2018-03-01
Full Text Available Conventional wisdom says that the middle classes in many developed countries have recently suffered losses, in terms of both the share of the total population belonging to the middle class, and also their share in total income. Here, distribution-free methods are developed for inference on these shares, by means of deriving expressions for their asymptotic variances of sample estimates, and the covariance of the estimates. Asymptotic inference can be undertaken based on asymptotic normality. Bootstrap inference can be expected to be more reliable, and appropriate bootstrap procedures are proposed. As an illustration, samples of individual earnings drawn from Canadian census data are used to test various hypotheses about the middle-class shares, and confidence intervals for them are computed. It is found that, for the earlier censuses, sample sizes are large enough for asymptotic and bootstrap inference to be almost identical, but that, in the twenty-first century, the bootstrap fails on account of a strange phenomenon whereby many presumably different incomes in the data are rounded to one and the same value. Another difference between the centuries is the appearance of heavy right-hand tails in the income distributions of both men and women.
Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F
2015-01-01
This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...
Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D
2017-12-01
Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as-in this case only-the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for
Statistical perspectives on inverse problems
DEFF Research Database (Denmark)
Andersen, Kim Emil
of the interior of an object from electrical boundary measurements. One part of this thesis concerns statistical approaches for solving, possibly non-linear, inverse problems. Thus inverse problems are recasted in a form suitable for statistical inference. In particular, a Bayesian approach for regularisation...... problem is given in terms of probability distributions. Posterior inference is obtained by Markov chain Monte Carlo methods and new, powerful simulation techniques based on e.g. coupled Markov chains and simulated tempering is developed to improve the computational efficiency of the overall simulation......Inverse problems arise in many scientific disciplines and pertain to situations where inference is to be made about a particular phenomenon from indirect measurements. A typical example, arising in diffusion tomography, is the inverse boundary value problem for non-invasive reconstruction...
Statistical modelling for ship propulsion efficiency
DEFF Research Database (Denmark)
Petersen, Jóan Petur; Jacobsen, Daniel J.; Winther, Ole
2012-01-01
This paper presents a state-of-the-art systems approach to statistical modelling of fuel efficiency in ship propulsion, and also a novel and publicly available data set of high quality sensory data. Two statistical model approaches are investigated and compared: artificial neural networks...
Statistical inference of seabed sound-speed structure in the Gulf of Oman Basin.
Sagers, Jason D; Knobles, David P
2014-06-01
Addressed is the statistical inference of the sound-speed depth profile of a thick soft seabed from broadband sound propagation data recorded in the Gulf of Oman Basin in 1977. The acoustic data are in the form of time series signals recorded on a sparse vertical line array and generated by explosive sources deployed along a 280 km track. The acoustic data offer a unique opportunity to study a deep-water bottom-limited thickly sedimented environment because of the large number of time series measurements, very low seabed attenuation, and auxiliary measurements. A maximum entropy method is employed to obtain a conditional posterior probability distribution (PPD) for the sound-speed ratio and the near-surface sound-speed gradient. The multiple data samples allow for a determination of the average error constraint value required to uniquely specify the PPD for each data sample. Two complicating features of the statistical inference study are addressed: (1) the need to develop an error function that can both utilize the measured multipath arrival structure and mitigate the effects of data errors and (2) the effect of small bathymetric slopes on the structure of the bottom interacting arrivals.
International Nuclear Information System (INIS)
Hadjisawas, Nicolas.
1982-01-01
After a critical study of the logical quantum mechanics formulations of Jauch and Piron, classical and quantum versions of statistical inference are studied. In order to do this, the significance of the Jaynes and Kulback principles (maximum likelihood, least squares principles) is revealed from the theorems established. In the quantum mechanics inference problem, a ''distance'' between states is defined. This concept is used to solve the quantum equivalent of the classical problem studied by Kulback. The ''projection postulate'' proposition is subsequently deduced [fr
Moraes, Alvaro
2015-01-01
Epidemics have shaped, sometimes more than wars and natural disasters, demo- graphic aspects of human populations around the world, their health habits and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards at planetary scale. During the spread of an epidemic disease, there are phenomena, like the sudden extinction of the epidemic, that can not be captured by deterministic models. As a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting could be the approximation of the expected number of infected individuals found in one month from now. On the other hand, a typical inverse problem could be, given a discretely observed set of epidemiological data, infer the transmission rate of the epidemic or its basic reproduction number. Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), that are intended to describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been mainly developed to model biochemical reactions but they also have applications in neural networks, virus kinetics, and dynamics of social networks, among others. 4 This PhD thesis is focused on novel fast simulation algorithms and statistical inference methods for SRNs. Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide accurate estimates of expected values of a given observable of SRNs at a prescribed final time. They are designed to control the global approximation error up to a user-selected accuracy and up to a certain confidence level, and with near optimal computational work. We also present novel dual-weighted residual expansions for fast estimation of weak and strong errors arising from the MLMC methodology. Regarding the statistical inference
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad
2016-05-01
Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert
BIG-DATA and the Challenges for Statistical Inference and Economics Teaching and Learning
Directory of Open Access Journals (Sweden)
J.L. Peñaloza Figueroa
2017-04-01
Full Text Available The increasing automation in data collection, either in structured or unstructured formats, as well as the development of reading, concatenation and comparison algorithms and the growing analytical skills which characterize the era of Big Data, cannot not only be considered a technological achievement, but an organizational, methodological and analytical challenge for knowledge as well, which is necessary to generate opportunities and added value. In fact, exploiting the potential of Big-Data includes all fields of community activity; and given its ability to extract behaviour patterns, we are interested in the challenges for the field of teaching and learning, particularly in the field of statistical inference and economic theory. Big-Data can improve the understanding of concepts, models and techniques used in both statistical inference and economic theory, and it can also generate reliable and robust short and long term predictions. These facts have led to the demand for analytical capabilities, which in turn encourages teachers and students to demand access to massive information produced by individuals, companies and public and private organizations in their transactions and inter- relationships. Mass data (Big Data is changing the way people access, understand and organize knowledge, which in turn is causing a shift in the approach to statistics and economics teaching, considering them as a real way of thinking rather than just operational and technical disciplines. Hence, the question is how teachers can use automated collection and analytical skills to their advantage when teaching statistics and economics; and whether it will lead to a change in what is taught and how it is taught.
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
2016-08-31
Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.
Statistical fluctuations of an ocean surface inferred from shoes and ships
Lerche, Ian; Maubeuge, Frédéric
1995-12-01
This paper shows that it is possible to roughly estimate some ocean properties using simple time-dependent statistical models of ocean fluctuations. Based on a real incident, the loss by a vessel of a Nike shoes container in the North Pacific Ocean, a statistical model was tested on data sets consisting of the Nike shoes found by beachcombers a few months later. This statistical treatment of the shoes' motion allows one to infer velocity trends of the Pacific Ocean, together with their fluctuation strengths. The idea is to suppose that there is a mean bulk flow speed that can depend on location on the ocean surface and time. The fluctuations of the surface flow speed are then treated as statistically random. The distribution of shoes is described in space and time using Markov probability processes related to the mean and fluctuating ocean properties. The aim of the exercise is to provide some of the properties of the Pacific Ocean that are otherwise calculated using a sophisticated numerical model, OSCURS, where numerous data are needed. Relevant quantities are sharply estimated, which can be useful to (1) constrain output results from OSCURS computations, and (2) elucidate the behavior patterns of ocean flow characteristics on long time scales.
Davison, Anthony C.
2015-04-10
Statistics of extremes concerns inference for rare events. Often the events have never yet been observed, and their probabilities must therefore be estimated by extrapolation of tail models fitted to available data. Because data concerning the event of interest may be very limited, efficient methods of inference play an important role. This article reviews this domain, emphasizing current research topics. We first sketch the classical theory of extremes for maxima and threshold exceedances of stationary series. We then review multivariate theory, distinguishing asymptotic independence and dependence models, followed by a description of models for spatial and spatiotemporal extreme events. Finally, we discuss inference and describe two applications. Animations illustrate some of the main ideas. © 2015 by Annual Reviews. All rights reserved.
Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng
2018-02-01
Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.
Inverse Statistics and Asset Allocation Efficiency
Bolgorian, Meysam
In this paper using inverse statistics analysis, the effect of investment horizon on the efficiency of portfolio selection is examined. Inverse statistics analysis is a general tool also known as probability distribution of exit time that is used for detecting the distribution of the time in which a stochastic process exits from a zone. This analysis was used in Refs. 1 and 2 for studying the financial returns time series. This distribution provides an optimal investment horizon which determines the most likely horizon for gaining a specific return. Using samples of stocks from Tehran Stock Exchange (TSE) as an emerging market and S&P 500 as a developed market, effect of optimal investment horizon in asset allocation is assessed. It is found that taking into account the optimal investment horizon in TSE leads to more efficiency for large size portfolios while for stocks selected from S&P 500, regardless of portfolio size, this strategy does not only not produce more efficient portfolios, but also longer investment horizons provides more efficiency.
Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S
2015-02-01
With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.
A statistical method for lung tumor segmentation uncertainty in PET images based on user inference.
Zheng, Chaojie; Wang, Xiuying; Feng, Dagan
2015-01-01
PET has been widely accepted as an effective imaging modality for lung tumor diagnosis and treatment. However, standard criteria for delineating tumor boundary from PET are yet to develop largely due to relatively low quality of PET images, uncertain tumor boundary definition, and variety of tumor characteristics. In this paper, we propose a statistical solution to segmentation uncertainty on the basis of user inference. We firstly define the uncertainty segmentation band on the basis of segmentation probability map constructed from Random Walks (RW) algorithm; and then based on the extracted features of the user inference, we use Principle Component Analysis (PCA) to formulate the statistical model for labeling the uncertainty band. We validated our method on 10 lung PET-CT phantom studies from the public RIDER collections [1] and 16 clinical PET studies where tumors were manually delineated by two experienced radiologists. The methods were validated using Dice similarity coefficient (DSC) to measure the spatial volume overlap. Our method achieved an average DSC of 0.878 ± 0.078 on phantom studies and 0.835 ± 0.039 on clinical studies.
Truth, possibility and probability new logical foundations of probability and statistical inference
Chuaqui, R
1991-01-01
Anyone involved in the philosophy of science is naturally drawn into the study of the foundations of probability. Different interpretations of probability, based on competing philosophical ideas, lead to different statistical techniques, and frequently to mutually contradictory consequences. This unique book presents a new interpretation of probability, rooted in the traditional interpretation that was current in the 17th and 18th centuries. Mathematical models are constructed based on this interpretation, and statistical inference and decision theory are applied, including some examples in artificial intelligence, solving the main foundational problems. Nonstandard analysis is extensively developed for the construction of the models and in some of the proofs. Many nonstandard theorems are proved, some of them new, in particular, a representation theorem that asserts that any stochastic process can be approximated by a process defined over a space with equiprobable outcomes.
International Nuclear Information System (INIS)
Dashti, Imad
2003-01-01
This paper uses a Bayesian stochastic frontier model to obtain confidence intervals on firm efficiency measures of electric utilities rather than the point estimates reported in most previous studies. Results reveal that the stochastic frontier model yields imprecise measures of firm efficiency. However, the application produces much more precise inference on pairwise efficiency comparisons of firms due to a sometimes strong positive covariance of efficiency measures across firms. In addition, we examine the sensitivity to functional form by repeating the analysis for Cobb-Douglas, translog and Fourier frontiers, with and without imposing monotonicity and concavity
Challenges and Approaches to Statistical Design and Inference in High Dimensional Investigations
Garrett, Karen A.; Allison, David B.
2015-01-01
Summary Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other “omic” data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology, and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative. PMID:19588106
Challenges and approaches to statistical design and inference in high-dimensional investigations.
Gadbury, Gary L; Garrett, Karen A; Allison, David B
2009-01-01
Advances in modern technologies have facilitated high-dimensional experiments (HDEs) that generate tremendous amounts of genomic, proteomic, and other "omic" data. HDEs involving whole-genome sequences and polymorphisms, expression levels of genes, protein abundance measurements, and combinations thereof have become a vanguard for new analytic approaches to the analysis of HDE data. Such situations demand creative approaches to the processes of statistical inference, estimation, prediction, classification, and study design. The novel and challenging biological questions asked from HDE data have resulted in many specialized analytic techniques being developed. This chapter discusses some of the unique statistical challenges facing investigators studying high-dimensional biology and describes some approaches being developed by statistical scientists. We have included some focus on the increasing interest in questions involving testing multiple propositions simultaneously, appropriate inferential indicators for the types of questions biologists are interested in, and the need for replication of results across independent studies, investigators, and settings. A key consideration inherent throughout is the challenge in providing methods that a statistician judges to be sound and a biologist finds informative.
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
Simple simulation of diffusion bridges with application to likelihood inference for diffusions
DEFF Research Database (Denmark)
Bladt, Mogens; Sørensen, Michael
2014-01-01
the accuracy and efficiency of the approximate method and compare it to exact simulation methods. In the study, our method provides a very good approximation to the distribution of a diffusion bridge for bridges that are likely to occur in applications to statistical inference. To illustrate the usefulness......With a view to statistical inference for discretely observed diffusion models, we propose simple methods of simulating diffusion bridges, approximately and exactly. Diffusion bridge simulation plays a fundamental role in likelihood and Bayesian inference for diffusion processes. First a simple......-dimensional diffusions and is applicable to all one-dimensional diffusion processes with finite speed-measure. One advantage of the new approach is that simple simulation methods like the Milstein scheme can be applied to bridge simulation. Another advantage over previous bridge simulation methods is that the proposed...
Frank, Laurence Emmanuelle
2006-01-01
Feature Network Models (FNM) are graphical structures that represent proximity data in a discrete space with the use of features. A statistical inference theory is introduced, based on the additivity properties of networks and the linear regression framework. Considering features as predictor
Statistical inference for the additive hazards model under outcome-dependent sampling.
Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo
2015-09-01
Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.
Directory of Open Access Journals (Sweden)
Melissa Coulson
2010-07-01
Full Text Available A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST, or confidence intervals (CIs. Authors of articles published in psychology, behavioural neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST.
Inferring properties of disordered chains from FRET transfer efficiencies
Zheng, Wenwei; Zerze, Gül H.; Borgia, Alessandro; Mittal, Jeetain; Schuler, Benjamin; Best, Robert B.
2018-03-01
Förster resonance energy transfer (FRET) is a powerful tool for elucidating both structural and dynamic properties of unfolded or disordered biomolecules, especially in single-molecule experiments. However, the key observables, namely, the mean transfer efficiency and fluorescence lifetimes of the donor and acceptor chromophores, are averaged over a broad distribution of donor-acceptor distances. The inferred average properties of the ensemble therefore depend on the form of the model distribution chosen to describe the distance, as has been widely recognized. In addition, while the distribution for one type of polymer model may be appropriate for a chain under a given set of physico-chemical conditions, it may not be suitable for the same chain in a different environment so that even an apparently consistent application of the same model over all conditions may distort the apparent changes in chain dimensions with variation of temperature or solution composition. Here, we present an alternative and straightforward approach to determining ensemble properties from FRET data, in which the polymer scaling exponent is allowed to vary with solution conditions. In its simplest form, it requires either the mean FRET efficiency or fluorescence lifetime information. In order to test the accuracy of the method, we have utilized both synthetic FRET data from implicit and explicit solvent simulations for 30 different protein sequences, and experimental single-molecule FRET data for an intrinsically disordered and a denatured protein. In all cases, we find that the inferred radii of gyration are within 10% of the true values, thus providing higher accuracy than simpler polymer models. In addition, the scaling exponents obtained by our procedure are in good agreement with those determined directly from the molecular ensemble. Our approach can in principle be generalized to treating other ensemble-averaged functions of intramolecular distances from experimental data.
Kim, Daesang; El Gharamti, Iman; Bisetti, Fabrizio; Farooq, Aamir; Knio, Omar
2016-01-01
A new Bayesian inference method has been developed and applied to Furan shock tube experimental data for efficient statistical inferences of the Arrhenius parameters of two OH radical consumption reactions. The collected experimental data, which
Statistical inference of the nuclear accidents occurrence number for the next decade
International Nuclear Information System (INIS)
Felizia, E.R.
1987-01-01
This paper aims to give a response using the classical statistical and bayesian inference techniques regarding the common characteristic in the Harrisburg and Chernobyl nuclear accidents: in both reactors, core fusion occurred. In relation to the last mentioned techniques, the most recent developments were applied, based on the decision theory of uncertainty; among others, the principle of maximum entropy. Besides, as a preliminar information on the accidents occurrence frequency with core fusion, the German risk analysis results were used. The estimations predicted for the next decade an average between one or two accidents with core fusion and low possibilities for the 'no accident' event in the same period. (Author)
High-dimensional statistical inference: From vector to matrix
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems in including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary while leads to sharp results. It is shown that, in compressed sensing, delta kA 0, delta kA < 1/3 + epsilon, deltak A + thetak,kA < 1 + epsilon, or deltatkA< √(t - 1) / t + epsilon are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. Similar result also holds for matrix recovery. In addition, the conditions delta kA<1/3, deltak A+ thetak,kA<1, delta tkA < √(t - 1)/t and deltarM<1/3, delta rM+ thetar,rM<1, delta trM< √(t - 1)/ t are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The
High-order Composite Likelihood Inference for Max-Stable Distributions and Processes
Castruccio, Stefano; Huser, Raphaë l; Genton, Marc G.
2015-01-01
In multivariate or spatial extremes, inference for max-stable processes observed at a large collection of locations is a very challenging problem in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. In this work, we explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely-used multivariate or spatial extreme models, we discuss how to choose composite likelihood truncation to improve the efficiency, and we also provide recommendations for practitioners. This article has supplementary material online.
High-order Composite Likelihood Inference for Max-Stable Distributions and Processes
Castruccio, Stefano
2015-09-29
In multivariate or spatial extremes, inference for max-stable processes observed at a large collection of locations is a very challenging problem in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. In this work, we explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely-used multivariate or spatial extreme models, we discuss how to choose composite likelihood truncation to improve the efficiency, and we also provide recommendations for practitioners. This article has supplementary material online.
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically datadriven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data driven approach. As a result, our method delivers a pixelwise illuminant colour estimate being devoid of libraries or user input. The use of a factor graph also allows for the illuminant estimates to be recovered making use of a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare against a number of alternatives. We also show sample colour correction results on real-world images.
Tropical geometry of statistical models.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
Bayesian inference on proportional elections.
Directory of Open Access Journals (Sweden)
Gabriel Hideki Vatanabe Brunello
Full Text Available Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional vote systems do not necessarily guarantee the candidate with the most percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied on proportional elections. In this context, the purpose of this paper was to perform a Bayesian inference on proportional elections considering the Brazilian system of seats distribution. More specifically, a methodology to answer the probability that a given party will have representation on the chamber of deputies was developed. Inferences were made on a Bayesian scenario using the Monte Carlo simulation technique, and the developed methodology was applied on data from the Brazilian elections for Members of the Legislative Assembly and Federal Chamber of Deputies in 2010. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.
Höhna, Sebastian; May, Michael R; Moore, Brian R
2016-03-01
Many fundamental questions in evolutionary biology entail estimating rates of lineage diversification (speciation-extinction) that are modeled using birth-death branching processes. We leverage recent advances in branching-process theory to develop a flexible Bayesian framework for specifying diversification models-where rates are constant, vary continuously, or change episodically through time-and implement numerical methods to estimate parameters of these models from molecular phylogenies, even when species sampling is incomplete. We enable both statistical inference and efficient simulation under these models. We also provide robust methods for comparing the relative and absolute fit of competing branching-process models to a given tree, thereby providing rigorous tests of biological hypotheses regarding patterns and processes of lineage diversification. The source code for TESS is freely available at http://cran.r-project.org/web/packages/TESS/ CONTACT: Sebastian.Hoehna@gmail.com. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Bailer-Jones, Coryn A. L.
2017-04-01
Preface; 1. Probability basics; 2. Estimation and uncertainty; 3. Statistical models and inference; 4. Linear models, least squares, and maximum likelihood; 5. Parameter estimation: single parameter; 6. Parameter estimation: multiple parameters; 7. Approximating distributions; 8. Monte Carlo methods for inference; 9. Parameter estimation: Markov chain Monte Carlo; 10. Frequentist hypothesis testing; 11. Model comparison; 12. Dealing with more complicated problems; References; Index.
Multivariate Statistical Inference of Lightning Occurrence, and Using Lightning Observations
Boccippio, Dennis
2004-01-01
Two classes of multivariate statistical inference using TRMM Lightning Imaging Sensor, Precipitation Radar, and Microwave Imager observation are studied, using nonlinear classification neural networks as inferential tools. The very large and globally representative data sample provided by TRMM allows both training and validation (without overfitting) of neural networks with many degrees of freedom. In the first study, the flashing / or flashing condition of storm complexes is diagnosed using radar, passive microwave and/or environmental observations as neural network inputs. The diagnostic skill of these simple lightning/no-lightning classifiers can be quite high, over land (above 80% Probability of Detection; below 20% False Alarm Rate). In the second, passive microwave and lightning observations are used to diagnose radar reflectivity vertical structure. A priori diagnosis of hydrometeor vertical structure is highly important for improved rainfall retrieval from either orbital radars (e.g., the future Global Precipitation Mission "mothership") or radiometers (e.g., operational SSM/I and future Global Precipitation Mission passive microwave constellation platforms), we explore the incremental benefit to such diagnosis provided by lightning observations.
Statistical theory and inference
Olive, David J
2014-01-01
This text is for a one semester graduate course in statistical theory and covers minimal and complete sufficient statistics, maximum likelihood estimators, method of moments, bias and mean square error, uniform minimum variance estimators and the Cramer-Rao lower bound, an introduction to large sample theory, likelihood ratio tests and uniformly most powerful tests and the Neyman Pearson Lemma. A major goal of this text is to make these topics much more accessible to students by using the theory of exponential families. Exponential families, indicator functions and the support of the distribution are used throughout the text to simplify the theory. More than 50 ``brand name" distributions are used to illustrate the theory with many examples of exponential families, maximum likelihood estimators and uniformly minimum variance unbiased estimators. There are many homework problems with over 30 pages of solutions.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Inference in hybrid Bayesian networks
International Nuclear Information System (INIS)
Langseth, Helge; Nielsen, Thomas D.; Rumi, Rafael; Salmeron, Antonio
2009-01-01
Since the 1980s, Bayesian networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (the so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.
On quantum statistical inference
Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E.
2001-01-01
Recent developments in the mathematical foundations of quantum mechanics have brought the theory closer to that of classical probability and statistics. On the other hand, the unique character of quantum physics sets many of the questions addressed apart from those met classically in stochastics.
Stotts, Steven A; Koch, Robert A
2017-08-01
In this paper an approach is presented to estimate the constraint required to apply maximum entropy (ME) for statistical inference with underwater acoustic data from a single track segment. Previous algorithms for estimating the ME constraint require multiple source track segments to determine the constraint. The approach is relevant for addressing model mismatch effects, i.e., inaccuracies in parameter values determined from inversions because the propagation model does not account for all acoustic processes that contribute to the measured data. One effect of model mismatch is that the lowest cost inversion solution may be well outside a relatively well-known parameter value's uncertainty interval (prior), e.g., source speed from track reconstruction or towed source levels. The approach requires, for some particular parameter value, the ME constraint to produce an inferred uncertainty interval that encompasses the prior. Motivating this approach is the hypothesis that the proposed constraint determination procedure would produce a posterior probability density that accounts for the effect of model mismatch on inferred values of other inversion parameters for which the priors might be quite broad. Applications to both measured and simulated data are presented for model mismatch that produces minimum cost solutions either inside or outside some priors.
Watson, Jane
2007-01-01
Inference, or decision making, is seen in curriculum documents as the final step in a statistical investigation. For a formal statistical enquiry this may be associated with sophisticated tests involving probability distributions. For young students without the mathematical background to perform such tests, it is still possible to draw informal…
Variations on Bayesian Prediction and Inference
2016-05-09
inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability 1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
Targeted estimation of nuisance parameters to obtain valid statistical inference.
van der Laan, Mark J
2014-01-01
In order to obtain concrete results, we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters such as ensemble learning, and specifically super-learning. Because such estimators involve optimal trade-off of bias and variance w.r.t. the infinite dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand. As a particular special
Kroese, A.H.; van der Meulen, E.A.; Poortema, Klaas; Schaafsma, W.
1995-01-01
The making of statistical inferences in distributional form is conceptionally complicated because the epistemic 'probabilities' assigned are mixtures of fact and fiction. In this respect they are essentially different from 'physical' or 'frequency-theoretic' probabilities. The distributional form is
Examples in parametric inference with R
Dixit, Ulhas Jayram
2016-01-01
This book discusses examples in parametric inference with R. Combining basic theory with modern approaches, it presents the latest developments and trends in statistical inference for students who do not have an advanced mathematical and statistical background. The topics discussed in the book are fundamental and common to many fields of statistical inference and thus serve as a point of departure for in-depth study. The book is divided into eight chapters: Chapter 1 provides an overview of topics on sufficiency and completeness, while Chapter 2 briefly discusses unbiased estimation. Chapter 3 focuses on the study of moments and maximum likelihood estimators, and Chapter 4 presents bounds for the variance. In Chapter 5, topics on consistent estimator are discussed. Chapter 6 discusses Bayes, while Chapter 7 studies some more powerful tests. Lastly, Chapter 8 examines unbiased and other tests. Senior undergraduate and graduate students in statistics and mathematics, and those who have taken an introductory cou...
Bayesian Inference in Statistical Analysis
Box, George E P
2011-01-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Rob
Logical inference and evaluation
International Nuclear Information System (INIS)
Perey, F.G.
1981-01-01
Most methodologies of evaluation currently used are based upon the theory of statistical inference. It is generally perceived that this theory is not capable of dealing satisfactorily with what are called systematic errors. Theories of logical inference should be capable of treating all of the information available, including that not involving frequency data. A theory of logical inference is presented as an extension of deductive logic via the concept of plausibility and the application of group theory. Some conclusions, based upon the application of this theory to evaluation of data, are also given
DEFF Research Database (Denmark)
Møller, Jesper
2010-01-01
Chapter 9: This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods based on a maximum likelihood or Bayesian approach combined with markov chain Monte Carlo...... (MCMC) techniques. Due to space limitations the focus is on spatial point processes....
DEFF Research Database (Denmark)
Korneliussen, Thorfinn Sand
Due to the recent advances in DNA sequencing technology genomic data are being generated at an unprecedented rate and we are gaining access to entire genomes at population level. The technology does, however, not give direct access to the genetic variation and the many levels of preprocessing...... that is required before being able to make inferences from the data introduces multiple levels of uncertainty, especially for low-depth data. Therefore methods that take into account the inherent uncertainty are needed for being able to make robust inferences in the downstream analysis of such data. This poses...... a problem for a range of key summary statistics within populations genetics where existing methods are based on the assumption that the true genotypes are known. Motivated by this I present: 1) a new method for the estimation of relatedness between pairs of individuals, 2) a new method for estimating...
Inference in models with adaptive learning
Chevillon, G.; Massmann, M.; Mavroeidis, S.
2010-01-01
Identification of structural parameters in models with adaptive learning can be weak, causing standard inference procedures to become unreliable. Learning also induces persistent dynamics, and this makes the distribution of estimators and test statistics non-standard. Valid inference can be
Nolte, Ingmar; Voev, Valeri
2009-01-01
The expected value of sums of squared intraday returns (realized variance)gives rise to a least squares regression which adapts itself to the assumptions ofthe noise process and allows for a joint inference on integrated volatility (IV),noise moments and price-noise relations. In the iid noise case we derive theasymptotic variance of the regression parameter estimating the IV, show thatit is consistent and compare its asymptotic efficiency against alternative consistentIV measures. In case of...
Vilanova, Pedro
2016-01-01
reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ
Sampling, Probability Models and Statistical Reasoning Statistical
Indian Academy of Sciences (India)
Home; Journals; Resonance – Journal of Science Education; Volume 1; Issue 5. Sampling, Probability Models and Statistical Reasoning Statistical Inference. Mohan Delampady V R Padmawar. General Article Volume 1 Issue 5 May 1996 pp 49-58 ...
Balanovskаya, Tetiana Ivanovna; Boretska, Zoreslava Petrovna
2014-01-01
Application of fuzzy inference system to increase efficiency of management decision- making in agricultural enterprises. Theoretical and methodological issues, practical recommendations on improvement of management decision-making in agricultural enterprises to increase their competitiveness have been intensified and developed in the article. A simulation example of a quality management system for agricultural products on the basis of the theory of fuzzy sets and fuzzy logic has been proposed...
Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun
2015-12-01
Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
Statistical inference for template aging
Schuckers, Michael E.
2006-04-01
A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses of likelihood ratio tests methodology. The focus here is on statistical methods for estimation not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institutes of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
Price limits and stock market efficiency: Evidence from rolling bicorrelation test statistic
International Nuclear Information System (INIS)
Lim, Kian-Ping; Brooks, Robert D.
2009-01-01
Using the rolling bicorrelation test statistic, the present paper compares the efficiency of stock markets from China, Korea and Taiwan in selected sub-periods with different price limits regimes. The statistical results do not support the claims that restrictive price limits and price limits per se are jeopardizing market efficiency. However, the evidence does not imply that price limits have no effect on the price discovery process but rather suggesting that market efficiency is not merely determined by price limits.
Uncertainty in prediction and in inference
International Nuclear Information System (INIS)
Hilgevoord, J.; Uffink, J.
1991-01-01
The concepts of uncertainty in prediction and inference are introduced and illustrated using the diffraction of light as an example. The close relationship between the concepts of uncertainty in inference and resolving power is noted. A general quantitative measure of uncertainty in inference can be obtained by means of the so-called statistical distance between probability distributions. When applied to quantum mechanics, this distance leads to a measure of the distinguishability of quantum states, which essentially is the absolute value of the matrix element between the states. The importance of this result to the quantum mechanical uncertainty principle is noted. The second part of the paper provides a derivation of the statistical distance on the basis of the so-called method of support
Efficient Parallel Statistical Model Checking of Biochemical Networks
Directory of Open Access Journals (Sweden)
Paolo Ballarini
2009-12-01
Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rule them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that a LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples, however there are three key aspects that improve the efficiency: first, the sample generation is driven by on-the-fly verification of P which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed according to a parallel fashion and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian staistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inferenfe cfor discrete random variables, bionomial proprotion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear RegressionModel; and Computati...
On quantum statistical inference
DEFF Research Database (Denmark)
Barndorff-Nielsen, Ole Eiler; Gill, Richard D.; Jupp, Peter E.
Recent developments in the mathematical foundations of quantum mechanics have brought the theory closer to that of classical probability and statistics. On the other hand, the unique character of quantum physics sets many of the questions addressed apart from those met classically in stochastics....... Furthermore, concurrent advances in experimental techniques and in the theory of quantum computation have led to a strong interest in questions of quantum information, in particular in the sense of the amount of information about unknown parameters in given observational data or accessible through various...
The statistical-inference approach to generalized thermodynamics
International Nuclear Information System (INIS)
Lavenda, B.H.; Scherer, C.
1987-01-01
Limit theorems, such as the central-limit theorem and the weak law of large numbers, are applicable to statistical thermodynamics for sufficiently large sample size of indipendent and identically distributed observations performed on extensive thermodynamic (chance) variables. The estimation of the intensive thermodynamic quantities is a problem in parametric statistical estimation. The normal approximation to the Gibbs' distribution is justified by the analysis of large deviations. Statistical thermodynamics is generalized to include the statistical estimation of variance as well as mean values
Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P
2017-03-01
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
Phylogenetic Inference of HIV Transmission Clusters
Directory of Open Access Journals (Sweden)
Vlad Novitsky
2017-10-01
Full Text Available Better understanding the structure and dynamics of HIV transmission networks is essential for designing the most efficient interventions to prevent new HIV transmissions, and ultimately for gaining control of the HIV epidemic. The inference of phylogenetic relationships and the interpretation of results rely on the definition of the HIV transmission cluster. The definition of the HIV cluster is complex and dependent on multiple factors, including the design of sampling, accuracy of sequencing, precision of sequence alignment, evolutionary models, the phylogenetic method of inference, and specified thresholds for cluster support. While the majority of studies focus on clusters, non-clustered cases could also be highly informative. A new dimension in the analysis of the global and local HIV epidemics is the concept of phylogenetically distinct HIV sub-epidemics. The identification of active HIV sub-epidemics reveals spreading viral lineages and may help in the design of targeted interventions.HIVclustering can also be affected by sampling density. Obtaining a proper sampling density may increase statistical power and reduce sampling bias, so sampling density should be taken into account in study design and in interpretation of phylogenetic results. Finally, recent advances in long-range genotyping may enable more accurate inference of HIV transmission networks. If performed in real time, it could both inform public-health strategies and be clinically relevant (e.g., drug-resistance testing.
Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R
2017-01-01
Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks-both of which require encoding associations that span multiple episodes-in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.
Aggelopoulos, Nikolaos C
2015-08-01
Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.
The R Package MitISEM: Efficient and Robust Simulation Procedures for Bayesian Inference
Directory of Open Access Journals (Sweden)
Nalan Baştürk
2017-07-01
Full Text Available This paper presents the R package MitISEM (mixture of t by importance sampling weighted expectation maximization which provides an automatic and flexible two-stage method to approximate a non-elliptical target density kernel - typically a posterior density kernel - using an adaptive mixture of Student t densities as approximating density. In the first stage a mixture of Student t densities is fitted to the target using an expectation maximization algorithm where each step of the optimization procedure is weighted using importance sampling. In the second stage this mixture density is a candidate density for efficient and robust application of importance sampling or the Metropolis-Hastings (MH method to estimate properties of the target distribution. The package enables Bayesian inference and prediction on model parameters and probabilities, in particular, for models where densities have multi-modal or other non-elliptical shapes like curved ridges. These shapes occur in research topics in several scientific fields. For instance, analysis of DNA data in bio-informatics, obtaining loans in the banking sector by heterogeneous groups in financial economics and analysis of education's effect on earned income in labor economics. The package MitISEM provides also an extended algorithm, 'sequential MitISEM', which substantially decreases computation time when the target density has to be approximated for increasing data samples. This occurs when the posterior or predictive density is updated with new observations and/or when one computes model probabilities using predictive likelihoods. We illustrate the MitISEM algorithm using three canonical statistical and econometric models that are characterized by several types of non-elliptical posterior shapes and that describe well-known data patterns in econometrics and finance. We show that MH using the candidate density obtained by MitISEM outperforms, in terms of numerical efficiency, MH using a simpler
Combining statistical inference and decisions in ecology
Williams, Perry J.; Hooten, Mevin B.
2016-01-01
Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Shot Group Statistics for Small Arms Applications
2017-06-01
if its probability distribution is known with sufficient accuracy, then it can be used to make a sound statistical inference on the unknown... statistical inference on the unknown, population standard deviations of the x and y impact-point positions. The dispersion measures treated in this report...known with sufficient accuracy, then it can be used to make a sound statistical inference on the unknown, population standard deviations of the x and y
Serang, Oliver
2015-08-01
Observations depending on sums of random variables are common throughout many fields; however, no efficient solution is currently known for performing max-product inference on these sums of general discrete distributions (max-product inference can be used to obtain maximum a posteriori estimates). The limiting step to max-product inference is the max-convolution problem (sometimes presented in log-transformed form and denoted as "infimal convolution," "min-convolution," or "convolution on the tropical semiring"), for which no O(k log(k)) method is currently known. Presented here is an O(k log(k)) numerical method for estimating the max-convolution of two nonnegative vectors (e.g., two probability mass functions), where k is the length of the larger vector. This numerical max-convolution method is then demonstrated by performing fast max-product inference on a convolution tree, a data structure for performing fast inference given information on the sum of n discrete random variables in O(nk log(nk)log(n)) steps (where each random variable has an arbitrary prior distribution on k contiguous possible states). The numerical max-convolution method can be applied to specialized classes of hidden Markov models to reduce the runtime of computing the Viterbi path from nk(2) to nk log(k), and has potential application to the all-pairs shortest paths problem.
Statistical Efficiency of Double-Bounded Dichotomous Choice Contingent Valuation
Michael Hanemann; John Loomis; Barbara Kanninen
1991-01-01
The statistical efficiency of conventional dichotomous choice contingent valuation surveys can be improved by asking each respondent a second dichotomous choice question which depends on the response to the first question—if the first response is "yes," the second bid is some amount greater than the first bid; while, if the first response is "no," the second bid is some amount smaller. This "double-bounded" approach is shown to be asymptotically more efficient than the conventional, "singlebo...
Wang, S.; Huang, G. H.; Baetz, B. W.; Huang, W.
2015-11-01
This paper presents a polynomial chaos ensemble hydrologic prediction system (PCEHPS) for an efficient and robust uncertainty assessment of model parameters and predictions, in which possibilistic reasoning is infused into probabilistic parameter inference with simultaneous consideration of randomness and fuzziness. The PCEHPS is developed through a two-stage factorial polynomial chaos expansion (PCE) framework, which consists of an ensemble of PCEs to approximate the behavior of the hydrologic model, significantly speeding up the exhaustive sampling of the parameter space. Multiple hypothesis testing is then conducted to construct an ensemble of reduced-dimensionality PCEs with only the most influential terms, which is meaningful for achieving uncertainty reduction and further acceleration of parameter inference. The PCEHPS is applied to the Xiangxi River watershed in China to demonstrate its validity and applicability. A detailed comparison between the HYMOD hydrologic model, the ensemble of PCEs, and the ensemble of reduced PCEs is performed in terms of accuracy and efficiency. Results reveal temporal and spatial variations in parameter sensitivities due to the dynamic behavior of hydrologic systems, and the effects (magnitude and direction) of parametric interactions depending on different hydrological metrics. The case study demonstrates that the PCEHPS is capable not only of capturing both expert knowledge and probabilistic information in the calibration process, but also of implementing an acceleration of more than 10 times faster than the hydrologic model without compromising the predictive accuracy.
International Nuclear Information System (INIS)
Morfeldt, Johannes; Silveira, Semida
2014-01-01
Energy efficiency indicators used for evaluating industrial activities at the national level are often based on statistics reported in international databases. In the case of the Swedish iron and steel sector, energy consumption statistics published by Odyssee, Eurostat, the IEA (International Energy Agency), and the United Nations differ, resulting in diverging energy efficiency indicators. For certain years, the specific energy consumption for steel is twice as high if based on Odyssee statistics instead of statistics from the IEA. The analysis revealed that the assumptions behind the allocation of coal and coke used in blast furnaces as energy consumption or energy transformation are the major cause for these differences. Furthermore, the differences are also related to errors in the statistical data resulting from two different surveys that support the data. The allocation of coal and coke has implications when promoting resource as well as energy efficiency at the systems level. Eurostat's definition of energy consumption is more robust compared to the definitions proposed by other organisations. Nevertheless, additional data and improved energy efficiency indicators are needed to fully monitor the iron and steel sector's energy system and promote improvements towards a greener economy at large. - Highlights: • Energy statistics for the iron and steel sector diverge in international databases. • Varying methods have implications when monitoring energy and resource efficiency. • Allocation of blast furnaces as transformation activities is behind the differences. • Different statistical surveys and human error also contribute to diverging results
Using Alien Coins to Test Whether Simple Inference Is Bayesian
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
DEFF Research Database (Denmark)
Møller, Jesper
(This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by ClarendonPress, Oxford, and planned to appear as Section 4.1 with the ......(This text written by Jesper Møller, Aalborg University, is submitted for the collection ‘Stochastic Geometry: Highlights, Interactions and New Perspectives', edited by Wilfrid S. Kendall and Ilya Molchanov, to be published by ClarendonPress, Oxford, and planned to appear as Section 4.......1 with the title ‘Inference'.) This contribution concerns statistical inference for parametric models used in stochastic geometry and based on quick and simple simulation free procedures as well as more comprehensive methods using Markov chain Monte Carlo (MCMC) simulations. Due to space limitations the focus...
Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.
Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo
2016-11-01
We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.
Directory of Open Access Journals (Sweden)
A Zareei
2017-05-01
Full Text Available Introduction Spiral conveyors effectively carry solid masses as free or partly free flow of materials. They create good throughput and they are the perfect solution to solve the problems of transport, due to their simple structure, high efficiency and low maintenance costs. This study aims to investigate the performance characteristics of conveyors as function of auger diameter, rotational speed and handling inclination angle. The performance characteristic was investigated according to volumetric efficiency. In another words, the purpose of this study was obtaining a suitable model for volumetric efficiency changes of steep auger to transfer agricultural products. Three different diameters of auger, five levels of rotational speed and three slope angles were used to investigate the effects of changes in these parameters on volumetric efficiency of auger. The used method is novel in this area and the results show that performance by ANFIS models is much better than common statistical models. Materials and Methods The experiments were conducted in Department of Mechanical Engineering of Agricultural Machinery in Urmia University. In this study, SAYOS cultivar of wheat was used. This cultivar of wheat had hard seeds and the humidity was 12% (based on wet. Before testing, all foreign material was separated from the wheat such as stone, dust, plant residues and green seeds. Bulk density of wheat was 790 kg m-3. The auger shaft of the spiral conveyor was received its rotational force through belt and electric motor and its rotation leading to transfer the product to the output. In this study, three conveyors at diameters of 13, 17.5, and 22.5 cm, five levels of rotational speed at 100, 200, 300, 400, and 500 rpm and three handling angles of 10, 20, and 30º were tested. Adaptive Nero-fuzzy inference system (ANFIS is the combination of fuzzy systems and artificial neural network, so it has both benefits. This system is useful to solve the complex non
Some challenges with statistical inference in adaptive designs.
Hung, H M James; Wang, Sue-Jane; Yang, Peiling
2014-01-01
Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerab!e progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
Ensemble stacking mitigates biases in inference of synaptic connectivity
Directory of Open Access Journals (Sweden)
Brendan Chambers
2018-03-01
Full Text Available A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches. Mapping the routing of spikes through local circuitry is crucial for understanding neocortical computation. Under appropriate experimental conditions, these maps can be used to infer likely patterns of synaptic recruitment, linking activity to underlying anatomical connections. Such inferences help to reveal the synaptic implementation of population dynamics and computation. We compare a number of standard functional measures to infer underlying connectivity. We find that regularization impacts measures
Statistical detection of EEG synchrony using empirical bayesian inference.
Directory of Open Access Journals (Sweden)
Archana K Singh
Full Text Available There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001 for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application locFDR on experiment data detected more significant discoveries than our previously proposed methods whereas the standard FDR method failed to detect any significant discoveries.
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application locFDR on experiment data detected more significant discoveries than our previously proposed methods whereas the standard FDR method failed to detect any significant discoveries.
Maximum entropy approach to statistical inference for an ocean acoustic waveguide.
Knobles, D P; Sagers, J D; Koch, R A
2012-02-01
A conditional probability distribution suitable for estimating the statistical properties of ocean seabed parameter values inferred from acoustic measurements is derived from a maximum entropy principle. The specification of the expectation value for an error function constrains the maximization of an entropy functional. This constraint determines the sensitivity factor (β) to the error function of the resulting probability distribution, which is a canonical form that provides a conservative estimate of the uncertainty of the parameter values. From the conditional distribution, marginal distributions for individual parameters can be determined from integration over the other parameters. The approach is an alternative to obtaining the posterior probability distribution without an intermediary determination of the likelihood function followed by an application of Bayes' rule. In this paper the expectation value that specifies the constraint is determined from the values of the error function for the model solutions obtained from a sparse number of data samples. The method is applied to ocean acoustic measurements taken on the New Jersey continental shelf. The marginal probability distribution for the values of the sound speed ratio at the surface of the seabed and the source levels of a towed source are examined for different geoacoustic model representations. © 2012 Acoustical Society of America
International Nuclear Information System (INIS)
McDonald, L.L.; Erickson, W.P.; Strickland, M.D.
1995-01-01
The objective of the Coastal Habitat Injury Assessment study was to document and quantify injury to biota of the shallow subtidal, intertidal, and supratidal zones throughout the shoreline affected by oil or cleanup activity associated with the Exxon Valdez oil spill. The results of these studies were to be used to support the Trustee's Type B Natural Resource Damage Assessment under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA). A probability based stratified random sample of shoreline segments was selected with probability proportional to size from each of 15 strata (5 habitat types crossed with 3 levels of potential oil impact) based on those data available in July, 1989. Three study regions were used: Prince William Sound, Cook Inlet/Kenai Peninsula, and Kodiak/Alaska Peninsula. A Geographic Information System was utilized to combine oiling and habitat data and to select the probability sample of study sites. Quasi-experiments were conducted where randomly selected oiled sites were compared to matched reference sites. Two levels of statistical inferences, philosophical bases, and limitations are discussed and illustrated with example data from the resulting studies. 25 refs., 4 figs., 1 tab
Lectures on algebraic statistics
Drton, Mathias; Sullivant, Seth
2009-01-01
How does an algebraic geometer studying secant varieties further the understanding of hypothesis tests in statistics? Why would a statistician working on factor analysis raise open problems about determinantal varieties? Connections of this type are at the heart of the new field of "algebraic statistics". In this field, mathematicians and statisticians come together to solve statistical inference problems using concepts from algebraic geometry as well as related computational and combinatorial techniques. The goal of these lectures is to introduce newcomers from the different camps to algebraic statistics. The introduction will be centered around the following three observations: many important statistical models correspond to algebraic or semi-algebraic sets of parameters; the geometry of these parameter spaces determines the behaviour of widely used statistical inference procedures; computational algebraic geometry can be used to study parameter spaces and other features of statistical models.
Lower complexity bounds for lifted inference
DEFF Research Database (Denmark)
Jaeger, Manfred
2015-01-01
instances of the model. Numerous approaches for such “lifted inference” techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show...... the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established earlier in Jaeger (2000; Jaeger, M. 2000. On the complexity of inference about...... that under the assumption that NETIME≠ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier-, and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference and for knowledge bases not containing...
An Efficient Graph-based Method for Long-term Land-use Change Statistics
Directory of Open Access Journals (Sweden)
Yipeng Zhang
2015-12-01
Full Text Available Statistical analysis of land-use change plays an important role in sustainable land management and has received increasing attention from scholars and administrative departments. However, the statistical process involving spatial overlay analysis remains difficult and needs improvement to deal with mass land-use data. In this paper, we introduce a spatio-temporal flow network model to reveal the hidden relational information among spatio-temporal entities. Based on graph theory, the constant condition of saturated multi-commodity flow is derived. A new method based on a network partition technique of spatio-temporal flow network are proposed to optimize the transition statistical process. The effectiveness and efficiency of the proposed method is verified through experiments using land-use data in Hunan from 2009 to 2014. In the comparison among three different land-use change statistical methods, the proposed method exhibits remarkable superiority in efficiency.
Outcome-Dependent Sampling Design and Inference for Cox’s Proportional Hazards Model
Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P.; Zhou, Haibo
2016-01-01
We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study. PMID:28090134
STATISTICAL RELATIONAL LEARNING AND SCRIPT INDUCTION FOR TEXTUAL INFERENCE
2017-12-01
compensate for parser errors. We replace deterministic conjunction by an average combiner, which encodes causal independence. Our framework was the...sentence similarity (STS) and sentence paraphrasing, but not Textual Entailment, where deeper inferences are required. As the formula for conjunction ...When combined, our algorithm learns to rely on systems that not just agree on an output but also the provenance of this output in conjunction with the
EI: A Program for Ecological Inference
Directory of Open Access Journals (Sweden)
Gary King
2004-09-01
Full Text Available The program EI provides a method of inferring individual behavior from aggregate data. It implements the statistical procedures, diagnostics, and graphics from the book A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (King 1997. Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological" data to infer discrete individual-level relationships of interest when individual-level data are not available. Ecological inferences are required in political science research when individual-level surveys are unavailable (e.g., local or comparative electoral politics, unreliable (racial politics, insufficient (political geography, or infeasible (political history. They are also required in numerous areas of ma jor significance in public policy (e.g., for applying the Voting Rights Act and other academic disciplines ranging from epidemiology and marketing to sociology and quantitative history.
On principles of inductive inference
Kostecki, Ryszard Paweł
2011-01-01
We propose an intersubjective epistemic approach to foundations of probability theory and statistical inference, based on relative entropy and category theory, and aimed to bypass the mathematical and conceptual problems of existing foundational approaches.
Ensemble stacking mitigates biases in inference of synaptic connectivity.
Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N
2018-01-01
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
Order statistics & inference estimation methods
Balakrishnan, N
1991-01-01
The literature on order statistics and inferenc eis quite extensive and covers a large number of fields ,but most of it is dispersed throughout numerous publications. This volume is the consolidtion of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology,psychology,a nd electrical and chemical engineering. A co
Chambaz, Antoine; Zheng, Wenjing; van der Laan, Mark J
2017-01-01
This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the non-exceptional case, i.e. , assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem which enables the construction of confidence intervals on both mean rewards under the current estimate of the optimal TR and under the optimal TR itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, allowing to discuss the efficiency of inference. As a by product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandits problems. A simulation study illustrates the procedure. One of the corner-stones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.
Quantum-Like Representation of Non-Bayesian Inference
Asano, M.; Basieva, I.; Khrennikov, A.; Ohya, M.; Tanaka, Y.
2013-01-01
This research is related to the problem of "irrational decision making or inference" that have been discussed in cognitive psychology. There are some experimental studies, and these statistical data cannot be described by classical probability theory. The process of decision making generating these data cannot be reduced to the classical Bayesian inference. For this problem, a number of quantum-like coginitive models of decision making was proposed. Our previous work represented in a natural way the classical Bayesian inference in the frame work of quantum mechanics. By using this representation, in this paper, we try to discuss the non-Bayesian (irrational) inference that is biased by effects like the quantum interference. Further, we describe "psychological factor" disturbing "rationality" as an "environment" correlating with the "main system" of usual Bayesian inference.
Statistical physics inspired energy-efficient coded-modulation for optical communications.
Djordjevic, Ivan B; Xu, Lei; Wang, Ting
2012-04-15
Because Shannon's entropy can be obtained by Stirling's approximation of thermodynamics entropy, the statistical physics energy minimization methods are directly applicable to the signal constellation design. We demonstrate that statistical physics inspired energy-efficient (EE) signal constellation designs, in combination with large-girth low-density parity-check (LDPC) codes, significantly outperform conventional LDPC-coded polarization-division multiplexed quadrature amplitude modulation schemes. We also describe an EE signal constellation design algorithm. Finally, we propose the discrete-time implementation of D-dimensional transceiver and corresponding EE polarization-division multiplexed system. © 2012 Optical Society of America
Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets
Sun, Ying; Stein, Michael L.
2014-01-01
For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n3) operations and O(n2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations are evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets
Sun, Ying
2014-11-07
For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n3) operations and O(n2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations are evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
Statistical aspects of determinantal point processes
DEFF Research Database (Denmark)
Lavancier, Frédéric; Møller, Jesper; Rubak, Ege Holger
The statistical aspects of determinantal point processes (DPPs) seem largely unexplored. We review the appealing properties of DDPs, demonstrate that they are useful models for repulsiveness, detail a simulation procedure, and provide freely available software for simulation and statistical...... inference. We pay special attention to stationary DPPs, where we give a simple condition ensuring their existence, construct parametric models, describe how they can be well approximated so that the likelihood can be evaluated and realizations can be simulated, and discuss how statistical inference...
Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation
Matsumoto, Takumi; Sagawa, Takahiro
2018-04-01
A sufficient statistic is a significant concept in statistics, which means a probability variable that has sufficient information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity takes the maximum. Since the maximal sensory capacity imposes a constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems there exists the optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.
Towards Bayesian Inference of the Fast-Ion Distribution Function
DEFF Research Database (Denmark)
Stagner, L.; Heidbrink, W.W.; Salewski, Mirko
2012-01-01
sensitivity of the measurements are incorporated into Bayesian likelihood probabilities, while prior probabilities enforce physical constraints. As an initial step, this poster uses Bayesian statistics to infer the DIII-D electron density profile from multiple diagnostic measurements. Likelihood functions....... However, when theory and experiment disagree (for one or more diagnostics), it is unclear how to proceed. Bayesian statistics provides a framework to infer the DF, quantify errors, and reconcile discrepant diagnostic measurements. Diagnostic errors and ``weight functions" that describe the phase space...
Parametric statistical inference for discretely observed diffusion processes
DEFF Research Database (Denmark)
Pedersen, Asger Roer
Part 1: Theoretical results Part 2: Statistical applications of Gaussian diffusion processes in freshwater ecology......Part 1: Theoretical results Part 2: Statistical applications of Gaussian diffusion processes in freshwater ecology...
Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods
National Research Council Canada - National Science Library
Chang, LiWu
2003-01-01
.... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...
PREFACE: ELC International Meeting on Inference, Computation, and Spin Glasses (ICSG2013)
Kabashima, Yoshiyuki; Hukushima, Koji; Inoue, Jun-ichi; Tanaka, Toshiyuki; Watanabe, Osamu
2013-12-01
The close relationship between probability-based inference and statistical mechanics of disordered systems has been noted for some time. This relationship has provided researchers with a theoretical foundation in various fields of information processing for analytical performance evaluation and construction of efficient algorithms based on message-passing or Monte Carlo sampling schemes. The ELC International Meeting on 'Inference, Computation, and Spin Glasses (ICSG2013)', was held in Sapporo 28-30 July 2013. The meeting was organized as a satellite meeting of STATPHYS25 in order to offer a forum where concerned researchers can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies between statistical mechanics and information sciences. Financial support from Grant-in-Aid for Scientific Research on Innovative Areas, MEXT, Japan 'Exploring the Limits of Computation (ELC)' is gratefully acknowledged. We are pleased to publish 23 papers contributed by invited speakers of ICSG2013 in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of this highly vigorous interdisciplinary field between statistical mechanics and information/computer science. Editors and ICSG2013 Organizing Committee: Koji Hukushima Jun-ichi Inoue (Local Chair of ICSG2013) Yoshiyuki Kabashima (Editor-in-Chief) Toshiyuki Tanaka Osamu Watanabe (General Chair of ICSG2013)
Polynomial Chaos Surrogates for Bayesian Inference
Le Maitre, Olivier
2016-01-06
The Bayesian inference is a popular probabilistic method to solve inverse problems, such as the identification of field parameter in a PDE model. The inference rely on the Bayes rule to update the prior density of the sought field, from observations, and derive its posterior distribution. In most cases the posterior distribution has no explicit form and has to be sampled, for instance using a Markov-Chain Monte Carlo method. In practice the prior field parameter is decomposed and truncated (e.g. by means of Karhunen- Lo´eve decomposition) to recast the inference problem into the inference of a finite number of coordinates. Although proved effective in many situations, the Bayesian inference as sketched above faces several difficulties requiring improvements. First, sampling the posterior can be a extremely costly task as it requires multiple resolutions of the PDE model for different values of the field parameter. Second, when the observations are not very much informative, the inferred parameter field can highly depends on its prior which can be somehow arbitrary. These issues have motivated the introduction of reduced modeling or surrogates for the (approximate) determination of the parametrized PDE solution and hyperparameters in the description of the prior field. Our contribution focuses on recent developments in these two directions: the acceleration of the posterior sampling by means of Polynomial Chaos expansions and the efficient treatment of parametrized covariance functions for the prior field. We also discuss the possibility of making such approach adaptive to further improve its efficiency.
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determine their function and serve as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Beginning statistics with data analysis
Mosteller, Frederick; Rourke, Robert EK
2013-01-01
This introduction to the world of statistics covers exploratory data analysis, methods for collecting data, formal statistical inference, and techniques of regression and analysis of variance. 1983 edition.
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceed the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
Mixed normal inference on multicointegration
Boswijk, H.P.
2009-01-01
Asymptotic likelihood analysis of cointegration in I(2) models, see Johansen (1997, 2006), Boswijk (2000) and Paruolo (2000), has shown that inference on most parameters is mixed normal, implying hypothesis test statistics with an asymptotic 2 null distribution. The asymptotic distribution of the
Spurious correlations and inference in landscape genetics
Samuel A. Cushman; Erin L. Landguth
2010-01-01
Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causalmodelling with partial...
Inference for shared-frailty survival models with left-truncated data
van den Berg, G.J.; Drepper, B.
2016-01-01
Shared-frailty survival models specify that systematic unobserved determinants of duration outcomes are identical within groups of individuals. We consider random-effects likelihood-based statistical inference if the duration data are subject to left-truncation. Such inference with left-truncated
On the criticality of inferred models
Mastromatteo, Iacopo; Marsili, Matteo
2011-10-01
Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality.
On the criticality of inferred models
International Nuclear Information System (INIS)
Mastromatteo, Iacopo; Marsili, Matteo
2011-01-01
Advanced inference techniques allow one to reconstruct a pattern of interaction from high dimensional data sets, from probing simultaneously thousands of units of extended systems—such as cells, neural tissues and financial markets. We focus here on the statistical properties of inferred models and argue that inference procedures are likely to yield models which are close to singular values of parameters, akin to critical points in physics where phase transitions occur. These are points where the response of physical systems to external perturbations, as measured by the susceptibility, is very large and diverges in the limit of infinite size. We show that the reparameterization invariant metrics in the space of probability distributions of these models (the Fisher information) are directly related to the susceptibility of the inferred model. As a result, distinguishable models tend to accumulate close to critical points, where the susceptibility diverges in infinite systems. This region is the one where the estimate of inferred parameters is most stable. In order to illustrate these points, we discuss inference of interacting point processes with application to financial data and show that sensible choices of observation time scales naturally yield models which are close to criticality
Walker, Martin; Basáñez, María-Gloria; Ouédraogo, André Lin; Hermsen, Cornelus; Bousema, Teun; Churcher, Thomas S
2015-01-16
Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques, are compared; a traditional method and a mixed model Bayesian approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
Working with sample data exploration and inference
Chaffe-Stengel, Priscilla
2014-01-01
Managers and analysts routinely collect and examine key performance measures to better understand their operations and make good decisions. Being able to render the complexity of operations data into a coherent account of significant events requires an understanding of how to work well with raw data and to make appropriate inferences. Although some statistical techniques for analyzing data and making inferences are sophisticated and require specialized expertise, there are methods that are understandable and applicable by anyone with basic algebra skills and the support of a spreadsheet package. By applying these fundamental methods themselves rather than turning over both the data and the responsibility for analysis and interpretation to an expert, managers will develop a richer understanding and potentially gain better control over their environment. This text is intended to describe these fundamental statistical techniques to managers, data analysts, and students. Statistical analysis of sample data is enh...
Multi-Agent Inference in Social Networks: A Finite Population Learning Approach.
Fan, Jianqing; Tong, Xin; Zeng, Yao
When people in a society want to make inference about some parameter, each person may want to use data collected by other people. Information (data) exchange in social networks is usually costly, so to make reliable statistical decisions, people need to trade off the benefits and costs of information acquisition. Conflicts of interests and coordination problems will arise in the process. Classical statistics does not consider people's incentives and interactions in the data collection process. To address this imperfection, this work explores multi-agent Bayesian inference problems with a game theoretic social network model. Motivated by our interest in aggregate inference at the societal level, we propose a new concept, finite population learning , to address whether with high probability, a large fraction of people in a given finite population network can make "good" inference. Serving as a foundation, this concept enables us to study the long run trend of aggregate inference quality as population grows.
Kim, Daesang
2016-01-06
A new Bayesian inference method has been developed and applied to Furan shock tube experimental data for efficient statistical inferences of the Arrhenius parameters of two OH radical consumption reactions. The collected experimental data, which consist of time series signals of OH radical concentrations of 14 shock tube experiments, may require several days for MCMC computations even with the support of a fast surrogate of the combustion simulation model, while the new method reduces it to several hours by splitting the process into two steps of MCMC: the first inference of rate constants and the second inference of the Arrhenius parameters. Each step has low dimensional parameter spaces and the second step does not need the executions of the combustion simulation. Furthermore, the new approach has more flexibility in choosing the ranges of the inference parameters, and the higher speed and flexibility enable the more accurate inferences and the analyses of the propagation of errors in the measured temperatures and the alignment of the experimental time to the inference results.
Bansal, Ravi; Peterson, Bradley S
2018-06-01
Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal
Kernel methods and flexible inference for complex stochastic dynamics
Capobianco, Enrico
2008-07-01
Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.
QInfer: Statistical inference software for quantum applications
Directory of Open Access Journals (Sweden)
Christopher Granade
2017-04-01
Full Text Available Characterizing quantum systems through experimental data is critical to applications as diverse as metrology and quantum computing. Analyzing this experimental data in a robust and reproducible manner is made challenging, however, by the lack of readily-available software for performing principled statistical analysis. We improve the robustness and reproducibility of characterization by introducing an open-source library, QInfer, to address this need. Our library makes it easy to analyze data from tomography, randomized benchmarking, and Hamiltonian learning experiments either in post-processing, or online as data is acquired. QInfer also provides functionality for predicting the performance of proposed experimental protocols from simulated runs. By delivering easy-to-use characterization tools based on principled statistical analysis, QInfer helps address many outstanding challenges facing quantum technology.
Approximation and inference methods for stochastic biochemical kinetics—a tutorial review
International Nuclear Information System (INIS)
Schnoerr, David; Grima, Ramon; Sanguinetti, Guido
2017-01-01
Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the chemical master equation. Despite its simple structure, no analytic solutions to the chemical master equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state of the art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics. (topical review)
Sileshi, G
2006-10-01
Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
Jacob, Bridgette L.
2013-01-01
The difficulties introductory statistics students have with formal statistical inference are well known in the field of statistics education. "Informal" statistical inference has been studied as a means to introduce inferential reasoning well before and without the formalities of formal statistical inference. This mixed methods study…
TYPE Ia SUPERNOVA LIGHT-CURVE INFERENCE: HIERARCHICAL BAYESIAN ANALYSIS IN THE NEAR-INFRARED
International Nuclear Information System (INIS)
Mandel, Kaisey S.; Friedman, Andrew S.; Kirshner, Robert P.; Wood-Vasey, W. Michael
2009-01-01
We present a comprehensive statistical analysis of the properties of Type Ia supernova (SN Ia) light curves in the near-infrared using recent data from Peters Automated InfraRed Imaging TELescope and the literature. We construct a hierarchical Bayesian framework, incorporating several uncertainties including photometric error, peculiar velocities, dust extinction, and intrinsic variations, for principled and coherent statistical inference. SN Ia light-curve inferences are drawn from the global posterior probability of parameters describing both individual supernovae and the population conditioned on the entire SN Ia NIR data set. The logical structure of the hierarchical model is represented by a directed acyclic graph. Fully Bayesian analysis of the model and data is enabled by an efficient Markov Chain Monte Carlo algorithm exploiting the conditional probabilistic structure using Gibbs sampling. We apply this framework to the JHK s SN Ia light-curve data. A new light-curve model captures the observed J-band light-curve shape variations. The marginal intrinsic variances in peak absolute magnitudes are σ(M J ) = 0.17 ± 0.03, σ(M H ) = 0.11 ± 0.03, and σ(M Ks ) = 0.19 ± 0.04. We describe the first quantitative evidence for correlations between the NIR absolute magnitudes and J-band light-curve shapes, and demonstrate their utility for distance estimation. The average residual in the Hubble diagram for the training set SNe at cz > 2000kms -1 is 0.10 mag. The new application of bootstrap cross-validation to SN Ia light-curve inference tests the sensitivity of the statistical model fit to the finite sample and estimates the prediction error at 0.15 mag. These results demonstrate that SN Ia NIR light curves are as effective as corrected optical light curves, and, because they are less vulnerable to dust absorption, they have great potential as precise and accurate cosmological distance indicators.
Yucel, Abdulkadir C.; Bagci, Hakan; Michielssen, Eric
2015-01-01
An efficient method for statistically characterizing multiconductor transmission line (MTL) networks subject to a large number of manufacturing uncertainties is presented. The proposed method achieves its efficiency by leveraging a high
Stochastic processes inference theory
Rao, Malempati M
2014-01-01
This is the revised and enlarged 2nd edition of the authors’ original text, which was intended to be a modest complement to Grenander's fundamental memoir on stochastic processes and related inference theory. The present volume gives a substantial account of regression analysis, both for stochastic processes and measures, and includes recent material on Ridge regression with some unexpected applications, for example in econometrics. The first three chapters can be used for a quarter or semester graduate course on inference on stochastic processes. The remaining chapters provide more advanced material on stochastic analysis suitable for graduate seminars and discussions, leading to dissertation or research work. In general, the book will be of interest to researchers in probability theory, mathematical statistics and electrical and information theory.
DEFF Research Database (Denmark)
Andersen, J.S.; Bedaux, J.J.M.; Kooijman, S.A.L.M.
2000-01-01
This paper describes the influence of design characteristics on the statistical inference for an ecotoxicological hazard-based model using simulated survival data. The design characteristics of interest are the number and spacing of observations (counts) in time, the number and spacing of exposure...... concentrations (within c(min) and c(max)), and the initial number of individuals at time 0 in each concentration. A comparison of the coverage probabilities for confidence limits arising from the profile-likelihood approach and the Wald-based approach is carried out. The Wald-based approach is very sensitive...
Integrating distributed Bayesian inference and reinforcement learning for sensor management
Grappiolo, C.; Whiteson, S.; Pavlin, G.; Bakker, B.
2009-01-01
This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically
Efficient design and inference in distributed Bayesian networks: an overview
de Oude, P.; Groen, F.C.A.; Pavlin, G.; Bezhanishvili, N.; Löbner, S.; Schwabe, K.; Spada, L.
2011-01-01
This paper discusses an approach to distributed Bayesian modeling and inference, which is relevant for an important class of contemporary real world situation assessment applications. By explicitly considering the locality of causal relations, the presented approach (i) supports coherent distributed
Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression.
Houpt, Joseph W; Bittner, Jennifer L
2018-05-10
Ideal observer analysis is a fundamental tool used widely in vision science for analyzing the efficiency with which a cognitive or perceptual system uses available information. The performance of an ideal observer provides a formal measure of the amount of information in a given experiment. The ratio of human to ideal performance is then used to compute efficiency, a construct that can be directly compared across experimental conditions while controlling for the differences due to the stimuli and/or task specific demands. In previous research using ideal observer analysis, the effects of varying experimental conditions on efficiency have been tested using ANOVAs and pairwise comparisons. In this work, we present a model that combines Bayesian estimates of psychometric functions with hierarchical logistic regression for inference about both unadjusted human performance metrics and efficiencies. Our approach improves upon the existing methods by constraining the statistical analysis using a standard model connecting stimulus intensity to human observer accuracy and by accounting for variability in the estimates of human and ideal observer performance scores. This allows for both individual and group level inferences. Copyright © 2018 Elsevier Ltd. All rights reserved.
Forward and backward inference in spatial cognition.
Directory of Open Access Journals (Sweden)
Will D Penny
Full Text Available This paper shows that the various computations underlying spatial cognition can be implemented using statistical inference in a single probabilistic model. Inference is implemented using a common set of 'lower-level' computations involving forward and backward inference over time. For example, to estimate where you are in a known environment, forward inference is used to optimally combine location estimates from path integration with those from sensory input. To decide which way to turn to reach a goal, forward inference is used to compute the likelihood of reaching that goal under each option. To work out which environment you are in, forward inference is used to compute the likelihood of sensory observations under the different hypotheses. For reaching sensory goals that require a chaining together of decisions, forward inference can be used to compute a state trajectory that will lead to that goal, and backward inference to refine the route and estimate control signals that produce the required trajectory. We propose that these computations are reflected in recent findings of pattern replay in the mammalian brain. Specifically, that theta sequences reflect decision making, theta flickering reflects model selection, and remote replay reflects route and motor planning. We also propose a mapping of the above computational processes onto lateral and medial entorhinal cortex and hippocampus.
Cycle-Based Cluster Variational Method for Direct and Inverse Inference
Furtlehner, Cyril; Decelle, Aurélien
2016-08-01
Large scale inference problems of practical interest can often be addressed with help of Markov random fields. This requires to solve in principle two related problems: the first one is to find offline the parameters of the MRF from empirical data (inverse problem); the second one (direct problem) is to set up the inference algorithm to make it as precise, robust and efficient as possible. In this work we address both the direct and inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feed-back loops as much as possible by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternatively at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem per region to be solved. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular in the random Ising context we propose two complementary methods based respectively on fixed point equations and on a one-parameter log likelihood function minimization. Numerical experiments confirm the effectiveness of this approach both for the direct and inverse MRF inference. Heterogeneous problems of size up to 10^5 are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.
Is Cognitive Activity of Speech Based On Statistical Independence?
DEFF Research Database (Denmark)
Feng, Ling; Hansen, Lars Kai
2008-01-01
This paper explores the generality of COgnitive Component Analysis (COCA), which is defined as the process of unsupervised grouping of data such that the ensuing group structure is well-aligned with that resulting from human cognitive activity. The hypothesis of {COCA} is ecological......: the essentially independent features in a context defined ensemble can be efficiently coded using a sparse independent component representation. Our devised protocol aims at comparing the performance of supervised learning (invoking cognitive activity) and unsupervised learning (statistical regularities) based...... on similar representations, and the only difference lies in the human inferred labels. Inspired by the previous research on COCA, we introduce a new pair of models, which directly employ the independent hypothesis. Statistical regularities are revealed at multiple time scales on phoneme, gender, age...
Statistical inference based on latent ability estimates
Hoijtink, H.J.A.; Boomsma, A.
The quality of approximations to first and second order moments (e.g., statistics like means, variances, regression coefficients) based on latent ability estimates is being discussed. The ability estimates are obtained using either the Rasch, oi the two-parameter logistic model. Straightforward use
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-03-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data-space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper we use massive asymptotically-optimal data compression to reduce the dimensionality of the data-space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parameterized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate Density Estimation Likelihood-Free Inference with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ˜104 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological datasets.
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by i ntegrating individual level ge notype data and s ummary s tatistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohns Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% ( ±0.4% ) to 69.4% ( ±0.1% ) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
International Nuclear Information System (INIS)
Norman, S.
1992-04-01
The origin of this study was to find a good, or even the best, stochastic model for the hydraulic conductivity field at the Finnsjoe site. The conductivity field in question are regularized, that is upscaled. The reason for performing regularization of measurement data is primarily the need for long correlation scales. This is needed in order to model reasonably large domains that can be used when describing regional groundwater flow accurately. A theory of regularization is discussed in this report. In order to find the best model, jacknifing is employed to compare different stochastic models. The theory for this method is described. In the act of doing so we also take a look at linear predictor theory, so called kriging, and include a general discussion of stochastic functions and intrinsic random functions. The statistical inference methods for finding the models are also described, in particular regression, iterative generalized regression (IGLSE) and non-parametric variogram estimators. A large amount of results is presented for a regularization scale of 36 metre. (30 refs.) (au)
Problem solving and inference mechanisms
Energy Technology Data Exchange (ETDEWEB)
Furukawa, K; Nakajima, R; Yonezawa, A; Goto, S; Aoyama, A
1982-01-01
The heart of the fifth generation computer will be powerful mechanisms for problem solving and inference. A deduction-oriented language is to be designed, which will form the core of the whole computing system. The language is based on predicate logic with the extended features of structuring facilities, meta structures and relational data base interfaces. Parallel computation mechanisms and specialized hardware architectures are being investigated to make possible efficient realization of the language features. The project includes research into an intelligent programming system, a knowledge representation language and system, and a meta inference system to be built on the core. 30 references.
Directory of Open Access Journals (Sweden)
Long Ho
2018-03-01
Full Text Available Emerging global threats, such as climate change, urbanization and water depletion, are driving forces for finding a feasible substitute for low cost-effective conventional activated sludge (AS technology. On the other hand, given their low cost and easy operation, nature-based systems such as constructed wetlands (CWs and waste stabilization ponds (WSPs appear to be viable options. To examine these systems, a 210-day experiment with 31 days of peak load scenario was performed. Particularly, we conducted a deliberate strategy of experimentation, which includes applying a preliminary study, preliminary models, hypothetical tests and power analysis to compare their removal efficiencies and resilience capacities. In contrast to comparable high removal efficiencies of organic matter—around 90%—both natural systems showed moderate nutrient removal efficiencies, which inferred the necessity for further treatment to ensure their compliance with environmental standards. During the peak period, the pond treatment systems appeared to be the most robust as they indicated a higher strength to withstanding the organic matter and nitrogen shock load and were able to recover within a short period. However, high demand of land—2.5 times larger than that of AS—is a major concern of the applicability of WSPs despite their lower operation and maintenance (O&M costs. It is also worth noting that initial efforts on systematic experimentation appeared to have an essential impact on ensuring statistically and practically meaningful results in this comparison study.
On Maximum Entropy and Inference
Directory of Open Access Journals (Sweden)
Luigi Gresele
2017-11-01
Full Text Available Maximum entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data, that affords predictions on all other (dependent variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is not set by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method showing its ability to recover the correct model in a few prototype cases and discuss its application on a real dataset.
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
Meaningful mediation analysis : Plausible causal inference and informative communication
Pieters, Rik
2017-01-01
Statistical mediation analysis has become the technique of choice in consumer research to make causal inferences about the influence of a treatment on an outcome via one or more mediators. This tutorial aims to strengthen two weak links that impede statistical mediation analysis from reaching its
Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr
2012-05-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, (15)O-H(2)O and (18)F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
Population-based statistical inference for temporal sequence of somatic mutations in cancer genomes.
Rhee, Je-Keun; Kim, Tae-Min
2018-04-20
It is well recognized that accumulation of somatic mutations in cancer genomes plays a role in carcinogenesis; however, the temporal sequence and evolutionary relationship of somatic mutations remain largely unknown. In this study, we built a population-based statistical framework to infer the temporal sequence of acquisition of somatic mutations. Using the model, we analyzed the mutation profiles of 1954 tumor specimens across eight tumor types. As a result, we identified tumor type-specific directed networks composed of 2-15 cancer-related genes (nodes) and their mutational orders (edges). The most common ancestors identified in pairwise comparison of somatic mutations were TP53 mutations in breast, head/neck, and lung cancers. The known relationship of KRAS to TP53 mutations in colorectal cancers was identified, as well as potential ancestors of TP53 mutation such as NOTCH1, EGFR, and PTEN mutations in head/neck, lung and endometrial cancers, respectively. We also identified apoptosis-related genes enriched with ancestor mutations in lung cancers and a relationship between APC hotspot mutations and TP53 mutations in colorectal cancers. While evolutionary analysis of cancers has focused on clonal versus subclonal mutations identified in individual genomes, our analysis aims to further discriminate ancestor versus descendant mutations in population-scale mutation profiles that may help select cancer drivers with clinical relevance.
Borsboom, D.; Haig, B.D.
2013-01-01
Unlike most other statistical frameworks, Bayesian statistical inference is wedded to a particular approach in the philosophy of science (see Howson & Urbach, 2006); this approach is called Bayesianism. Rather than being concerned with model fitting, this position in the philosophy of science
Nonparametric predictive inference in statistical process control
Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.
2004-01-01
Statistical process control (SPC) is used to decide when to stop a process as confidence in the quality of the next item(s) is low. Information to specify a parametric model is not always available, and as SPC is of a predictive nature, we present a control chart developed using nonparametric
Inference in partially identified models with many moment inequalities using Lasso
DEFF Research Database (Denmark)
Bugni, Federico A.; Caner, Mehmet; Kock, Anders Bredahl
This paper considers the problem of inference in a partially identified moment (in)equality model with possibly many moment inequalities. Our contribution is to propose a novel two-step new inference method based on the combination of two ideas. On the one hand, our test statistic and critical...
Davison, Anthony C.; Huser, Raphaë l
2015-01-01
Statistics of extremes concerns inference for rare events. Often the events have never yet been observed, and their probabilities must therefore be estimated by extrapolation of tail models fitted to available data. Because data concerning the event
Information Geometry, Inference Methods and Chaotic Energy Levels Statistics
Cafaro, Carlo
2008-01-01
In this Letter, we propose a novel information-geometric characterization of chaotic (integrable) energy level statistics of a quantum antiferromagnetic Ising spin chain in a tilted (transverse) external magnetic field. Finally, we conjecture our results might find some potential physical applications in quantum energy level statistics.
A statistical approach to the analysis of merger and acquisition efficiency in the Russian industry
Directory of Open Access Journals (Sweden)
Karelina M.
2017-01-01
Full Text Available At present, the success of economic institution transformations, as well as creating an efficient economic system with a fundamental new nature of corporate relationships are impossible without the statistical recording of factors contributing to the efficiency of merger and acquisition transactions in the Russian industry. The paper proposes a method for analyzing the efficiency of merger and acquisition transactions of enterprises in the industrial sector of the Russian economy, based on simulation methods. The methodical approach developed to analyze the efficiency of the integration transactions of Russian industrial companies allows one to consider individual preferences of investors, as well as to give a complex statistical evaluation of the strategic economic benefits from M&A transactions. This method enables to evaluate the probability and stability of the synergistic effect values within the increase of competitiveness of Russian industrial enterprises on the domestic and foreign markets.
Finite-sample instrumental variables inference using an asymptotically pivotal statistic
Bekker, P; Kleibergen, F
2003-01-01
We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in, a
Eight challenges in phylodynamic inference
Directory of Open Access Journals (Sweden)
Simon D.W. Frost
2015-03-01
Full Text Available The field of phylodynamics, which attempts to enhance our understanding of infectious disease dynamics using pathogen phylogenies, has made great strides in the past decade. Basic epidemiological and evolutionary models are now well characterized with inferential frameworks in place. However, significant challenges remain in extending phylodynamic inference to more complex systems. These challenges include accounting for evolutionary complexities such as changing mutation rates, selection, reassortment, and recombination, as well as epidemiological complexities such as stochastic population dynamics, host population structure, and different patterns at the within-host and between-host scales. An additional challenge exists in making efficient inferences from an ever increasing corpus of sequence data.
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
Estimation and inference in the same-different test
DEFF Research Database (Denmark)
Christensen, Rune Haubo Bojesen; Brockhoff, Per B.
2009-01-01
as well as similarity. We show that the likelihood root statistic is equivalent to the well known G(2) likelihood ratio statistic for tests of no difference. As an additional practical tool, we introduce the profile likelihood curve to provide a convenient graphical summary of the information in the data......Inference for the Thurstonian delta in the same-different protocol via the well known Wald statistic is shown to be inappropriate in a wide range of situations. We introduce the likelihood root statistic as an alternative to the Wald statistic to produce CIs and p-values for assessing difference...
Content-based VLE designs improve learning efficiency in constructivist statistics education.
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific-purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under investigation. The findings demonstrate that a
Content-based VLE designs improve learning efficiency in constructivist statistics education.
Directory of Open Access Journals (Sweden)
Patrick Wessa
Full Text Available BACKGROUND: We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses, which required us to develop a specific-purpose Statistical Learning Environment (SLE based on Reproducible Computing and newly developed Peer Review (PR technology. OBJECTIVES: The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. METHODS: Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. RESULTS: The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student
Content-Based VLE Designs Improve Learning Efficiency in Constructivist Statistics Education
Wessa, Patrick; De Rycker, Antoon; Holliday, Ian Edward
2011-01-01
Background We introduced a series of computer-supported workshops in our undergraduate statistics courses, in the hope that it would help students to gain a deeper understanding of statistical concepts. This raised questions about the appropriate design of the Virtual Learning Environment (VLE) in which such an approach had to be implemented. Therefore, we investigated two competing software design models for VLEs. In the first system, all learning features were a function of the classical VLE. The second system was designed from the perspective that learning features should be a function of the course's core content (statistical analyses), which required us to develop a specific–purpose Statistical Learning Environment (SLE) based on Reproducible Computing and newly developed Peer Review (PR) technology. Objectives The main research question is whether the second VLE design improved learning efficiency as compared to the standard type of VLE design that is commonly used in education. As a secondary objective we provide empirical evidence about the usefulness of PR as a constructivist learning activity which supports non-rote learning. Finally, this paper illustrates that it is possible to introduce a constructivist learning approach in large student populations, based on adequately designed educational technology, without subsuming educational content to technological convenience. Methods Both VLE systems were tested within a two-year quasi-experiment based on a Reliable Nonequivalent Group Design. This approach allowed us to draw valid conclusions about the treatment effect of the changed VLE design, even though the systems were implemented in successive years. The methodological aspects about the experiment's internal validity are explained extensively. Results The effect of the design change is shown to have substantially increased the efficiency of constructivist, computer-assisted learning activities for all cohorts of the student population under
Reichstein, M.; Beer, C.; Kuglitsch, F.; Papale, D.; Soussana, J. A.; Janssens, I.; Ciais, P.; Baldocchi, D.; Buchmann, N.; Verbeeck, H.; Ceulemans, R.; Moors, E.; Köstner, B.; Schulze, D.; Knohl, A.; Law, B. E.
2007-12-01
In this presentation we discuss ways to infer and to interpret water-use efficiency at ecosystem level (WUEe) from eddy covariance flux data and possibilities for scaling these patterns to regional and continental scale. In particular we convey the following: WUEe may be computed as a ratio of integrated fluxes or as the slope of carbon versus water fluxes offering different chances for interpretation. If computed from net ecosystem exchange and evapotranspiration on has to take of counfounding effects of respiration and soil evaporation. WUEe time-series at diurnal and seasonal scale is a valuable ecosystem physiological diagnostic for example about ecosystem-level responses to drought. Most often WUEe decreases during dry periods. The mean growing season ecosystem water-use efficiency of gross carbon uptake (WUEGPP) is highest in temperate broad-leaved deciduous forests, followed by temperate mixed forests, temperate evergreen conifers, Mediterranean broad-leaved deciduous forests, Mediterranean broad-leaved evergreen forests and Mediterranean evergreen conifers and boreal, grassland and tundra ecosystems. Water-use efficiency exhibits a temporally quite conservative relation with atmospheric water vapor pressure deficit (VPD) that is modified between sites by leaf area index (LAI) and soil quality, such that WUEe increases with LAI and soil water holding capacity which is related to texture. This property and tight coupling between carbon and water cycles is used to estimate catchment-scale water-use efficiency and primary productivity by integration of space-borne earth observation and river discharge data.
IMAGINE: Interstellar MAGnetic field INference Engine
Steininger, Theo
2018-03-01
IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
Introduction to statistical inference and its applications with R
Trosset, Michael W
2009-01-01
ExperimentsExamples Randomization The Importance of Probability Games of Chance Mathematical Preliminaries Sets Counting Functions Limits Probability Interpretations of Probability Axioms of Probability Finite Sample Spaces Conditional Probability Random VariablesCase Study: Padrolling in Milton Murayama's All I asking for is my bodyDiscrete Random VariablesBasic Concepts Examples Expectation Binomial DistributionsContinuous Random Variables A Motivating Example Basic Concepts Elementary Examples Normal Distributions Normal Sampling DistributionsQuantifying Population Attributes Symmetry Quantiles The Method of Least SquaresData The Plug-In Principle Plug-In Estimates of Mean and Variance Plug-In Estimates of Quantiles Kernel Density Estimates Case Study: Are Forearm Lengths Normally Distributed? TransformationsLots of Data Averaging Decreases Variation The Weak Law of Large Numbers The Central Limit TheoremInferenceA Motivating Example Point EstimationHeuristics of Hypothesis Testing Testing Hypotheses about...
Mocapy++ - a toolkit for inference and learning in dynamic Bayesian networks
DEFF Research Database (Denmark)
Paluszewski, Martin; Hamelryck, Thomas Wim
2010-01-01
Background Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations...
Directory of Open Access Journals (Sweden)
Marco Antonio Cunha de Oliveira
2009-06-01
Full Text Available O problema de como alocar recursos de forma eficiente tem sido uma das questões fundamentais em Finanças. Se os fatores domésticos tendem a fazer com que os ativos num determinado mercado se movimentem em conjunto, os investidores procuram diversificar o risco nacional pela aplicação em outros mercados. Este tema tem sido tipicamente analisado no contexto retorno-risco, entretanto, um dos maiores problemas é por não reconhecer a incerteza nos parâmetros de entrada, dando origem ao risco de estimação. Este trabalho analisa se a alocação em ações de outros países da América Latina permite melhorar a fronteira eficiente sob o ponto de vista do investidor brasileiro. É utilizada a combinação de inferências estatísticas propostas por Britten-Jones (1999 para portfólios de tangência e Kempf e Memmel (2006 para o portfólio de risco mínimo global. Os resultados permitiram verificar que a adição do investimento em ações de outros países latinos melhoraria a fronteira eficiente sob o ponto de vista do investidor local, com pesos estatisticamente significantes.Allocating resources efficiently has been one of the major issues in Finance. If domestic factors are the key reasons for local assets to move together, the investor should search for other markets in order to diversify the local risk. This topic has been analyzed considering the risk-return tradeoff. However, one of the main problems is not taking the uncertainty input parameters into account triggering estimation risk concerns. This article analyzes whether the inclusion of stocks from other Latin American countries improve the efficient frontier from a Brazilian investor's point of view. The combination of inferences about tangency portfolio (Britten-Jones, 1999 and global minimum risk portfolio (Kempf e Memmel, 2006 was implemented. From the results, it can be concluded that the inclusion of other Latin American stocks would improve the efficient frontier for local
Structured statistical models of inductive reasoning.
Kemp, Charles; Tenenbaum, Joshua B
2009-01-01
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
Statistical aspects of determinantal point processes
DEFF Research Database (Denmark)
Lavancier, Frédéric; Møller, Jesper; Rubak, Ege
The statistical aspects of determinantal point processes (DPPs) seem largely unexplored. We review the appealing properties of DDPs, demonstrate that they are useful models for repulsiveness, detail a simulation procedure, and provide freely available software for simulation and statistical infer...
Maximum Likelihood Estimation and Inference With Examples in R, SAS and ADMB
Millar, Russell B
2011-01-01
This book takes a fresh look at the popular and well-established method of maximum likelihood for statistical estimation and inference. It begins with an intuitive introduction to the concepts and background of likelihood, and moves through to the latest developments in maximum likelihood methodology, including general latent variable models and new material for the practical implementation of integrated likelihood using the free ADMB software. Fundamental issues of statistical inference are also examined, with a presentation of some of the philosophical debates underlying the choice of statis
Jansma, J Martijn; de Zwart, Jacco A; van Gelderen, Peter; Duyn, Jeff H; Drevets, Wayne C; Furey, Maura L
2013-05-15
Technical developments in MRI have improved signal to noise, allowing use of analysis methods such as Finite impulse response (FIR) of rapid event related functional MRI (er-fMRI). FIR is one of the most informative analysis methods as it determines onset and full shape of the hemodynamic response function (HRF) without any a priori assumptions. FIR is however vulnerable to multicollinearity, which is directly related to the distribution of stimuli over time. Efficiency can be optimized by simplifying a design, and restricting stimuli distribution to specific sequences, while more design flexibility necessarily reduces efficiency. However, the actual effect of efficiency on fMRI results has never been tested in vivo. Thus, it is currently difficult to make an informed choice between protocol flexibility and statistical efficiency. The main goal of this study was to assign concrete fMRI signal to noise values to the abstract scale of FIR statistical efficiency. Ten subjects repeated a perception task with five random and m-sequence based protocol, with varying but, according to literature, acceptable levels of multicollinearity. Results indicated substantial differences in signal standard deviation, while the level was a function of multicollinearity. Experiment protocols varied up to 55.4% in standard deviation. Results confirm that quality of fMRI in an FIR analysis can significantly and substantially vary with statistical efficiency. Our in vivo measurements can be used to aid in making an informed decision between freedom in protocol design and statistical efficiency. Published by Elsevier B.V.
Causal inference in biology networks with integrated belief propagation.
Chang, Rui; Karr, Jonathan R; Schadt, Eric E
2015-01-01
Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in real metabolic network with v-structure and feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop only from steady-state data which conventional method cannot.
Shaikh, Muhammad Mujtaba; Memon, Abdul Jabbar; Hussain, Manzoor
2016-09-01
In this article, we describe details of the data used in the research paper "Confidence bounds for energy conservation in electric motors: An economical solution using statistical techniques" [1]. The data presented in this paper is intended to show benefits of high efficiency electric motors over the standard efficiency motors of similar rating in the industrial sector of Pakistan. We explain how the data was collected and then processed by means of formulas to show cost effectiveness of energy efficient motors in terms of three important parameters: annual energy saving, cost saving and payback periods. This data can be further used to construct confidence bounds for the parameters using statistical techniques as described in [1].
Inference and Analysis of Population Structure Using Genetic Data and Network Theory.
Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli
2016-04-01
Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
Human Inferences about Sequences: A Minimal Transition Probability Model.
Directory of Open Access Journals (Sweden)
Florent Meyniel
2016-12-01
Full Text Available The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations include explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge.
Topics in theoretical and applied statistics
Giommi, Andrea
2016-01-01
This book highlights the latest research findings from the 46th International Meeting of the Italian Statistical Society (SIS) in Rome, during which both methodological and applied statistical research was discussed. This selection of fully peer-reviewed papers, originally presented at the meeting, addresses a broad range of topics, including the theory of statistical inference; data mining and multivariate statistical analysis; survey methodologies; analysis of social, demographic and health data; and economic statistics and econometrics.
Fuzzy statistical decision-making theory and applications
Kabak, Özgür
2016-01-01
This book offers a comprehensive reference guide to fuzzy statistics and fuzzy decision-making techniques. It provides readers with all the necessary tools for making statistical inference in the case of incomplete information or insufficient data, where classical statistics cannot be applied. The respective chapters, written by prominent researchers, explain a wealth of both basic and advanced concepts including: fuzzy probability distributions, fuzzy frequency distributions, fuzzy Bayesian inference, fuzzy mean, mode and median, fuzzy dispersion, fuzzy p-value, and many others. To foster a better understanding, all the chapters include relevant numerical examples or case studies. Taken together, they form an excellent reference guide for researchers, lecturers and postgraduate students pursuing research on fuzzy statistics. Moreover, by extending all the main aspects of classical statistical decision-making to its fuzzy counterpart, the book presents a dynamic snapshot of the field that is expected to stimu...
Indian Academy of Sciences (India)
inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.
Humans make efficient use of natural image statistics when performing spatial interpolation.
D'Antona, Anthony D; Perry, Jeffrey S; Geisler, Wilson S
2013-12-16
Visual systems learn through evolution and experience over the lifespan to exploit the statistical structure of natural images when performing visual tasks. Understanding which aspects of this statistical structure are incorporated into the human nervous system is a fundamental goal in vision science. To address this goal, we measured human ability to estimate the intensity of missing image pixels in natural images. Human estimation accuracy is compared with various simple heuristics (e.g., local mean) and with optimal observers that have nearly complete knowledge of the local statistical structure of natural images. Human estimates are more accurate than those of simple heuristics, and they match the performance of an optimal observer that knows the local statistical structure of relative intensities (contrasts). This optimal observer predicts the detailed pattern of human estimation errors and hence the results place strong constraints on the underlying neural mechanisms. However, humans do not reach the performance of an optimal observer that knows the local statistical structure of the absolute intensities, which reflect both local relative intensities and local mean intensity. As predicted from a statistical analysis of natural images, human estimation accuracy is negligibly improved by expanding the context from a local patch to the whole image. Our results demonstrate that the human visual system exploits efficiently the statistical structure of natural images.
A human genome-wide library of local phylogeny predictions for whole-genome inference problems
Directory of Open Access Journals (Sweden)
Schwartz Russell
2008-08-01
Full Text Available Abstract Background Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.
Predictive models for PEM-electrolyzer performance using adaptive neuro-fuzzy inference systems
Energy Technology Data Exchange (ETDEWEB)
Becker, Steffen [University of Tasmania, Hobart 7001, Tasmania (Australia); Karri, Vishy [Australian College of Kuwait (Kuwait)
2010-09-15
Predictive models were built using neural network based Adaptive Neuro-Fuzzy Inference Systems for hydrogen flow rate, electrolyzer system-efficiency and stack-efficiency respectively. A comprehensive experimental database forms the foundation for the predictive models. It is argued that, due to the high costs associated with the hydrogen measuring equipment; these reliable predictive models can be implemented as virtual sensors. These models can also be used on-line for monitoring and safety of hydrogen equipment. The quantitative accuracy of the predictive models is appraised using statistical techniques. These mathematical models are found to be reliable predictive tools with an excellent accuracy of {+-}3% compared with experimental values. The predictive nature of these models did not show any significant bias to either over prediction or under prediction. These predictive models, built on a sound mathematical and quantitative basis, can be seen as a step towards establishing hydrogen performance prediction models as generic virtual sensors for wider safety and monitoring applications. (author)
Haneberg, W. C.
2017-12-01
Remote characterization of new landslides or areas of ongoing movement using differences in high resolution digital elevation models (DEMs) created through time, for example before and after major rains or earthquakes, is an attractive proposition. In the case of large catastrophic landslides, changes may be apparent enough that simple subtraction suffices. In other cases, statistical noise can obscure landslide signatures and place practical limits on detection. In ideal cases on land, GPS surveys of representative areas at the time of DEM creation can quantify the inherent errors. In less-than-ideal terrestrial cases and virtually all submarine cases, it may be impractical or impossible to independently estimate the DEM errors. Examining DEM difference statistics for areas reasonably inferred to have no change, however, can provide insight into the limits of detectability. Data from inferred no-change areas of airborne LiDAR DEM difference maps of the 2014 Oso, Washington landslide and landslide-prone colluvium slopes along the Ohio River valley in northern Kentucky, show that DEM difference maps can have non-zero mean and slope dependent error components consistent with published studies of DEM errors. Statistical thresholds derived from DEM difference error and slope data can help to distinguish between DEM differences that are likely real—and which may indicate landsliding—from those that are likely spurious or irrelevant. This presentation describes and compares two different approaches, one based upon a heuristic assumption about the proportion of the study area likely covered by new landslides and another based upon the amount of change necessary to ensure difference at a specified level of probability.
Yucel, Abdulkadir C.
2015-05-05
An efficient method for statistically characterizing multiconductor transmission line (MTL) networks subject to a large number of manufacturing uncertainties is presented. The proposed method achieves its efficiency by leveraging a high-dimensional model representation (HDMR) technique that approximates observables (quantities of interest in MTL networks, such as voltages/currents on mission-critical circuits) in terms of iteratively constructed component functions of only the most significant random variables (parameters that characterize the uncertainties in MTL networks, such as conductor locations and widths, and lumped element values). The efficiency of the proposed scheme is further increased using a multielement probabilistic collocation (ME-PC) method to compute the component functions of the HDMR. The ME-PC method makes use of generalized polynomial chaos (gPC) expansions to approximate the component functions, where the expansion coefficients are expressed in terms of integrals of the observable over the random domain. These integrals are numerically evaluated and the observable values at the quadrature/collocation points are computed using a fast deterministic simulator. The proposed method is capable of producing accurate statistical information pertinent to an observable that is rapidly varying across a high-dimensional random domain at a computational cost that is significantly lower than that of gPC or Monte Carlo methods. The applicability, efficiency, and accuracy of the method are demonstrated via statistical characterization of frequency-domain voltages in parallel wire, interconnect, and antenna corporate feed networks.
Making inference from wildlife collision data: inferring predator absence from prey strikes
Directory of Open Access Journals (Sweden)
Peter Caley
2017-02-01
Full Text Available Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.
Making inference from wildlife collision data: inferring predator absence from prey strikes.
Caley, Peter; Hosack, Geoffrey R; Barry, Simon C
2017-01-01
Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response functions, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.
Pelech, E. A.; McGrath, J.; Pederson, T.; Bernacchi, C.
2017-12-01
Increases in the global average temperature will consequently induce a higher occurrence of severe environmental conditions such as drought on arable land. To mitigate these threats, crops for fuel and food must be bred for higher water-use efficiencies (WUE). Defining genomic variation through high-throughput phenotypic analysis in field conditions has the potential to relieve the major bottleneck in linking desirable genetic traits to the associated phenotypic response. This can subsequently enable breeders to create new agricultural germplasm that supports the need for higher water-use efficient crops. From satellites to field-based aerial and ground sensors, the reflectance properties of vegetation measured by hyperspectral imaging is becoming a rapid high-throughput phenotyping technique. A variety of physiological traits can be inferred by regression analysis with leaf reflectance which is controlled by the properties and abundance of water, carbon, nitrogen and pigments. Although, given that the current established vegetation indices are designed to accentuate these properties from spectral reflectance, it becomes a challenge to infer relative measurements of WUE at a crop canopy scale without ground-truth data collection. This study aims to correlate established biomass and canopy-water-content indices with ground-truth data. Five bioenergy sorghum genotypes (Sorghum bicolor L. Moench) that have differences in WUE and wild-type Tobacco (Nicotiana tabacum var. Samsun) under irrigated and rainfed field conditions were examined. A linear regression analysis was conducted to determine if variation in canopy water content and biomass, driven by natural genotypic and artificial treatment influences, can be inferred using established vegetation indices. The results from this study will elucidate the ability of ground field-based hyperspectral imaging to assess variation in water content, biomass and water-use efficiency. This can lead to improved opportunities to
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Likelihood inference for unions of interacting discs
DEFF Research Database (Denmark)
Møller, Jesper; Helisova, K.
2010-01-01
This is probably the first paper which discusses likelihood inference for a random set using a germ-grain model, where the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point...... process, where the germs are the centres and the marks are the associated radii of the discs. We propose to use a recent parametric class of interacting disc process models, where the minimal sufficient statistic depends on various geometric properties of the random set, and the density is specified......-based maximum likelihood inference and the effect of specifying different reference Poisson models....
HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR
International Nuclear Information System (INIS)
Schneider, Michael D.; Dawson, William A.; Hogg, David W.; Marshall, Philip J.; Bard, Deborah J.; Meyers, Joshua; Lang, Dustin
2015-01-01
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics
Inverse Ising inference with correlated samples
International Nuclear Information System (INIS)
Obermayer, Benedikt; Levine, Erel
2014-01-01
Correlations between two variables of a high-dimensional system can be indicative of an underlying interaction, but can also result from indirect effects. Inverse Ising inference is a method to distinguish one from the other. Essentially, the parameters of the least constrained statistical model are learned from the observed correlations such that direct interactions can be separated from indirect correlations. Among many other applications, this approach has been helpful for protein structure prediction, because residues which interact in the 3D structure often show correlated substitutions in a multiple sequence alignment. In this context, samples used for inference are not independent but share an evolutionary history on a phylogenetic tree. Here, we discuss the effects of correlations between samples on global inference. Such correlations could arise due to phylogeny but also via other slow dynamical processes. We present a simple analytical model to address the resulting inference biases, and develop an exact method accounting for background correlations in alignment data by combining phylogenetic modeling with an adaptive cluster expansion algorithm. We find that popular reweighting schemes are only marginally effective at removing phylogenetic bias, suggest a rescaling strategy that yields better results, and provide evidence that our conclusions carry over to the frequently used mean-field approach to the inverse Ising problem. (paper)
Bayesian structural inference for hidden processes
Strelioff, Christopher C.; Crutchfield, James P.
2014-04-01
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ɛ-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ɛ-machines, irrespective of estimated transition probabilities. Properties of ɛ-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well to an out-of-class, infinite-state hidden process.
Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J
2014-01-01
This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively "hiding" its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research.
Approximate Inference and Deep Generative Models
CERN. Geneva
2018-01-01
Advances in deep generative models are at the forefront of deep learning research because of the promise they offer for allowing data-efficient learning, and for model-based reinforcement learning. In this talk I'll review a few standard methods for approximate inference and introduce modern approximations which allow for efficient large-scale training of a wide variety of generative models. Finally, I'll demonstrate several important application of these models to density estimation, missing data imputation, data compression and planning.
Pre-service primary school teachers’ knowledge of informal statistical inference
de Vetten, Arjen; Schoonenboom, Judith; Keijzer, Ronald; van Oers, Bert
2018-01-01
The ability to reason inferentially is increasingly important in today’s society. It is hypothesized here that engaging primary school students in informal statistical reasoning (ISI), defined as making generalizations without the use of formal statistical tests, will help them acquire the
Causal inference based on counterfactuals
Directory of Open Access Journals (Sweden)
Höfler M
2005-09-01
Full Text Available Abstract Background The counterfactual or potential outcome model has become increasingly standard for causal inference in epidemiological and medical studies. Discussion This paper provides an overview on the counterfactual and related approaches. A variety of conceptual as well as practical issues when estimating causal effects are reviewed. These include causal interactions, imperfect experiments, adjustment for confounding, time-varying exposures, competing risks and the probability of causation. It is argued that the counterfactual model of causal effects captures the main aspects of causality in health sciences and relates to many statistical procedures. Summary Counterfactuals are the basis of causal inference in medicine and epidemiology. Nevertheless, the estimation of counterfactual differences pose several difficulties, primarily in observational studies. These problems, however, reflect fundamental barriers only when learning from observations, and this does not invalidate the counterfactual concept.
The standard lateral gene transfer model is statistically consistent for pectinate four-taxon trees
DEFF Research Database (Denmark)
Sand, Andreas; Steel, Mike
2013-01-01
Evolutionary events such as incomplete lineage sorting and lateral gene transfers constitute major problems for inferring species trees from gene trees, as they can sometimes lead to gene trees which conflict with the underlying species tree. One particularly simple and efficient way to infer...... species trees from gene trees under such conditions is to combine three-taxon analyses for several genes using a majority vote approach. For incomplete lineage sorting this method is known to be statistically consistent; however, for lateral gene transfers it was recently shown that a zone...... of inconsistency exists for a specific four-taxon tree topology, and it was posed as an open question whether inconsistencies could exist for other four-taxon tree topologies? In this letter we analyze all remaining four-taxon topologies and show that no other inconsistencies exist....
Parametric inference for biological sequence analysis.
Pachter, Lior; Sturmfels, Bernd
2004-11-16
One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
Directory of Open Access Journals (Sweden)
Matteo Convertino
richness, or [Formula: see text] diversity, based on the Shannon entropy of pixel intensity.To test our approach, we specifically use the green band of Landsat images for a water conservation area in the Florida Everglades. We validate our predictions against data of species occurrences for a twenty-eight years long period for both wet and dry seasons. Our method correctly predicts 73% of species richness. For species turnover, the newly proposed KL divergence prediction performance is near 100% accurate. This represents a significant improvement over the more conventional Shannon entropy difference, which provides 85% accuracy. Furthermore, we find that changes in soil and water patterns, as measured by fluctuations of the Shannon entropy for the red and blue bands respectively, are positively correlated with changes in vegetation. The fluctuations are smaller in the wet season when compared to the dry season. CONCLUSIONS/SIGNIFICANCE: Texture-based statistical multiresolution image analysis is a promising method for quantifying interseasonal differences and, consequently, the degree to which vegetation, soil, and water patterns vary. The proposed automated method for quantifying species richness and turnover can also provide analysis at higher spatial and temporal resolution than is currently obtainable from expensive monitoring campaigns, thus enabling more prompt, more cost effective inference and decision making support regarding anomalous variations in biodiversity. Additionally, a matrix-based visualization of the statistical multiresolution analysis is presented to facilitate both insight and quick recognition of anomalous data.
Estimating uncertainty of inference for validation
Energy Technology Data Exchange (ETDEWEB)
Booker, Jane M [Los Alamos National Laboratory; Langenbrunner, James R [Los Alamos National Laboratory; Hemez, Francois M [Los Alamos National Laboratory; Ross, Timothy J [UNM
2010-09-30
first in a series of inference uncertainty estimations. While the methods demonstrated are primarily statistical, these do not preclude the use of nonprobabilistic methods for uncertainty characterization. The methods presented permit accurate determinations for validation and eventual prediction. It is a goal that these methods establish a standard against which best practice may evolve for determining degree of validation.
Efficient computation of the joint sample frequency spectra for multiple populations.
Kamm, John A; Terhorst, Jonathan; Song, Yun S
2017-01-01
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
FUNSTAT and statistical image representations
Parzen, E.
1983-01-01
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate are outlined. ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Direct Learning of Systematics-Aware Summary Statistics
CERN. Geneva
2018-01-01
Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages on existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...
Automatic physical inference with information maximizing neural networks
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.
The Role of the Sampling Distribution in Understanding Statistical Inference
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Understanding advanced statistical methods
Westfall, Peter
2013-01-01
Introduction: Probability, Statistics, and ScienceReality, Nature, Science, and ModelsStatistical Processes: Nature, Design and Measurement, and DataModelsDeterministic ModelsVariabilityParametersPurely Probabilistic Statistical ModelsStatistical Models with Both Deterministic and Probabilistic ComponentsStatistical InferenceGood and Bad ModelsUses of Probability ModelsRandom Variables and Their Probability DistributionsIntroductionTypes of Random Variables: Nominal, Ordinal, and ContinuousDiscrete Probability Distribution FunctionsContinuous Probability Distribution FunctionsSome Calculus-Derivatives and Least SquaresMore Calculus-Integrals and Cumulative Distribution FunctionsProbability Calculation and SimulationIntroductionAnalytic Calculations, Discrete and Continuous CasesSimulation-Based ApproximationGenerating Random NumbersIdentifying DistributionsIntroductionIdentifying Distributions from Theory AloneUsing Data: Estimating Distributions via the HistogramQuantiles: Theoretical and Data-Based Estimate...
Exact nonparametric inference for detection of nonlinear determinism
Luo, Xiaodong; Zhang, Jie; Small, Michael; Moroz, Irene
2005-01-01
We propose an exact nonparametric inference scheme for the detection of nonlinear determinism. The essential fact utilized in our scheme is that, for a linear stochastic process with jointly symmetric innovations, its ordinary least square (OLS) linear prediction error is symmetric about zero. Based on this viewpoint, a class of linear signed rank statistics, e.g. the Wilcoxon signed rank statistic, can be derived with the known null distributions from the prediction error. Thus one of the ad...
Milewski, Emil G
2012-01-01
REA's Essentials provide quick and easy access to critical information in a variety of different fields, ranging from the most basic to the most advanced. As its name implies, these concise, comprehensive study guides summarize the essentials of the field covered. Essentials are helpful when preparing for exams, doing homework and will remain a lasting reference source for students, teachers, and professionals. Statistics II discusses sampling theory, statistical inference, independent and dependent variables, correlation theory, experimental design, count data, chi-square test, and time se
A New Statistical Tool: Scalar Score Function
Czech Academy of Sciences Publication Activity Database
Fabián, Zdeněk
2011-01-01
Roč. 2, - (2011), s. 109-116 ISSN 1934-7332 R&D Projects: GA ČR GA205/09/1079 Institutional research plan: CEZ:AV0Z10300504 Keywords : statistics * inference function * data characteristics * point estimates * heavy tails Subject RIV: BB - Applied Statistics, Operational Research
Statistical network analysis for analyzing policy networks
DEFF Research Database (Denmark)
Robins, Garry; Lewis, Jenny; Wang, Peng
2012-01-01
and policy network methodology is the development of statistical modeling approaches that can accommodate such dependent data. In this article, we review three network statistical methods commonly used in the current literature: quadratic assignment procedures, exponential random graph models (ERGMs......To analyze social network data using standard statistical approaches is to risk incorrect inference. The dependencies among observations implied in a network conceptualization undermine standard assumptions of the usual general linear models. One of the most quickly expanding areas of social......), and stochastic actor-oriented models. We focus most attention on ERGMs by providing an illustrative example of a model for a strategic information network within a local government. We draw inferences about the structural role played by individuals recognized as key innovators and conclude that such an approach...
Campbell's and Rubin's Perspectives on Causal Inference
West, Stephen G.; Thoemmes, Felix
2010-01-01
Donald Campbell's approach to causal inference (D. T. Campbell, 1957; W. R. Shadish, T. D. Cook, & D. T. Campbell, 2002) is widely used in psychology and education, whereas Donald Rubin's causal model (P. W. Holland, 1986; D. B. Rubin, 1974, 2005) is widely used in economics, statistics, medicine, and public health. Campbell's approach focuses on…
Deep Learning for Population Genetic Inference.
Directory of Open Access Journals (Sweden)
Sara Sheehan
2016-03-01
Full Text Available Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data to the output (e.g., population genetic parameters of interest. We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history. Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
β-empirical Bayes inference and model diagnosis of microarray data
Directory of Open Access Journals (Sweden)
Hossain Mollah Mohammad
2012-06-01
Full Text Available Abstract Background Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of the microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual statistical gene by gene models. Results As an extension of empirical Bayes (EB procedures, we have developed the β-empirical Bayes (β-EB approach based on a β-likelihood measure which can be regarded as an ’evidence-based’ weighted (quasi- likelihood inference. The weight of a transcript t is described as a power function of its likelihood, fβ(yt|θ. Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β0-likelihood by cross-validation. The proposed β-EB approach identified six significant (p−5 contaminated transcripts as differentially expressed (DE in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach. Conclusions The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model
Nonparametric predictive inference in reliability
International Nuclear Information System (INIS)
Coolen, F.P.A.; Coolen-Schrijner, P.; Yan, K.J.
2002-01-01
We introduce a recently developed statistical approach, called nonparametric predictive inference (NPI), to reliability. Bounds for the survival function for a future observation are presented. We illustrate how NPI can deal with right-censored data, and discuss aspects of competing risks. We present possible applications of NPI for Bernoulli data, and we briefly outline applications of NPI for replacement decisions. The emphasis is on introduction and illustration of NPI in reliability contexts, detailed mathematical justifications are presented elsewhere
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
computational execution together form a comprehensive, widely- applicable paradigm for statistical graph inference. Approved for Public Release; Distribution...always involve challenging empirical modeling and implementation issues. Our project has propelled the mathematical development, statistical design...D. J., and Sussman, D. L., “A limit theorem for scaled eigenvectors of random dot product graphs,” Sankhya A. Mathemat - ical Statistics and
Improvement of Statistical Decisions under Parametric Uncertainty
Nechval, Nicholas A.; Nechval, Konstantin N.; Purgailis, Maris; Berzins, Gundars; Rozevskis, Uldis
2011-10-01
A large number of problems in production planning and scheduling, location, transportation, finance, and engineering design require that decisions be made in the presence of uncertainty. Decision-making under uncertainty is a central problem in statistical inference, and has been formally studied in virtually all approaches to inference. The aim of the present paper is to show how the invariant embedding technique, the idea of which belongs to the authors, may be employed in the particular case of finding the improved statistical decisions under parametric uncertainty. This technique represents a simple and computationally attractive statistical method based on the constructive use of the invariance principle in mathematical statistics. Unlike the Bayesian approach, an invariant embedding technique is independent of the choice of priors. It allows one to eliminate unknown parameters from the problem and to find the best invariant decision rule, which has smaller risk than any of the well-known decision rules. To illustrate the proposed technique, application examples are given.
Bayesian Inference for Functional Dynamics Exploring in fMRI Data
Directory of Open Access Journals (Sweden)
Xuan Guo
2016-01-01
Full Text Available This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM, Bayesian Connectivity Change Point Model (BCCPM, and Dynamic Bayesian Variable Partition Model (DBVPM, and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.
Practical Statistics for LHC Physicists: Descriptive Statistics, Probability and Likelihood (1/3)
CERN. Geneva
2015-01-01
These lectures cover those principles and practices of statistics that are most relevant for work at the LHC. The first lecture discusses the basic ideas of descriptive statistics, probability and likelihood. The second lecture covers the key ideas in the frequentist approach, including confidence limits, profile likelihoods, p-values, and hypothesis testing. The third lecture covers inference in the Bayesian approach. Throughout, real-world examples will be used to illustrate the practical application of the ideas. No previous knowledge is assumed.
Inference of Large Phylogenies Using Neighbour-Joining
DEFF Research Database (Denmark)
Simonsen, Martin; Mailund, Thomas; Pedersen, Christian Nørgaard Storm
2011-01-01
The neighbour-joining method is a widely used method for phylogenetic reconstruction which scales to thousands of taxa. However, advances in sequencing technology have made data sets with more than 10,000 related taxa widely available. Inference of such large phylogenies takes hours or days using...... the Neighbour-Joining method on a normal desktop computer because of the O(n^3) running time. RapidNJ is a search heuristic which reduce the running time of the Neighbour-Joining method significantly but at the cost of an increased memory consumption making inference of large phylogenies infeasible. We present...... two extensions for RapidNJ which reduce the memory requirements and \\makebox{allows} phylogenies with more than 50,000 taxa to be inferred efficiently on a desktop computer. Furthermore, an improved version of the search heuristic is presented which reduces the running time of RapidNJ on many data...
An application of an optimal statistic for characterizing relative orientations
Jow, Dylan L.; Hill, Ryley; Scott, Douglas; Soler, J. D.; Martin, P. G.; Devlin, M. J.; Fissel, L. M.; Poidevin, F.
2018-02-01
We present the projected Rayleigh statistic (PRS), a modification of the classic Rayleigh statistic, as a test for non-uniform relative orientation between two pseudo-vector fields. In the application here, this gives an effective way of investigating whether polarization pseudo-vectors (spin-2 quantities) are preferentially parallel or perpendicular to filaments in the interstellar medium. For example, there are other potential applications in astrophysics, e.g. when comparing small-scale orientations with larger scale shear patterns. We compare the efficiency of the PRS against histogram binning methods that have previously been used for characterizing the relative orientations of gas column density structures with the magnetic field projected on the plane of the sky. We examine data for the Vela C molecular cloud, where the column density is inferred from Herschel submillimetre observations, and the magnetic field from observations by the Balloon-borne Large-Aperture Submillimetre Telescope in the 250-, 350- and 500-μm wavelength bands. We find that the PRS has greater statistical power than approaches that bin the relative orientation angles, as it makes more efficient use of the information contained in the data. In particular, the use of the PRS to test for preferential alignment results in a higher statistical significance, in each of the four Vela C regions, with the greatest increase being by a factor 1.3 in the South-Nest region in the 250 - μ m band.
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Methodology for the inference of gene function from phenotype data.
Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A
2014-12-12
Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Dai, Qi; Yang, Yanchun; Wang, Tianming
2008-10-15
Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with these based on alignment or alignment-free. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating curve) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into Markov model are more efficient.
Gross, Kevin; Rosenheim, Jay A
2011-10-01
Secondary pest outbreaks occur when the use of a pesticide to reduce densities of an unwanted target pest species triggers subsequent outbreaks of other pest species. Although secondary pest outbreaks are thought to be familiar in agriculture, their rigorous documentation is made difficult by the challenges of performing randomized experiments at suitable scales. Here, we quantify the frequency and monetary cost of secondary pest outbreaks elicited by early-season applications of broad-spectrum insecticides to control the plant bug Lygus spp. (primarily L. hesperus) in cotton grown in the San Joaquin Valley, California, USA. We do so by analyzing pest-control management practices for 969 cotton fields spanning nine years and 11 private ranches. Our analysis uses statistical methods to draw formal causal inferences from nonexperimental data that have become popular in public health and economics, but that are not yet widely known in ecology or agriculture. We find that, in fields that received an early-season broad-spectrum insecticide treatment for Lygus, 20.2% +/- 4.4% (mean +/- SE) of late-season pesticide costs were attributable to secondary pest outbreaks elicited by the early-season insecticide application for Lygus. In 2010 U.S. dollars, this equates to an additional $6.00 +/- $1.30 (mean +/- SE) per acre in management costs. To the extent that secondary pest outbreaks may be driven by eliminating pests' natural enemies, these figures place a lower bound on the monetary value of ecosystem services provided by native communities of arthropod predators and parasitoids in this agricultural system.
Statistically-Efficient Filtering in Impulsive Environments: Weighted Myriad Filters
Directory of Open Access Journals (Sweden)
Juan G. Gonzalez
2002-01-01
Full Text Available Linear filtering theory has been largely motivated by the characteristics of Gaussian signals. In the same manner, the proposed Myriad Filtering methods are motivated by the need for a flexible filter class with high statistical efficiency in non-Gaussian impulsive environments that can appear in practice. Myriad filters have a solid theoretical basis, are inherently more powerful than median filters, and are very general, subsuming traditional linear FIR filters. The foundation of the proposed filtering algorithms lies in the definition of the myriad as a tunable estimator of location derived from the theory of robust statistics. We prove several fundamental properties of this estimator and show its optimality in practical impulsive models such as the ÃŽÂ±-stable and generalized-t. We then extend the myriad estimation framework to allow the use of weights. In the same way as linear FIR filters become a powerful generalization of the mean filter, filters based on running myriads reach all of their potential when a weighting scheme is utilized. We derive the Ã¢Â€ÂœnormalÃ¢Â€Â equations for the optimal myriad filter, and introduce a suboptimal methodology for filter tuning and design. The strong potential of myriad filtering and estimation in impulsive environments is illustrated with several examples.
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Modeling and control of an unstable system using probabilistic fuzzy inference system
Directory of Open Access Journals (Sweden)
Sozhamadevi N.
2015-09-01
Full Text Available A new type Fuzzy Inference System is proposed, a Probabilistic Fuzzy Inference system which model and minimizes the effects of statistical uncertainties. The blend of two different concepts, degree of truth and probability of truth in a unique framework leads to this new concept. This combination is carried out both in Fuzzy sets and Fuzzy rules, which gives rise to Probabilistic Fuzzy Sets and Probabilistic Fuzzy Rules. Introducing these probabilistic elements, a distinctive probabilistic fuzzy inference system is developed and this involves fuzzification, inference and output processing. This integrated approach accounts for all of the uncertainty like rule uncertainties and measurement uncertainties present in the systems and has led to the design which performs optimally after training. In this paper a Probabilistic Fuzzy Inference System is applied for modeling and control of a highly nonlinear, unstable system and also proved its effectiveness.
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
SDG multiple fault diagnosis by real-time inverse inference
International Nuclear Information System (INIS)
Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng
2005-01-01
In the past 20 years, one of the qualitative simulation technologies, signed directed graph (SDG) has been widely applied in the field of chemical fault diagnosis. However, the assumption of single fault origin was usually used by many former researchers. As a result, this will lead to the problem of combinatorial explosion and has limited SDG to the realistic application on the real process. This is mainly because that most of the former researchers used forward inference engine in the commercial expert system software to carry out the inverse diagnosis inference on the SDG model which violates the internal principle of diagnosis mechanism. In this paper, we present a new SDG multiple faults diagnosis method by real-time inverse inference. This is a method of multiple faults diagnosis from the genuine significance and the inference engine use inverse mechanism. At last, we give an example of 65t/h furnace diagnosis system to demonstrate its applicability and efficiency
SDG multiple fault diagnosis by real-time inverse inference
Energy Technology Data Exchange (ETDEWEB)
Zhang Zhaoqian; Wu Chongguang; Zhang Beike; Xia Tao; Li Anfeng
2005-02-01
In the past 20 years, one of the qualitative simulation technologies, signed directed graph (SDG) has been widely applied in the field of chemical fault diagnosis. However, the assumption of single fault origin was usually used by many former researchers. As a result, this will lead to the problem of combinatorial explosion and has limited SDG to the realistic application on the real process. This is mainly because that most of the former researchers used forward inference engine in the commercial expert system software to carry out the inverse diagnosis inference on the SDG model which violates the internal principle of diagnosis mechanism. In this paper, we present a new SDG multiple faults diagnosis method by real-time inverse inference. This is a method of multiple faults diagnosis from the genuine significance and the inference engine use inverse mechanism. At last, we give an example of 65t/h furnace diagnosis system to demonstrate its applicability and efficiency.
Term Structure Examination of Indonesian Money Market: Some Efficiency Issue
Directory of Open Access Journals (Sweden)
Anggoro Budi Nugroho
2012-01-01
Full Text Available This paper examines efficiency of Indonesian term structure as imposed by the country’s central bank. The rate, widely understood as the Bank Indonesia (BI Rate varying from 30-day, 60-day, and 180-day, usually stated as the plain-vanilla cost of capital of interbank debt financing depending on their time length. In general, this rate will consequently impact various other sorts of interest rates in the country’s debt market as a whole. When dealing with market efficiency, statistical inference shows that short-term BI Rate’s is not the best predictor of its long-term one due to some uncertain asymmetric information. This finding may lead to further adjustment in risk management strategy for hedging with interest rate. Keywords: term structure, risk premia, expectation hypothesis (EH, market efficiency, cointegration, volatility spillover, expansionary monetary policy
Staged Inference using Conditional Deep Learning for energy efficient real-time smart diagnosis.
Parsa, Maryam; Panda, Priyadarshini; Sen, Shreyas; Roy, Kaushik
2017-07-01
Recent progress in biosensor technology and wearable devices has created a formidable opportunity for remote healthcare monitoring systems as well as real-time diagnosis and disease prevention. The use of data mining techniques is indispensable for analysis of the large pool of data generated by the wearable devices. Deep learning is among the promising methods for analyzing such data for healthcare applications and disease diagnosis. However, the conventional deep neural networks are computationally intensive and it is impractical to use them in real-time diagnosis with low-powered on-body devices. We propose Staged Inference using Conditional Deep Learning (SICDL), as an energy efficient approach for creating healthcare monitoring systems. For smart diagnostics, we observe that all diagnoses are not equally challenging. The proposed approach thus decomposes the diagnoses into preliminary analysis (such as healthy vs unhealthy) and detailed analysis (such as identifying the specific type of cardio disease). The preliminary diagnosis is conducted real-time with a low complexity neural network realized on the resource-constrained on-body device. The detailed diagnosis requires a larger network that is implemented remotely in cloud and is conditionally activated only for detailed diagnosis (unhealthy individuals). We evaluated the proposed approach using available physiological sensor data from Physionet databases, and achieved 38% energy reduction in comparison to the conventional deep learning approach.
Virtual reality and consciousness inference in dreaming.
Hobson, J Allan; Hong, Charles C-H; Friston, Karl J
2014-01-01
This article explores the notion that the brain is genetically endowed with an innate virtual reality generator that - through experience-dependent plasticity - becomes a generative or predictive model of the world. This model, which is most clearly revealed in rapid eye movement (REM) sleep dreaming, may provide the theater for conscious experience. Functional neuroimaging evidence for brain activations that are time-locked to rapid eye movements (REMs) endorses the view that waking consciousness emerges from REM sleep - and dreaming lays the foundations for waking perception. In this view, the brain is equipped with a virtual model of the world that generates predictions of its sensations. This model is continually updated and entrained by sensory prediction errors in wakefulness to ensure veridical perception, but not in dreaming. In contrast, dreaming plays an essential role in maintaining and enhancing the capacity to model the world by minimizing model complexity and thereby maximizing both statistical and thermodynamic efficiency. This perspective suggests that consciousness corresponds to the embodied process of inference, realized through the generation of virtual realities (in both sleep and wakefulness). In short, our premise or hypothesis is that the waking brain engages with the world to predict the causes of sensations, while in sleep the brain's generative model is actively refined so that it generates more efficient predictions during waking. We review the evidence in support of this hypothesis - evidence that grounds consciousness in biophysical computations whose neuronal and neurochemical infrastructure has been disclosed by sleep research.
Grouping preprocess for haplotype inference from SNP and CNV data
International Nuclear Information System (INIS)
Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Inoue, Masato; Kamatani, Naoyuki
2009-01-01
The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details on the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.
Grouping preprocess for haplotype inference from SNP and CNV data
Energy Technology Data Exchange (ETDEWEB)
Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Inoue, Masato [Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555 (Japan); Kamatani, Naoyuki, E-mail: masato.inoue@eb.waseda.ac.j [Institute of Rheumatology, Tokyo Women' s Medical University, 10-22, Kawada-cho, Shinjuku-ku, Tokyo 162-0054 (Japan)
2009-12-01
The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details on the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.
A unified framework for haplotype inference in nuclear families.
Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong
2012-07-01
Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/∼anastas/tds. © 2012 The Authors Annals of Human Genetics © 2012 Blackwell Publishing Ltd/University College London.
Kowalski, Jeanne
2008-01-01
A timely and applied approach to the newly discovered methods and applications of U-statisticsBuilt on years of collaborative research and academic experience, Modern Applied U-Statistics successfully presents a thorough introduction to the theory of U-statistics using in-depth examples and applications that address contemporary areas of study including biomedical and psychosocial research. Utilizing a "learn by example" approach, this book provides an accessible, yet in-depth, treatment of U-statistics, as well as addresses key concepts in asymptotic theory by integrating translational and cross-disciplinary research.The authors begin with an introduction of the essential and theoretical foundations of U-statistics such as the notion of convergence in probability and distribution, basic convergence results, stochastic Os, inference theory, generalized estimating equations, as well as the definition and asymptotic properties of U-statistics. With an emphasis on nonparametric applications when and where applic...
Statistical inference of level densities from resolved resonance parameters
International Nuclear Information System (INIS)
Froehner, F.H.
1983-08-01
Level densities are most directly obtained by counting the resonances observed in the resolved resonance range. Even in the measurements, however, weak levels are invariably missed so that one has to estimate their number and add it to the raw count. The main categories of missinglevel estimators are discussed in the present review, viz. (I) ladder methods including those based on the theory of Hamiltonian matrix ensembles (Dyson-Mehta statistics), (II) methods based on comparison with artificial cross section curves (Monte Carlo simulation, Garrison's autocorrelation method), (III) methods exploiting the observed neutron width distribution by means of Bayesian or more approximate procedures such as maximum-likelihood, least-squares or moment methods, with various recipes for the treatment of detection thresholds and resolution effects. The language of mathematical statistics is employed to clarify the basis of, and the relationship between, the various techniques. Recent progress in the treatment of resolution effects, detection thresholds and p-wave admixture is described. (orig.) [de
Probably not future prediction using probability and statistical inference
Dworsky, Lawrence N
2008-01-01
An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights Why different gambling ...
Directory of Open Access Journals (Sweden)
Yichen Wang
2016-01-01
Full Text Available In this paper, we develop the statistical delay quality-of-service (QoS provisioning framework for the energy-efficient spectrum-sharing based wireless ad hoc sensor network (WAHSN, which is characterized by the delay-bound violation probability. Based on the established delay QoS provisioning framework, we formulate the nonconvex optimization problem which aims at maximizing the average energy efficiency of the sensor node in the WAHSN while meeting PU’s statistical delay QoS requirement as well as satisfying sensor node’s average transmission rate, average transmitting power, and peak transmitting power constraints. By employing the theories of fractional programming, convex hull, and probabilistic transmission, we convert the original fractional-structured nonconvex problem to the additively structured parametric convex problem and obtain the optimal power allocation strategy under the given parameter via Lagrangian method. Finally, we derive the optimal average energy efficiency and corresponding optimal power allocation scheme by employing the Dinkelbach method. Simulation results show that our derived optimal power allocation strategy can be dynamically adjusted based on PU’s delay QoS requirement as well as the channel conditions. The impact of PU’s delay QoS requirement on sensor node’s energy efficiency is also illustrated.
Using statistical inference for decision making in best estimate analyses
International Nuclear Information System (INIS)
Sermer, P.; Weaver, K.; Hoppe, F.; Olive, C.; Quach, D.
2008-01-01
For broad classes of safety analysis problems, one needs to make decisions when faced with randomly varying quantities which are also subject to errors. The means for doing this involves a statistical approach which takes into account the nature of the physical problems, and the statistical constraints they impose. We describe the methodology for doing this which has been developed at Nuclear Safety Solutions, and we draw some comparisons to other methods which are commonly used in Canada and internationally. Our methodology has the advantages of being robust and accurate and compares favourably to other best estimate methods. (author)
Robust-BD Estimation and Inference for General Partially Linear Models
Directory of Open Access Journals (Sweden)
Chunming Zhang
2017-11-01
Full Text Available The classical quadratic loss for the partially linear model (PLM and the likelihood function for the generalized PLM are not resistant to outliers. This inspires us to propose a class of “robust-Bregman divergence (BD” estimators of both the parametric and nonparametric components in the general partially linear model (GPLM, which allows the distribution of the response variable to be partially specified, without being fully known. Using the local-polynomial function estimation method, we propose a computationally-efficient procedure for obtaining “robust-BD” estimators and establish the consistency and asymptotic normality of the “robust-BD” estimator of the parametric component β o . For inference procedures of β o in the GPLM, we show that the Wald-type test statistic W n constructed from the “robust-BD” estimators is asymptotically distribution free under the null, whereas the likelihood ratio-type test statistic Λ n is not. This provides an insight into the distinction from the asymptotic equivalence (Fan and Huang 2005 between W n and Λ n in the PLM constructed from profile least-squares estimators using the non-robust quadratic loss. Numerical examples illustrate the computational effectiveness of the proposed “robust-BD” estimators and robust Wald-type test in the appearance of outlying observations.
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.
A combinatorial perspective of the protein inference problem.
Yang, Chao; He, Zengyou; Yu, Weichuan
2013-01-01
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we devote ourselves to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. Meanwhile, we also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: >http://bioinformatics.ust.hk/proteininfer.
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
The research protocol VI: How to choose the appropriate statistical test. Inferential statistics
Directory of Open Access Journals (Sweden)
Eric Flores-Ruiz
2017-10-01
Full Text Available The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
DEFF Research Database (Denmark)
Jacobsen, Christian Robert Dahl; Møller, Jesper
2017-01-01
We introduce new estimation methods for a subclass of the Gaussian scale mixture models for wavelet trees by Wainwright, Simoncelli and Willsky that rely on modern results for composite likelihoods and approximate Bayesian inference. Our methodology is illustrated for denoising and edge detection...
Encryption of covert information into multiple statistical distributions
International Nuclear Information System (INIS)
Venkatesan, R.C.
2007-01-01
A novel strategy to encrypt covert information (code) via unitary projections into the null spaces of ill-conditioned eigenstructures of multiple host statistical distributions, inferred from incomplete constraints, is presented. The host pdf's are inferred using the maximum entropy principle. The projection of the covert information is dependent upon the pdf's of the host statistical distributions. The security of the encryption/decryption strategy is based on the extreme instability of the encoding process. A self-consistent procedure to derive keys for both symmetric and asymmetric cryptography is presented. The advantages of using a multiple pdf model to achieve encryption of covert information are briefly highlighted. Numerical simulations exemplify the efficacy of the model
Probability biases as Bayesian inference
Directory of Open Access Journals (Sweden)
Andre; C. R. Martins
2006-11-01
Full Text Available In this article, I will show how several observed biases in human probabilistic reasoning can be partially explained as good heuristics for making inferences in an environment where probabilities have uncertainties associated to them. Previous results show that the weight functions and the observed violations of coalescing and stochastic dominance can be understood from a Bayesian point of view. We will review those results and see that Bayesian methods should also be used as part of the explanation behind other known biases. That means that, although the observed errors are still errors under the be understood as adaptations to the solution of real life problems. Heuristics that allow fast evaluations and mimic a Bayesian inference would be an evolutionary advantage, since they would give us an efficient way of making decisions. %XX In that sense, it should be no surprise that humans reason with % probability as it has been observed.
Lies, damn lies and statistics
International Nuclear Information System (INIS)
Jones, M.D.
2001-01-01
Statistics are widely employed within archaeological research. This is becoming increasingly so as user friendly statistical packages make increasingly sophisticated analyses available to non statisticians. However, all statistical techniques are based on underlying assumptions of which the end user may be unaware. If statistical analyses are applied in ignorance of the underlying assumptions there is the potential for highly erroneous inferences to be drawn. This does happen within archaeology and here this is illustrated with the example of 'date pooling', a technique that has been widely misused in archaeological research. This misuse may have given rise to an inevitable and predictable misinterpretation of New Zealand's archaeological record. (author). 10 refs., 6 figs., 1 tab
Probing NWP model deficiencies by statistical postprocessing
DEFF Research Database (Denmark)
Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.
2016-01-01
The objective in this article is twofold. On one hand, a Model Output Statistics (MOS) framework for improved wind speed forecast accuracy is described and evaluated. On the other hand, the approach explored identifies unintuitive explanatory value from a diagnostic variable in an operational....... Based on the statistical model candidates inferred from the data, the lifted index NWP model diagnostic is consistently found among the NWP model predictors of the best performing statistical models across sites....
The Role of Working Memory in the Probabilistic Inference of Future Sensory Events.
Cashdollar, Nathan; Ruhnau, Philipp; Weisz, Nathan; Hasson, Uri
2017-05-01
The ability to represent the emerging regularity of sensory information from the external environment has been thought to allow one to probabilistically infer future sensory occurrences and thus optimize behavior. However, the underlying neural implementation of this process is still not comprehensively understood. Through a convergence of behavioral and neurophysiological evidence, we establish that the probabilistic inference of future events is critically linked to people's ability to maintain the recent past in working memory. Magnetoencephalography recordings demonstrated that when visual stimuli occurring over an extended time series had a greater statistical regularity, individuals with higher working-memory capacity (WMC) displayed enhanced slow-wave neural oscillations in the θ frequency band (4-8 Hz.) prior to, but not during stimulus appearance. This prestimulus neural activity was specifically linked to contexts where information could be anticipated and influenced the preferential sensory processing for this visual information after its appearance. A separate behavioral study demonstrated that this process intrinsically emerges during continuous perception and underpins a realistic advantage for efficient behavioral responses. In this way, WMC optimizes the anticipation of higher level semantic concepts expected to occur in the near future. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Wang, Yi; Wan, Jianwu; Guo, Jun; Cheung, Yiu-Ming; C Yuen, Pong
2017-07-14
Similarity search is essential to many important applications and often involves searching at scale on high-dimensional data based on their similarity to a query. In biometric applications, recent vulnerability studies have shown that adversarial machine learning can compromise biometric recognition systems by exploiting the biometric similarity information. Existing methods for biometric privacy protection are in general based on pairwise matching of secured biometric templates and have inherent limitations in search efficiency and scalability. In this paper, we propose an inference-based framework for privacy-preserving similarity search in Hamming space. Our approach builds on an obfuscated distance measure that can conceal Hamming distance in a dynamic interval. Such a mechanism enables us to systematically design statistically reliable methods for retrieving most likely candidates without knowing the exact distance values. We further propose to apply Montgomery multiplication for generating search indexes that can withstand adversarial similarity analysis, and show that information leakage in randomized Montgomery domains can be made negligibly small. Our experiments on public biometric datasets demonstrate that the inference-based approach can achieve a search accuracy close to the best performance possible with secure computation methods, but the associated cost is reduced by orders of magnitude compared to cryptographic primitives.
Directory of Open Access Journals (Sweden)
Browne Fiona
2006-12-01
Full Text Available Protein-protein interactions (PPI play a key role in many biological systems. Over the past few years, an explosion in availability of functional biological data obtained from high-throughput technologies to infer PPI has been observed. However, results obtained from such experiments show high rates of false positives and false negatives predictions as well as systematic predictive bias. Recent research has revealed that several machine and statistical learning methods applied to integrate relatively weak, diverse sources of large-scale functional data may provide improved predictive accuracy and coverage of PPI. In this paper we describe the effects of applying different computational, integrative methods to predict PPI in Saccharomyces cerevisiae. We investigated the predictive ability of combining different sets of relatively strong and weak predictive datasets. We analysed several genomic datasets ranging from mRNA co-expression to marginal essentiality. Moreover, we expanded an existing multi-source dataset from S. cerevisiae by constructing a new set of putative interactions extracted from Gene Ontology (GO- driven annotations in the Saccharomyces Genome Database. Different classification techniques: Simple Naive Bayesian (SNB, Multilayer Perceptron (MLP and K-Nearest Neighbors (KNN were evaluated. Relatively simple classification methods (i.e. less computing intensive and mathematically complex, such as SNB, have been proven to be proficient at predicting PPI. SNB produced the “highest” predictive quality obtaining an area under Receiver Operating Characteristic (ROC curve (AUC value of 0.99. The lowest AUC value of 0.90 was obtained by the KNN classifier. This assessment also demonstrates the strong predictive power of GO-driven models, which offered predictive performance above 0.90 using the different machine learning and statistical techniques. As the predictive power of single-source datasets became weaker MLP and SNB performed
Huser, Raphaël
2018-01-09
Extreme-value theory for stochastic processes has motivated the statistical use of max-stable models for spatial extremes. However, fitting such asymptotic models to maxima observed over finite blocks is problematic when the asymptotic stability of the dependence does not prevail in finite samples. This issue is particularly serious when data are asymptotically independent, such that the dependence strength weakens and eventually vanishes as events become more extreme. We here aim to provide flexible sub-asymptotic models for spatially indexed block maxima, which more realistically account for discrepancies between data and asymptotic theory. We develop models pertaining to the wider class of max-infinitely divisible processes, extending the class of max-stable processes while retaining dependence properties that are natural for maxima: max-id models are positively associated, and they yield a self-consistent family of models for block maxima defined over any time unit. We propose two parametric construction principles for max-id models, emphasizing a point process-based generalized spectral representation, that allows for asymptotic independence while keeping the max-stable extremal-$t$ model as a special case. Parameter estimation is efficiently performed by pairwise likelihood, and we illustrate our new modeling framework with an application to Dutch wind gust maxima calculated over different time units.
Bayesian inference for hybrid discrete-continuous stochastic kinetic models
International Nuclear Information System (INIS)
Sherlock, Chris; Golightly, Andrew; Gillespie, Colin S
2014-01-01
We consider the problem of efficiently performing simulation and inference for stochastic kinetic models. Whilst it is possible to work directly with the resulting Markov jump process (MJP), computational cost can be prohibitive for networks of realistic size and complexity. In this paper, we consider an inference scheme based on a novel hybrid simulator that classifies reactions as either ‘fast’ or ‘slow’ with fast reactions evolving as a continuous Markov process whilst the remaining slow reaction occurrences are modelled through a MJP with time-dependent hazards. A linear noise approximation (LNA) of fast reaction dynamics is employed and slow reaction events are captured by exploiting the ability to solve the stochastic differential equation driving the LNA. This simulation procedure is used as a proposal mechanism inside a particle MCMC scheme, thus allowing Bayesian inference for the model parameters. We apply the scheme to a simple application and compare the output with an existing hybrid approach and also a scheme for performing inference for the underlying discrete stochastic model. (paper)
Energy-efficient privacy protection for smart home environments using behavioral semantics.
Park, Homin; Basaran, Can; Park, Taejoon; Son, Sang Hyuk
2014-09-02
Research on smart environments saturated with ubiquitous computing devices is rapidly advancing while raising serious privacy issues. According to recent studies, privacy concerns significantly hinder widespread adoption of smart home technologies. Previous work has shown that it is possible to infer the activities of daily living within environments equipped with wireless sensors by monitoring radio fingerprints and traffic patterns. Since data encryption cannot prevent privacy invasions exploiting transmission pattern analysis and statistical inference, various methods based on fake data generation for concealing traffic patterns have been studied. In this paper, we describe an energy-efficient, light-weight, low-latency algorithm for creating dummy activities that are semantically similar to the observed phenomena. By using these cloaking activities, the amount of fake data transmissions can be flexibly controlled to support a trade-off between energy efficiency and privacy protection. According to the experiments using real data collected from a smart home environment, our proposed method can extend the lifetime of the network by more than 2× compared to the previous methods in the literature. Furthermore, the activity cloaking method supports low latency transmission of real data while also significantly reducing the accuracy of the wireless snooping attacks.
Information Geometric Complexity of a Trivariate Gaussian Statistical Model
Directory of Open Access Journals (Sweden)
Domenico Felice
2014-05-01
Full Text Available We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.
Statistical Challenges in Modeling Big Brain Signals
Yu, Zhaoxia; Pluta, Dustin; Shen, Tong; Chen, Chuansheng; Xue, Gui; Ombao, Hernando
2017-01-01
Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible
Inference of neuronal network spike dynamics and topology from calcium imaging data
Directory of Open Access Journals (Sweden)
Henry eLütcke
2013-12-01
Full Text Available Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP occurrence ('spike trains' from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties.
International Nuclear Information System (INIS)
Nakamura, Makoto
2009-01-01
It is important for Level 1 PSA to quantify input reliability parameters and their uncertainty. Bayesian methods for inference of system/component unavailability, however, are not well studied. At present practitioners allocate the uncertainty (i.e. error factor) of the unavailability based on engineering judgment. Systematic methods based on Bayesian statistics are needed for quantification of such uncertainty. In this study we have developed a new method for Bayesian inference of unavailability, where the posterior of system/component unavailability is described by the inverted gamma distribution. We show that the average of the posterior comes close to the point estimate of the unavailability as the number of outages goes to infinity. That indicates validity of the new method. Using plant data recorded in NUCIA, we have applied the new method to inference of system unavailability under unplanned outages due to violations of LCO at BWRs in Japan. According to the inference results, the unavailability is populated in the order of 10 -5 -10 -4 and the error factor is within 1-2. Thus, the new Bayesian method allows one to quantify magnitudes and widths (i.e. error factor) of uncertainty distributions of unavailability. (author)
Directory of Open Access Journals (Sweden)
Moura LMVR
2016-12-01
Full Text Available Lidia MVR Moura,1,2 M Brandon Westover,1,2 David Kwasnik,1 Andrew J Cole,1,2 John Hsu3–5 1Massachusetts General Hospital, Department of Neurology, Epilepsy Service, Boston, MA, USA; 2Harvard Medical School, Boston, MA, USA; 3Massachusetts General Hospital, Mongan Institute, Boston, MA, USA; 4Harvard Medical School, Department of Medicine, Boston, MA, USA; 5Harvard Medical School, Department of Health Care Policy, Boston, MA, USA Abstract: The elderly population faces an increasing number of cases of chronic neurological conditions, such as epilepsy and Alzheimer’s disease. Because the elderly with epilepsy are commonly excluded from randomized controlled clinical trials, there are few rigorous studies to guide clinical practice. When the elderly are eligible for trials, they either rarely participate or frequently have poor adherence to therapy, thus limiting both generalizability and validity. In contrast, large observational data sets are increasingly available, but are susceptible to bias when using common analytic approaches. Recent developments in causal inference-analytic approaches also introduce the possibility of emulating randomized controlled trials to yield valid estimates. We provide a practical example of the application of the principles of causal inference to a large observational data set of patients with epilepsy. This review also provides a framework for comparative-effectiveness research in chronic neurological conditions. Keywords: epilepsy, epidemiology, neurostatistics, causal inference
Efficient Partitioning of Large Databases without Query Statistics
Directory of Open Access Journals (Sweden)
Shahidul Islam KHAN
2016-11-01
Full Text Available An efficient way of improving the performance of a database management system is distributed processing. Distribution of data involves fragmentation or partitioning, replication, and allocation process. Previous research works provided partitioning based on empirical data about the type and frequency of the queries. These solutions are not suitable at the initial stage of a distributed database as query statistics are not available then. In this paper, I have presented a fragmentation technique, Matrix based Fragmentation (MMF, which can be applied at the initial stage as well as at later stages of distributed databases. Instead of using empirical data, I have developed a matrix, Modified Create, Read, Update and Delete (MCRUD, to partition a large database properly. Allocation of fragments is done simultaneously in my proposed technique. So using MMF, no additional complexity is added for allocating the fragments to the sites of a distributed database as fragmentation is synchronized with allocation. The performance of a DDBMS can be improved significantly by avoiding frequent remote access and high data transfer among the sites. Results show that proposed technique can solve the initial partitioning problem of large distributed databases.
Applied Statistics Using SPSS, STATISTICA, MATLAB and R
De Sá, Joaquim P Marques
2007-01-01
This practical reference provides a comprehensive introduction and tutorial on the main statistical analysis topics, demonstrating their solution with the most common software package. Intended for anyone needing to apply statistical analysis to a large variety of science and enigineering problems, the book explains and shows how to use SPSS, MATLAB, STATISTICA and R for analysis such as data description, statistical inference, classification and regression, factor analysis, survival data and directional statistics. It concisely explains key concepts and methods, illustrated by practical examp
Chen, Hua
2013-03-01
Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated to the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm validity of the derivation and statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.
Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
Improving statistical reasoning: theoretical models and practical implications
National Research Council Canada - National Science Library
Sedlmeier, Peter
1999-01-01
... in Psychology? 206 References 216 Author Index 230 Subject Index 235 v PrefacePreface Statistical literacy, the art of drawing reasonable inferences from an abundance of numbers provided daily by...
Inferring Characteristics of Sensorimotor Behavior by Quantifying Dynamics of Animal Locomotion
Leung, KaWai
Locomotion is one of the most well-studied topics in animal behavioral studies. Many fundamental and clinical research make use of the locomotion of an animal model to explore various aspects in sensorimotor behavior. In the past, most of these studies focused on population average of a specific trait due to limitation of data collection and processing power. With recent advance in computer vision and statistical modeling techniques, it is now possible to track and analyze large amounts of behavioral data. In this thesis, I present two projects that aim to infer the characteristics of sensorimotor behavior by quantifying the dynamics of locomotion of nematode Caenorhabditis elegans and fruit fly Drosophila melanogaster, shedding light on statistical dependence between sensing and behavior. In the first project, I investigate the possibility of inferring noxious sensory information from the behavior of Caenorhabditis elegans. I develop a statistical model to infer the heat stimulus level perceived by individual animals from their stereotyped escape responses after stimulation by an IR laser. The model allows quantification of analgesic-like effects of chemical agents or genetic mutations in the worm. At the same time, the method is able to differentiate perturbations of locomotion behavior that are beyond affecting the sensory system. With this model I propose experimental designs that allows statistically significant identification of analgesic-like effects. In the second project, I investigate the relationship of energy budget and stability of locomotion in determining the walking speed distribution of Drosophila melanogaster during aging. The locomotion stability at different age groups is estimated from video recordings using Floquet theory. I calculate the power consumption of different locomotion speed using a biomechanics model. In conclusion, the power consumption, not stability, predicts the locomotion speed distribution at different ages.
Application of Bayesian inference to stochastic analytic continuation
International Nuclear Information System (INIS)
Fuchs, S; Pruschke, T; Jarrell, M
2010-01-01
We present an algorithm for the analytic continuation of imaginary-time quantum Monte Carlo data. The algorithm is strictly based on principles of Bayesian statistical inference. It utilizes Monte Carlo simulations to calculate a weighted average of possible energy spectra. We apply the algorithm to imaginary-time quantum Monte Carlo data and compare the resulting energy spectra with those from a standard maximum entropy calculation.
Empirical inference festschrift in honor of Vladimir N. Vapnik
Schölkopf, Bernhard; Vovk, Vladimir
2013-01-01
This book honours the outstanding contributions of Vladimir Vapnik, a rare example of a scientist for whom the following statements hold true simultaneously: his work led to the inception of a new field of research, the theory of statistical learning and empirical inference; he has lived to see the field blossom; and he is still as active as ever.
Xu, Jason; Guttorp, Peter; Kato-Maeda, Midori; Minin, Vladimir N
2015-12-01
Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements-important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. © 2015, The International Biometric Society.
International Nuclear Information System (INIS)
Saygin, D.; Worrell, E.; Tam, C.; Trudeau, N.; Gielen, D.J.; Weiss, M.; Patel, M.K.
2012-01-01
Analyzing the chemical industry’s energy use is challenging because of the sector’s complexity and the prevailing uncertainty in energy use and production data. We develop an advanced bottom-up model (PIE-Plus) which encompasses the energy use of the 139 most important chemical processes. We apply this model in a case study to analyze the German basic chemical industry’s energy use and energy efficiency improvements in the period between 1995 and 2008. We compare our results with data from the German Energy Balances and with data published by the International Energy Agency (IEA). We find that our model covers 88% of the basic chemical industry’s total final energy use (including non-energy use) as reported in the German Energy Balances. The observed energy efficiency improvements range between 2.2 and 3.5% per year, i.e., they are on the higher side of the values typically reported in literature. Our results point to uncertainties in the basic chemical industry’s final energy use as reported in the energy statistics and the specific energy consumption values. More efforts are required to improve the quality of the national and international energy statistics to make them useable for reliable monitoring of energy efficiency improvements of the chemical industry. -- Highlights: ► An advanced model was developed to estimate German chemical industry’s energy use. ► For the base year (2000), model covers 88% of the sector’s total final energy use. ► Sector’s energy efficiency improved between 2.2 and 3.5%/yr between 1995 and 2008. ► Improved energy statistics are required for accurate monitoring of improvements.
Inferring time derivatives including cell growth rates using Gaussian processes
Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta
2016-12-01
Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.
Applying Statistical Process Control to Clinical Data: An Illustration.
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
International Nuclear Information System (INIS)
Barbeito, Inés; Zaragoza, Sonia; Tarrío-Saavedra, Javier; Naya, Salvador
2017-01-01
Highlights: • Intelligent web platform development for energy efficiency management in buildings. • Controlling and supervising thermal comfort and energy consumption in buildings. • Statistical quality control procedure to deal with autocorrelated data. • Open source alternative using R software. - Abstract: In this paper, a case study of performing a reliable statistical procedure to evaluate the quality of HVAC systems in buildings using data retrieved from an ad hoc big data web energy platform is presented. The proposed methodology based on statistical quality control (SQC) is used to analyze the real state of thermal comfort and energy efficiency of the offices of the company FRIDAMA (Spain) in a reliable way. Non-conformities or alarms, and the actual assignable causes of these out of control states are detected. The capability to meet specification requirements is also analyzed. Tools and packages implemented in the open-source R software are employed to apply the different procedures. First, this study proposes to fit ARIMA time series models to CTQ variables. Then, the application of Shewhart and EWMA control charts to the time series residuals is proposed to control and monitor thermal comfort and energy consumption in buildings. Once thermal comfort and consumption variability are estimated, the implementation of capability indexes for autocorrelated variables is proposed to calculate the degree to which standards specifications are met. According with case study results, the proposed methodology has detected real anomalies in HVAC installation, helping to detect assignable causes and to make appropriate decisions. One of the goals is to perform and describe step by step this statistical procedure in order to be replicated by practitioners in a better way.
Statistical Analysis for High-Dimensional Data : The Abel Symposium 2014
Bühlmann, Peter; Glad, Ingrid; Langaas, Mette; Richardson, Sylvia; Vannucci, Marina
2016-01-01
This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on...
Philosophy and the practice of Bayesian statistics.
Gelman, Andrew; Shalizi, Cosma Rohilla
2013-02-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework. © 2012 The British Psychological Society.
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by
Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data
DEFF Research Database (Denmark)
Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders
2018-01-01
The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness...... is assessed by evaluating specific coincidences of alleles between the groups. When working with high throughput sequencing data calling genotypes accurately is not always possible, therefore the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring...... much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction...
An approach to build knowledge base for reactor accident diagnostic system using statistical method
International Nuclear Information System (INIS)
Kohsaka, Atsuo; Yokobayashi, Masao; Matsumoto, Kiyoshi; Fujii, Minoru
1988-01-01
In the development of a rule based expert system, one of key issues is how to build a knowledge base (KB). A systematic approach has been attempted for building an objective KB efficiently. The approach is based on the concept that a prototype KB should first be generated in a systematic way and then it is to be modified and/or improved by expert for practical use. The statistical method, Factor Analysis, was applied to build a prototype KB for the JAERI expert system DISKET using source information obtained from a PWR simulator. The prototype KB was obtained and the inference with this KB was performed against several types of transients. In each diagnosis, the transient type was well identified. From this study, it is concluded that the statistical method used is useful for building a prototype knowledge base. (author)
Cortical hierarchies perform Bayesian causal inference in multisensory perception.
Directory of Open Access Journals (Sweden)
Tim Rohe
2015-02-01
Full Text Available To form a veridical percept of the environment, the brain needs to integrate sensory signals from a common source but segregate those from independent sources. Thus, perception inherently relies on solving the "causal inference problem." Behaviorally, humans solve this problem optimally as predicted by Bayesian Causal Inference; yet, the underlying neural mechanisms are unexplored. Combining psychophysics, Bayesian modeling, functional magnetic resonance imaging (fMRI, and multivariate decoding in an audiovisual spatial localization task, we demonstrate that Bayesian Causal Inference is performed by a hierarchy of multisensory processes in the human brain. At the bottom of the hierarchy, in auditory and visual areas, location is represented on the basis that the two signals are generated by independent sources (= segregation. At the next stage, in posterior intraparietal sulcus, location is estimated under the assumption that the two signals are from a common source (= forced fusion. Only at the top of the hierarchy, in anterior intraparietal sulcus, the uncertainty about the causal structure of the world is taken into account and sensory signals are combined as predicted by Bayesian Causal Inference. Characterizing the computational operations of signal interactions reveals the hierarchical nature of multisensory perception in human neocortex. It unravels how the brain accomplishes Bayesian Causal Inference, a statistical computation fundamental for perception and cognition. Our results demonstrate how the brain combines information in the face of uncertainty about the underlying causal structure of the world.
Direct Evidence for a Dual Process Model of Deductive Inference
Markovits, Henry; Brunet, Marie-Laurence; Thompson, Valerie; Brisson, Janie
2013-01-01
In 2 experiments, we tested a strong version of a dual process theory of conditional inference (cf. Verschueren et al., 2005a, 2005b) that assumes that most reasoners have 2 strategies available, the choice of which is determined by situational variables, cognitive capacity, and metacognitive control. The statistical strategy evaluates inferences…
The relation between statistical power and inference in fMRI.
Directory of Open Access Journals (Sweden)
Henk R Cremers
Full Text Available Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects, and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial-especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20-30 display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate prediction methods and meta-analyses with related synthesis-oriented approaches.
Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum
Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan
2017-01-01
Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…
Statistical Challenges in Modeling Big Brain Signals
Yu, Zhaoxia
2017-11-01
Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.
Stilp, Christian E.; Kluender, Keith R.
2012-01-01
To the extent that sensorineural systems are efficient, redundancy should be extracted to optimize transmission of information, but perceptual evidence for this has been limited. Stilp and colleagues recently reported efficient coding of robust correlation (r = .97) among complex acoustic attributes (attack/decay, spectral shape) in novel sounds. Discrimination of sounds orthogonal to the correlation was initially inferior but later comparable to that of sounds obeying the correlation. These effects were attenuated for less-correlated stimuli (r = .54) for reasons that are unclear. Here, statistical properties of correlation among acoustic attributes essential for perceptual organization are investigated. Overall, simple strength of the principal correlation is inadequate to predict listener performance. Initial superiority of discrimination for statistically consistent sound pairs was relatively insensitive to decreased physical acoustic/psychoacoustic range of evidence supporting the correlation, and to more frequent presentations of the same orthogonal test pairs. However, increased range supporting an orthogonal dimension has substantial effects upon perceptual organization. Connectionist simulations and Eigenvalues from closed-form calculations of principal components analysis (PCA) reveal that perceptual organization is near-optimally weighted to shared versus unshared covariance in experienced sound distributions. Implications of reduced perceptual dimensionality for speech perception and plausible neural substrates are discussed. PMID:22292057
Population genetics inference for longitudinally-sampled mutants under strong selection.
Lacerda, Miguel; Seoighe, Cathal
2014-11-01
Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright-Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright-Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright-Fisher model. Copyright © 2014 by the Genetics Society of America.
Technical Note: How to use Winbugs to infer animal models
DEFF Research Database (Denmark)
Damgaard, Lars Holm
2007-01-01
This paper deals with Bayesian inferences of animal models using Gibbs sampling. First, we suggest a general and efficient method for updating additive genetic effects, in which the computational cost is independent of the pedigree depth and increases linearly only with the size of the pedigree....... Second, we show how this approach can be used to draw inferences from a wide range of animal models using the computer package Winbugs. Finally, we illustrate the approach in a simulation study, in which the data are generated and analyzed using Winbugs according to a linear model with i.i.d errors...... having Student's t distributions. In conclusion, Winbugs can be used to make inferences in small-sized, quantitative, genetic data sets applying a wide range of animal models that are not yet standard in the animal breeding literature...
Approximation Methods for Inference and Learning in Belief Networks: Progress and Future Directions
National Research Council Canada - National Science Library
Pazzan, Michael
1997-01-01
.... In this research project, we have investigated methods and implemented algorithms for efficiently making certain classes of inference in belief networks, and for automatically learning certain...
A Fast Iterative Bayesian Inference Algorithm for Sparse Channel Estimation
DEFF Research Database (Denmark)
Pedersen, Niels Lovmand; Manchón, Carles Navarro; Fleury, Bernard Henri
2013-01-01
representation of the Bessel K probability density function; a highly efficient, fast iterative Bayesian inference method is then applied to the proposed model. The resulting estimator outperforms other state-of-the-art Bayesian and non-Bayesian estimators, either by yielding lower mean squared estimation error...
Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data.
Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders
2018-02-02
The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. Copyright © 2018 Soraggi et al.
Statistics for mathematicians a rigorous first course
Panaretos, Victor M
2016-01-01
This textbook provides a coherent introduction to the main concepts and methods of one-parameter statistical inference. Intended for students of Mathematics taking their first course in Statistics, the focus is on Statistics for Mathematicians rather than on Mathematical Statistics. The goal is not to focus on the mathematical/theoretical aspects of the subject, but rather to provide an introduction to the subject tailored to the mindset and tastes of Mathematics students, who are sometimes turned off by the informal nature of Statistics courses. This book can be used as the basis for an elementary semester-long first course on Statistics with a firm sense of direction that does not sacrifice rigor. The deeper goal of the text is to attract the attention of promising Mathematics students.
Caticha, Ariel
2011-03-01
In this tutorial we review the essential arguments behing entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops—the Maximum Entropy and the Bayesian methods—into a single general inference scheme.
A Review of Some Aspects of Robust Inference for Time Series.
1984-09-01
REVIEW OF SOME ASPECTSOF ROBUST INFERNCE FOR TIME SERIES by Ad . Dougla Main TE "iAL REPOW No. 63 Septermber 1984 Department of Statistics University of ...clear. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data...AI.49 716 A REVIEWd OF SOME ASPECTS OF ROBUST INFERENCE FOR TIME 1/1 SERIES(U) WASHINGTON UNIV SEATTLE DEPT OF STATISTICS R D MARTIN SEP 84 TR-53
International Nuclear Information System (INIS)
Ma Xiang; Zabaras, Nicholas
2009-01-01
A new approach to modeling inverse problems using a Bayesian inference method is introduced. The Bayesian approach considers the unknown parameters as random variables and seeks the probabilistic distribution of the unknowns. By introducing the concept of the stochastic prior state space to the Bayesian formulation, we reformulate the deterministic forward problem as a stochastic one. The adaptive hierarchical sparse grid collocation (ASGC) method is used for constructing an interpolant to the solution of the forward model in this prior space which is large enough to capture all the variability/uncertainty in the posterior distribution of the unknown parameters. This solution can be considered as a function of the random unknowns and serves as a stochastic surrogate model for the likelihood calculation. Hierarchical Bayesian formulation is used to derive the posterior probability density function (PPDF). The spatial model is represented as a convolution of a smooth kernel and a Markov random field. The state space of the PPDF is explored using Markov chain Monte Carlo algorithms to obtain statistics of the unknowns. The likelihood calculation is performed by directly sampling the approximate stochastic solution obtained through the ASGC method. The technique is assessed on two nonlinear inverse problems: source inversion and permeability estimation in flow through porous media
Conjunction analysis and propositional logic in fMRI data analysis using Bayesian statistics.
Rudert, Thomas; Lohmann, Gabriele
2008-12-01
To evaluate logical expressions over different effects in data analyses using the general linear model (GLM) and to evaluate logical expressions over different posterior probability maps (PPMs). In functional magnetic resonance imaging (fMRI) data analysis, the GLM was applied to estimate unknown regression parameters. Based on the GLM, Bayesian statistics can be used to determine the probability of conjunction, disjunction, implication, or any other arbitrary logical expression over different effects or contrast. For second-level inferences, PPMs from individual sessions or subjects are utilized. These PPMs can be combined to a logical expression and its probability can be computed. The methods proposed in this article are applied to data from a STROOP experiment and the methods are compared to conjunction analysis approaches for test-statistics. The combination of Bayesian statistics with propositional logic provides a new approach for data analyses in fMRI. Two different methods are introduced for propositional logic: the first for analyses using the GLM and the second for common inferences about different probability maps. The methods introduced extend the idea of conjunction analysis to a full propositional logic and adapt it from test-statistics to Bayesian statistics. The new approaches allow inferences that are not possible with known standard methods in fMRI. (c) 2008 Wiley-Liss, Inc.
Directory of Open Access Journals (Sweden)
Frank Emmert-Streib
2013-02-01
Full Text Available The inference of gene regulatory networks gained within recent years a considerable interest in the biology and biomedical community. The purpose of this paper is to investigate the influence that environmental conditions can exhibit on the inference performance of network inference algorithms. Specifically, we study five network inference methods, Aracne, BC3NET, CLR, C3NET and MRNET, and compare the results for three different conditions: (I observational gene expression data: normal environmental condition, (II interventional gene expression data: growth in rich media, (III interventional gene expression data: normal environmental condition interrupted by a positive spike-in stimulation. Overall, we find that different statistical inference methods lead to comparable, but condition-specific results. Further, our results suggest that non-steady-state data enhance the inferability of regulatory networks.
Design of uav robust autopilot based on adaptive neuro-fuzzy inference system
Directory of Open Access Journals (Sweden)
Mohand Achour Touat
2008-04-01
Full Text Available This paper is devoted to the application of adaptive neuro-fuzzy inference systems to the robust control of the UAV longitudinal motion. The adaptive neore-fuzzy inference system model needs to be trained by input/output data. This data were obtained from the modeling of a ”crisp” robust control system. The synthesis of this system is based on the separation theorem, which defines the structure and parameters of LQG-optimal controller, and further - robust optimization of this controller, based on the genetic algorithm. Such design procedure can define the rule base and parameters of fuzzyfication and defuzzyfication algorithms of the adaptive neore-fuzzy inference system controller, which ensure the robust properties of the control system. Simulation of the closed loop control system of UAV longitudinal motion with adaptive neore-fuzzy inference system controller demonstrates high efficiency of proposed design procedure.
Statistical methods for astronomical data analysis
Chattopadhyay, Asis Kumar
2014-01-01
This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomenon will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...
In silico model-based inference: a contemporary approach for hypothesis testing in network biology.
Klinke, David J
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. © 2014 American Institute of Chemical Engineers.
Energy-Efficient Privacy Protection for Smart Home Environments Using Behavioral Semantics
Directory of Open Access Journals (Sweden)
Homin Park
2014-09-01
Full Text Available Research on smart environments saturated with ubiquitous computing devices is rapidly advancing while raising serious privacy issues. According to recent studies, privacy concerns significantly hinder widespread adoption of smart home technologies. Previous work has shown that it is possible to infer the activities of daily living within environments equipped with wireless sensors by monitoring radio fingerprints and traffic patterns. Since data encryption cannot prevent privacy invasions exploiting transmission pattern analysis and statistical inference, various methods based on fake data generation for concealing traffic patterns have been studied. In this paper, we describe an energy-efficient, light-weight, low-latency algorithm for creating dummy activities that are semantically similar to the observed phenomena. By using these cloaking activities, the amount of fake data transmissions can be flexibly controlled to support a trade-off between energy efficiency and privacy protection. According to the experiments using real data collected from a smart home environment, our proposed method can extend the lifetime of the network by more than 2× compared to the previous methods in the literature. Furthermore, the activity cloaking method supports low latency transmission of real data while also significantly reducing the accuracy of the wireless snooping attacks.
Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization
Gelman, Andrew; Lee, Daniel; Guo, Jiqiang
2015-01-01
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models and can be called from the command line, R, Python, Matlab, or Julia and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…
A statistical perspective on association studies of psychiatric disorders
DEFF Research Database (Denmark)
Foldager, Leslie
2014-01-01
Gene-gene (GxG) and gene-environment (GxE) interactions likely play an important role in the aetiology of complex diseases like psychiatric disorders. Thus, we aim at investigating methodological aspects of and apply methods from statistical genetics taking interactions into account. In addition we...... genes and maternal infection by virus. Paper 3 presents the initial steps (mainly data construction) of an ongoing simulation study aiming at guiding decisions by comparing methods for GxE interaction analysis including both traditional two-step logistic regression, exhaustive searches using efficient...... these markers. However, the validity of the identified haplotypes is also checked by inferring phased haplotypes from genotypes. Haplotype analysis is also used in paper 5 which is otherwise an example of a focused approach to narrow down a previously found signal to search for more precise positions of disease...
Econometric models for distinguishing between market-driven and publicly-funded energy efficiency
International Nuclear Information System (INIS)
Horowitz, Marvin J.
2005-01-01
Central to the problem of estimating energy program benefits is the necessity to differentiate between changes in energy use that would have occurred in the absence of public programs versus declines in energy use that would not have occurred but for public programs. The former changes are often referred to as naturally-occurring or market-driven effects. They occur due to a combination of one or more independent variables, such as changes in prices, incomes, weather, and technology. For a rigorous, scientifically-valid program evaluation, it is essential to first control for these variables before making statistical inferences related to public program effects. This paper describes the economic and statistical issues surrounding quantitative studies of energy use, energy efficiency, and public programs. To illustrate the strengths and weaknesses of different impact evaluation approaches, this paper describes three new studies related to electricity use in the U. S. commercial buildings sector. Specification and estimation of time series and cross section econometric models are discussed, as are their capabilities for obtaining long-run estimates of the net impacts of energy efficiency programs
Bergtold, Jason S.; Yeager, Elizabeth A.; Featherstone, Allen M.
2011-01-01
The logistic regression models has been widely used in the social and natural sciences and results from studies using this model can have significant impact. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this study is to examine the impact of sample size on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model. A numbe...
Reliability of dose volume constraint inference from clinical data
Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.
2017-04-01
Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.
Introductory statistics for the behavioral sciences
Welkowitz, Joan; Cohen, Jacob
1971-01-01
Introductory Statistics for the Behavioral Sciences provides an introduction to statistical concepts and principles. This book emphasizes the robustness of parametric procedures wherein such significant tests as t and F yield accurate results even if such assumptions as equal population variances and normal population distributions are not well met.Organized into three parts encompassing 16 chapters, this book begins with an overview of the rationale upon which much of behavioral science research is based, namely, drawing inferences about a population based on data obtained from a samp
International Nuclear Information System (INIS)
Ali, S A; Kim, D-H; Cafaro, C; Giffin, A
2012-01-01
Information geometric techniques and inductive inference methods hold great promise for solving computational problems of interest in classical and quantum physics, especially with regard to complexity characterization of dynamical systems in terms of their probabilistic description on curved statistical manifolds. In this paper, we investigate the possibility of describing the macroscopic behavior of complex systems in terms of the underlying statistical structure of their microscopic degrees of freedom by the use of statistical inductive inference and information geometry. We review the maximum relative entropy formalism and the theoretical structure of the information geometrodynamical approach to chaos on statistical manifolds M S . Special focus is devoted to a description of the roles played by the sectional curvature K M S , the Jacobi field intensity J M S and the information geometrodynamical entropy S M S . These quantities serve as powerful information-geometric complexity measures of information-constrained dynamics associated with arbitrary chaotic and regular systems defined on M S . Finally, the application of such information-geometric techniques to several theoretical models is presented.
DEFF Research Database (Denmark)
Nolte, Ingmar; Voev, Valeri
The expected value of sums of squared intraday returns (realized variance) gives rise to a least squares regression which adapts itself to the assumptions of the noise process and allows for a joint inference on integrated volatility (IV), noise moments and price-noise relations. In the iid noise...
Sweller, Naomi; Hayes, Brett K
2010-08-01
Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.
Directory of Open Access Journals (Sweden)
Lester L. Yuan
2007-06-01
Full Text Available This paper provides a brief introduction to the R package bio.infer, a set of scripts that facilitates the use of maximum likelihood (ML methods for predicting environmental conditions from assemblage composition. Environmental conditions can often be inferred from only biological data, and these inferences are useful when other sources of data are unavailable. ML prediction methods are statistically rigorous and applicable to a broader set of problems than more commonly used weighted averaging techniques. However, ML methods require a substantially greater investment of time to program algorithms and to perform computations. This package is designed to reduce the effort required to apply ML prediction methods.
Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction
Imbens, Guido W.; Rubin, Donald B.
2015-01-01
Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding…
Constrained statistical inference : sample-size tables for ANOVA and regression
Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves
2015-01-01
Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and
DEFF Research Database (Denmark)
Andersen, Jesper
2009-01-01
Collateral evolution the problem of updating several library-using programs in response to API changes in the used library. In this dissertation we address the issue of understanding collateral evolutions by automatically inferring a high-level specification of the changes evident in a given set ...... specifications inferred by spdiff in Linux are shown. We find that the inferred specifications concisely capture the actual collateral evolution performed in the examples....
Extreme events in the Mediterranean area: A mixed deterministic-statistical approach
International Nuclear Information System (INIS)
Speranza, A.; Tartaglione, N.
2006-01-01
Statistical inference suffers for severe limitations when applied to extreme meteo-climatic events. A fundamental theorem proposes a constructive theory for a universal distribution law (the Generalized Extreme Value distribution) of extremes. Use of this theorem and of its derivations is nowadays quite common. However, when applying it, the selected events should be real extremes. In practical applications a major source of errors is the fact that there is no strict criterion for selecting extremes and, in order to fatten the statistical sample very mild selection criteria are often used. The theorem in question applies to stationary processes. When a trend is introduced, inference becomes even more problematic. Experience shows that any available a priori knowledge concerning the system can play a fundamental role in the analysis, in particular if it lowers the dimensionality of the parameter space to be explored. The inference procedures serve, then, the purpose of testing the reliability of inductive hypothesis, rather than proving them. Within the above general context, analysis of the hypothesis that the frequency and/or intensity of extreme weather events in the Mediterranean area may be changing is proposed. The analysis is based on a combined deterministic-statistical approach: dynamical analysis of intense perturbations is combined with statistical techniques in order to try to formulate the problem in such a way that meaningful conclusion may be achieved
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.
Inferred performance of surface hydraulic barriers from landfill operational data
International Nuclear Information System (INIS)
Gross, B.A.; Bonaparte, R.; Othman, M.A.
1997-01-01
There are few published data on the field performance of surface hydraulic barriers (SHBs) used in waste containment or remediation applications. In contrast, operational data for liner systems used beneath landfills are widely available. These data are frequently collected and reported as a facility permit condition. This paper uses leachate collection system (LCS) and leak detection system (LDS) liquid flow rate and chemical quality data collected from modem landfill double-liner systems to infer the likely hydraulic performance of SHBs. Operational data for over 200 waste management unit liner systems are currently being collected and evaluated by the authors as part of an ongoing research investigation for the United States Environmental Protection Agency (USEPA). The top liner of the double-liner system for the units is either a geomembrane (GMB) alone, geomembrane overlying a geosynthetic clay liner (GMB/GCL), or geomembrane overlying a compacted clay liner (GMB/CCL). In this paper, select data from the USEPA study are used to: (i) infer the likely efficiencies of SHBs incorporating GMBs and overlain by drainage layers; and (ii) evaluate the effectiveness of SHBs in reducing water infiltration into, and drainage from, the underlying waste (i.e., source control). SHB efficiencies are inferred from calculated landfill liner efficiencies and then used to estimate average water percolation rates through SHBs as a function of site average annual rainfall. The effectiveness of SHBs for source control is investigated by comparing LCS liquid flow rates for open and closed landfill cells. The LCS flow rates for closed cells are also compared to the estimated average water percolation rates through SHBs presented in the paper
An analysis pipeline for the inference of protein-protein interaction networks
Energy Technology Data Exchange (ETDEWEB)
Taylor, Ronald C.; Singhal, Mudita; Daly, Don S.; Gilmore, Jason M.; Cannon, William R.; Domico, Kelly O.; White, Amanda M.; Auberry, Deanna L.; Auberry, Kenneth J.; Hooker, Brian S.; Hurst, G. B.; McDermott, Jason E.; McDonald, W. H.; Pelletier, Dale A.; Schmoyer, Denise A.; Wiley, H. S.
2009-12-01
An analysis pipeline has been created for deployment of a novel algorithm, the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro), for use in the reconstruction of protein-protein interaction networks. We have combined the Software Environment for BIological Network Inference (SEBINI), an interactive environment for the deployment and testing of network inference algorithms that use high-throughput data, and the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources, to allow interactions computed by BEPro to be stored, visualized, and further analyzed. Incorporating BEPro into SEBINI and automatically feeding the resulting inferred network into CABIN, we have created a structured workflow for protein-protein network inference and supplemental analysis from sets of mass spectrometry bait-prey experiment data. SEBINI demo site: https://www.emsl.pnl.gov /SEBINI/ Contact: ronald.taylor@pnl.gov. BEPro is available at http://www.pnl.gov/statistics/BEPro3/index.htm. Contact: ds.daly@pnl.gov. CABIN is available at http://www.sysbio.org/dataresources/cabin.stm. Contact: mudita.singhal@pnl.gov.
Methodology in robust and nonparametric statistics
Jurecková, Jana; Picek, Jan
2012-01-01
Introduction and SynopsisIntroductionSynopsisPreliminariesIntroductionInference in Linear ModelsRobustness ConceptsRobust and Minimax Estimation of LocationClippings from Probability and Asymptotic TheoryProblemsRobust Estimation of Location and RegressionIntroductionM-EstimatorsL-EstimatorsR-EstimatorsMinimum Distance and Pitman EstimatorsDifferentiable Statistical FunctionsProblemsAsymptotic Representations for L-Estimators
Models for inference in dynamic metacommunity systems
Dorazio, Robert M.; Kery, Marc; Royle, J. Andrew; Plattner, Matthias
2010-01-01
A variety of processes are thought to be involved in the formation and dynamics of species assemblages. For example, various metacommunity theories are based on differences in the relative contributions of dispersal of species among local communities and interactions of species within local communities. Interestingly, metacommunity theories continue to be advanced without much empirical validation. Part of the problem is that statistical models used to analyze typical survey data either fail to specify ecological processes with sufficient complexity or they fail to account for errors in detection of species during sampling. In this paper, we describe a statistical modeling framework for the analysis of metacommunity dynamics that is based on the idea of adopting a unified approach, multispecies occupancy modeling, for computing inferences about individual species, local communities of species, or the entire metacommunity of species. This approach accounts for errors in detection of species during sampling and also allows different metacommunity paradigms to be specified in terms of species- and location-specific probabilities of occurrence, extinction, and colonization: all of which are estimable. In addition, this approach can be used to address inference problems that arise in conservation ecology, such as predicting temporal and spatial changes in biodiversity for use in making conservation decisions. To illustrate, we estimate changes in species composition associated with the species-specific phenologies of flight patterns of butterflies in Switzerland for the purpose of estimating regional differences in biodiversity.
Inference of Transmission Network Structure from HIV Phylogenetic Trees.
Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; Britton, Tom; Leitner, Thomas
2017-01-01
Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.
Multimodel inference and adaptive management
Rehme, S.E.; Powell, L.A.; Allen, Craig R.
2011-01-01
Ecology is an inherently complex science coping with correlated variables, nonlinear interactions and multiple scales of pattern and process, making it difficult for experiments to result in clear, strong inference. Natural resource managers, policy makers, and stakeholders rely on science to provide timely and accurate management recommendations. However, the time necessary to untangle the complexities of interactions within ecosystems is often far greater than the time available to make management decisions. One method of coping with this problem is multimodel inference. Multimodel inference assesses uncertainty by calculating likelihoods among multiple competing hypotheses, but multimodel inference results are often equivocal. Despite this, there may be pressure for ecologists to provide management recommendations regardless of the strength of their study’s inference. We reviewed papers in the Journal of Wildlife Management (JWM) and the journal Conservation Biology (CB) to quantify the prevalence of multimodel inference approaches, the resulting inference (weak versus strong), and how authors dealt with the uncertainty. Thirty-eight percent and 14%, respectively, of articles in the JWM and CB used multimodel inference approaches. Strong inference was rarely observed, with only 7% of JWM and 20% of CB articles resulting in strong inference. We found the majority of weak inference papers in both journals (59%) gave specific management recommendations. Model selection uncertainty was ignored in most recommendations for management. We suggest that adaptive management is an ideal method to resolve uncertainty when research results in weak inference.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity.
Pecevski, Dejan; Maass, Wolfgang
2016-01-01
Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p (*) that generates the examples it receives. This holds even if p (*) contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference.
Abductive Inference using Array-Based Logic
DEFF Research Database (Denmark)
Frisvad, Jeppe Revall; Falster, Peter; Møller, Gert L.
The notion of abduction has found its usage within a wide variety of AI fields. Computing abductive solutions has, however, shown to be highly intractable in logic programming. To avoid this intractability we present a new approach to logicbased abduction; through the geometrical view of data...... employed in array-based logic we embrace abduction in a simple structural operation. We argue that a theory of abduction on this form allows for an implementation which, at runtime, can perform abductive inference quite efficiently on arbitrary rules of logic representing knowledge of finite domains....
Inference algorithms and learning theory for Bayesian sparse factor analysis
International Nuclear Information System (INIS)
Rattray, Magnus; Sharp, Kevin; Stegle, Oliver; Winn, John
2009-01-01
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.
Inference algorithms and learning theory for Bayesian sparse factor analysis
Energy Technology Data Exchange (ETDEWEB)
Rattray, Magnus; Sharp, Kevin [School of Computer Science, University of Manchester, Manchester M13 9PL (United Kingdom); Stegle, Oliver [Max-Planck-Institute for Biological Cybernetics, Tuebingen (Germany); Winn, John, E-mail: magnus.rattray@manchester.ac.u [Microsoft Research Cambridge, Roger Needham Building, Cambridge, CB3 0FB (United Kingdom)
2009-12-01
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.
Causal Inference and Explaining Away in a Spiking Network
Moreno-Bote, Rubén; Drugowitsch, Jan
2015-01-01
While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426
Statistical media design for efficient polyhydroxyalkanoate production in Pseudomonas sp. MNNG-S.
Saranya, V; Rajeswari, V; Abirami, P; Poornimakkani, K; Suguna, P; Shenbagarathai, R
2016-07-03
Polyhydroxyalkanoate (PHA) is a promising polymer for various biomedical applications. There is a high need to improve the production rate to achieve end use. When a cost-effective production was carried out with cheaper agricultural residues like molasses, traces of toxins were incorporated into the polymer, which makes it unfit for biomedical applications. On the other hand, there is an increase in the popularity of using chemically defined media for the production of compounds with biomedical applications. However, these media do not exhibit favorable characteristics such as efficient utilization at large scale compared to complex media. This article aims to determine the specific nutritional requirement of Pseudomonas sp. MNNG-S for efficient production of polyhydroxyalkanoate. Response surface methodology (RSM) was used in this study to statistically design for PHA production based on the interactive effect of five significant variables (sucrose; potassium dihydrogen phosphate; ammonium sulfate; magnesium sulfate; trace elements). The interactive effects of sucrose with ammonium sulfate, ammonium sulfate with combined potassium phosphate, and trace element with magnesium sulfate were found to be significant (p production more than fourfold (from 0.85 g L(-1) to 4.56 g L(-1)).
Experimental statistics for biological sciences.
Bang, Heejung; Davidian, Marie
2010-01-01
In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers would understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful if it is used as a supplemental material, while the readers take their own statistical courses or it would serve as a great reference text associated with a manual for any statistical software as a self-teaching guide.
International Nuclear Information System (INIS)
Ching, W-H; K H Leung, Michael; Leung, Dennis Y C
2009-01-01
Transient turbulent dispersion phenomena can be found in various practical problems, such as the accidental release of toxic chemical vapor and the airborne transmission of infectious droplets. Computational fluid dynamics (CFD) is an effective tool for analyzing such transient dispersion behaviors. However, the transient CFD analysis is often computationally expensive and time consuming. In the present study, a computationally efficient CFD-statistical hybrid modeling method has been developed for studying transient turbulent dispersion. In this method, the source emission is represented by emissions of many infinitesimal puffs. Statistical analysis is performed to obtain first the statistical properties of the puff trajectories and subsequently the most probable distribution of the puff trajectories that represent the macroscopic dispersion behaviors. In two case studies of ambient dispersion, the numerical modeling results obtained agree reasonably well with both experimental measurements and conventional k-ε modeling results published in the literature. More importantly, the proposed many-puff CFD-statistical hybrid modeling method effectively reduces the computational time by two orders of magnitude.
Directory of Open Access Journals (Sweden)
Funda H. Sezgin
2012-01-01
Full Text Available Data Envelopment Analysis (DEA is a mathematical programming formulationbased technique that provides an efficient frontier to suggest an estimate of therelative efficiency of each decision making unit (DMU in a problem set. DEA isdeveloped around the concept of evaluating the efficiency of a decision alternativebased on its performance of creating outputs in means of input consumption.Besides its advantages,criticisms about the potential bias of efficiency estimatesof DEA has been arised. One criticism about DEA is on the sampling variation ofthe estimated frontier which may affect the accuracy of results. The bootstrapmethod is a statistical resampling method used to perform inference complexproblems. The basic idea of the bootstrap method is to approximate the samplingdistributions of the estimator by using the empirical distribution of resampledestimates obtained from a Monte Carlo resampling. DEA estimators introducedan approach based on “bootstrap techniques” to correct and estimate the bias ofthe DEA efficiency indicators.The purpose of this study is to measure theefficiency of small amount ofinvestment banksin Turkeyafter thefinancialcrisis in 2010with theBootstrapDEA(BDEA.
Directory of Open Access Journals (Sweden)
Genadiy Vasserman
Full Text Available The visual system continually adjusts its sensitivity to the statistical properties of the environment through an adaptation process that starts in the retina. Colour perception and processing is commonly thought to occur mainly in high visual areas, and indeed most evidence for chromatic colour contrast adaptation comes from cortical studies. We show that colour contrast adaptation starts in the retina where ganglion cells adjust their responses to the spectral properties of the environment. We demonstrate that the ganglion cells match their responses to red-blue stimulus combinations according to the relative contrast of each of the input channels by rotating their functional response properties in colour space. Using measurements of the chromatic statistics of natural environments, we show that the retina balances inputs from the two (red and blue stimulated colour channels, as would be expected from theoretical optimal behaviour. Our results suggest that colour is encoded in the retina based on the efficient processing of spectral information that matches spectral combinations in natural scenes on the colour processing level.
Energy Technology Data Exchange (ETDEWEB)
Blanc, Guillermo A. [Observatories of the Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, CA 91101 (United States); Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A. [Research School of Astronomy and Astrophysics, Australian National University, Cotter Road, Weston, ACT 2611 (Australia)
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
International Nuclear Information System (INIS)
Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A.
2015-01-01
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances
Optimal inference with suboptimal models: Addiction and active Bayesian inference
Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl
2015-01-01
When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Using the Weibull distribution reliability, modeling and inference
McCool, John I
2012-01-01
Understand and utilize the latest developments in Weibull inferential methods While the Weibull distribution is widely used in science and engineering, most engineers do not have the necessary statistical training to implement the methodology effectively. Using the Weibull Distribution: Reliability, Modeling, and Inference fills a gap in the current literature on the topic, introducing a self-contained presentation of the probabilistic basis for the methodology while providing powerful techniques for extracting information from data. The author explains the use of the Weibull distribution
Inference of population splits and mixtures from genome-wide allele frequency data.
Directory of Open Access Journals (Sweden)
Joseph K Pickrell
Full Text Available Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
International Conference on Robust Statistics 2015
Basu, Ayanendranath; Filzmoser, Peter; Mukherjee, Diganta
2016-01-01
This book offers a collection of recent contributions and emerging ideas in the areas of robust statistics presented at the International Conference on Robust Statistics 2015 (ICORS 2015) held in Kolkata during 12–16 January, 2015. The book explores the applicability of robust methods in other non-traditional areas which includes the use of new techniques such as skew and mixture of skew distributions, scaled Bregman divergences, and multilevel functional data methods; application areas being circular data models and prediction of mortality and life expectancy. The contributions are of both theoretical as well as applied in nature. Robust statistics is a relatively young branch of statistical sciences that is rapidly emerging as the bedrock of statistical analysis in the 21st century due to its flexible nature and wide scope. Robust statistics supports the application of parametric and other inference techniques over a broader domain than the strictly interpreted model scenarios employed in classical statis...
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA- CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences and to allow researchers to exchange ideas on statistics and data science and to embraces the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
Sweeney, Elizabeth M; Shinohara, Russell T; Shiee, Navid; Mateen, Farrah J; Chudgar, Avni A; Cuzzocreo, Jennifer L; Calabresi, Peter A; Pham, Dzung L; Reich, Daniel S; Crainiceanu, Ciprian M
2013-01-01
Magnetic resonance imaging (MRI) can be used to detect lesions in the brains of multiple sclerosis (MS) patients and is essential for diagnosing the disease and monitoring its progression. In practice, lesion load is often quantified by either manual or semi-automated segmentation of MRI, which is time-consuming, costly, and associated with large inter- and intra-observer variability. We propose OASIS is Automated Statistical Inference for Segmentation (OASIS), an automated statistical method for segmenting MS lesions in MRI studies. We use logistic regression models incorporating multiple MRI modalities to estimate voxel-level probabilities of lesion presence. Intensity-normalized T1-weighted, T2-weighted, fluid-attenuated inversion recovery and proton density volumes from 131 MRI studies (98 MS subjects, 33 healthy subjects) with manual lesion segmentations were used to train and validate our model. Within this set, OASIS detected lesions with a partial area under the receiver operating characteristic curve for clinically relevant false positive rates of 1% and below of 0.59% (95% CI; [0.50%, 0.67%]) at the voxel level. An experienced MS neuroradiologist compared these segmentations to those produced by LesionTOADS, an image segmentation software that provides segmentation of both lesions and normal brain structures. For lesions, OASIS out-performed LesionTOADS in 74% (95% CI: [65%, 82%]) of cases for the 98 MS subjects. To further validate the method, we applied OASIS to 169 MRI studies acquired at a separate center. The neuroradiologist again compared the OASIS segmentations to those from LesionTOADS. For lesions, OASIS ranked higher than LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For a randomly selected subset of 50 of these studies, one additional radiologist and one neurologist also scored the images. Within this set, the neuroradiologist ranked OASIS higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of cases, the neurologist 66% (95% CI: [52%, 78
Inference rule and problem solving
Energy Technology Data Exchange (ETDEWEB)
Goto, S
1982-04-01
Intelligent information processing signifies an opportunity of having man's intellectual activity executed on the computer, in which inference, in place of ordinary calculation, is used as the basic operational mechanism for such an information processing. Many inference rules are derived from syllogisms in formal logic. The problem of programming this inference function is referred to as a problem solving. Although logically inference and problem-solving are in close relation, the calculation ability of current computers is on a low level for inferring. For clarifying the relation between inference and computers, nonmonotonic logic has been considered. The paper deals with the above topics. 16 references.
Indirect Inference for Stochastic Differential Equations Based on Moment Expansions
Ballesio, Marco
2016-01-06
We provide an indirect inference method to estimate the parameters of timehomogeneous scalar diffusion and jump diffusion processes. We obtain a system of ODEs that approximate the time evolution of the first two moments of the process by the approximation of the stochastic model applying a second order Taylor expansion of the SDE s infinitesimal generator in the Dynkin s formula. This method allows a simple and efficient procedure to infer the parameters of such stochastic processes given the data by the maximization of the likelihood of an approximating Gaussian process described by the two moments equations. Finally, we perform numerical experiments for two datasets arising from organic and inorganic fouling deposition phenomena.
Castruccio, Stefano; Huser, Raphaë l; Genton, Marc G.
2016-01-01
In multivariate or spatial extremes, inference for max-stable processes observed at a large collection of points is a very challenging problem and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. In this work, we explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely used multivariate or spatial extreme models, we discuss how to choose composite likelihood truncation to improve the efficiency, and we also provide recommendations for practitioners. This article has supplementary material online.
Bayesian inference for Markov jump processes with informative observations.
Golightly, Andrew; Wilkinson, Darren J
2015-04-01
In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.
Statistical aspects of the program of the Atomic Bomb Casualty Commission
Energy Technology Data Exchange (ETDEWEB)
Beebe, G W
1961-02-24
The Atomic Bomb Casualty Commission (ABCC) is a medical research institute in Hiroshima and Nagasaki devoted to long term study of the late effects of nuclear radiation upon man. The work draws its great interest from the paucity of existing information on the effect of radiation on man; from the unique radiation experience of the atomic bomb survivors; from the increasing utilization of nuclear energy in modern technology; and from humanitarian concern for the survivors of the bombs. The ABCC program provides the statistician with an important opportunity to apply the tools and concepts of statistics, for the inferences to be drawn are largely statistical inferences growing out of the comparison of samples defined as to radiation exposure. The work is of international as well as statistical interest by virtue of its subject matter and as a meeting-ground for statisticians trained in different countries.
Energy Technology Data Exchange (ETDEWEB)
Chertkov, Michael [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahn, Sungsoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of); Shin, Jinwoo [Korea Advanced Inst. Science and Technology (KAIST), Daejeon (Korea, Republic of)
2017-05-25
Computing partition function is the most important statistical inference task arising in applications of Graphical Models (GM). Since it is computationally intractable, approximate methods have been used to resolve the issue in practice, where meanfield (MF) and belief propagation (BP) are arguably the most popular and successful approaches of a variational type. In this paper, we propose two new variational schemes, coined Gauged-MF (G-MF) and Gauged-BP (G-BP), improving MF and BP, respectively. Both provide lower bounds for the partition function by utilizing the so-called gauge transformation which modifies factors of GM while keeping the partition function invariant. Moreover, we prove that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case. Our extensive experiments, on complete GMs of relatively small size and on large GM (up-to 300 variables) confirm that the newly proposed algorithms outperform and generalize MF and BP.
Nagao, Makoto
1990-01-01
Knowledge and Inference discusses an important problem for software systems: How do we treat knowledge and ideas on a computer and how do we use inference to solve problems on a computer? The book talks about the problems of knowledge and inference for the purpose of merging artificial intelligence and library science. The book begins by clarifying the concept of """"knowledge"""" from many points of view, followed by a chapter on the current state of library science and the place of artificial intelligence in library science. Subsequent chapters cover central topics in the artificial intellig
Smith, Tony E.; Lee, Ka Lok
2012-01-01
There is a common belief that the presence of residual spatial autocorrelation in ordinary least squares (OLS) regression leads to inflated significance levels in beta coefficients and, in particular, inflated levels relative to the more efficient spatial error model (SEM). However, our simulations show that this is not always the case. Hence, the purpose of this paper is to examine this question from a geometric viewpoint. The key idea is to characterize the OLS test statistic in terms of angle cosines and examine the geometric implications of this characterization. Our first result is to show that if the explanatory variables in the regression exhibit no spatial autocorrelation, then the distribution of test statistics for individual beta coefficients in OLS is independent of any spatial autocorrelation in the error term. Hence, inferences about betas exhibit all the optimality properties of the classic uncorrelated error case. However, a second more important series of results show that if spatial autocorrelation is present in both the dependent and explanatory variables, then the conventional wisdom is correct. In particular, even when an explanatory variable is statistically independent of the dependent variable, such joint spatial dependencies tend to produce "spurious correlation" that results in over-rejection of the null hypothesis. The underlying geometric nature of this problem is clarified by illustrative examples. The paper concludes with a brief discussion of some possible remedies for this problem.
Kazak, Sibel; Pratt, Dave
2017-01-01
This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…
Directory of Open Access Journals (Sweden)
Zhang Zhang
2012-03-01
Full Text Available Abstract Background Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB. Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis. Results Here we propose a novel measure--Codon Deviation Coefficient (CDC--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts the bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. Conclusions As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
Fermions from classical statistics
International Nuclear Information System (INIS)
Wetterich, C.
2010-01-01
We describe fermions in terms of a classical statistical ensemble. The states τ of this ensemble are characterized by a sequence of values one or zero or a corresponding set of two-level observables. Every classical probability distribution can be associated to a quantum state for fermions. If the time evolution of the classical probabilities p τ amounts to a rotation of the wave function q τ (t)=±√(p τ (t)), we infer the unitary time evolution of a quantum system of fermions according to a Schroedinger equation. We establish how such classical statistical ensembles can be mapped to Grassmann functional integrals. Quantum field theories for fermions arise for a suitable time evolution of classical probabilities for generalized Ising models.
Streamflow Forecasting Using Nuero-Fuzzy Inference System
Nanduri, U. V.; Swain, P. C.
2005-12-01
The prediction of flow into a reservoir is fundamental in water resources planning and management. The need for timely and accurate streamflow forecasting is widely recognized and emphasized by many in water resources fraternity. Real-time forecasts of natural inflows to reservoirs are of particular interest for operation and scheduling. The physical system of the river basin that takes the rainfall as an input and produces the runoff is highly nonlinear, complicated and very difficult to fully comprehend. The system is influenced by large number of factors and variables. The large spatial extent of the systems forces the uncertainty into the hydrologic information. A variety of methods have been proposed for forecasting reservoir inflows including conceptual (physical) and empirical (statistical) models (WMO 1994), but none of them can be considered as unique superior model (Shamseldin 1997). Owing to difficulties of formulating reasonable non-linear watershed models, recent attempts have resorted to Neural Network (NN) approach for complex hydrologic modeling. In recent years the use of soft computing in the field of hydrological forecasting is gaining ground. The relatively new soft computing technique of Adaptive Neuro-Fuzzy Inference System (ANFIS), developed by Jang (1993) is able to take care of the non-linearity, uncertainty, and vagueness embedded in the system. It is a judicious combination of the Neural Networks and fuzzy systems. It can learn and generalize highly nonlinear and uncertain phenomena due to the embedded neural network (NN). NN is efficient in learning and generalization, and the fuzzy system mimics the cognitive capability of human brain. Hence, ANFIS can learn the complicated processes involved in the basin and correlate the precipitation to the corresponding discharge. In the present study, one step ahead forecasts are made for ten-daily flows, which are mostly required for short term operational planning of multipurpose reservoirs. A
New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics
International Nuclear Information System (INIS)
Akhmatskaya, Elena; Reich, Sebastian
2011-01-01
We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size and make use of multi-scale nature of complex systems. Variants of GSHMCs were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)
Detection of multiple damages employing best achievable eigenvectors under Bayesian inference
Prajapat, Kanta; Ray-Chaudhuri, Samit
2018-05-01
A novel approach is presented in this work to localize simultaneously multiple damaged elements in a structure along with the estimation of damage severity for each of the damaged elements. For detection of damaged elements, a best achievable eigenvector based formulation has been derived. To deal with noisy data, Bayesian inference is employed in the formulation wherein the likelihood of the Bayesian algorithm is formed on the basis of errors between the best achievable eigenvectors and the measured modes. In this approach, the most probable damage locations are evaluated under Bayesian inference by generating combinations of various possible damaged elements. Once damage locations are identified, damage severities are estimated using a Bayesian inference Markov chain Monte Carlo simulation. The efficiency of the proposed approach has been demonstrated by carrying out a numerical study involving a 12-story shear building. It has been found from this study that damage scenarios involving as low as 10% loss of stiffness in multiple elements are accurately determined (localized and severities quantified) even when 2% noise contaminated modal data are utilized. Further, this study introduces a term parameter impact (evaluated based on sensitivity of modal parameters towards structural parameters) to decide the suitability of selecting a particular mode, if some idea about the damaged elements are available. It has been demonstrated here that the accuracy and efficiency of the Bayesian quantification algorithm increases if damage localization is carried out a-priori. An experimental study involving a laboratory scale shear building and different stiffness modification scenarios shows that the proposed approach is efficient enough to localize the stories with stiffness modification.
Understanding the statistics of small risks
International Nuclear Information System (INIS)
Siddall, E.
1983-10-01
Monte Carlo analyses are used to show what inferences can and cannot be drawn when either a very small number of accidents result from a considerable exposure or where a very small number of people, down to a single individual, are exposed to small added risks. The distinction between relative and absolute uncertainty is illustrated. No new statistical principles are involved
Bayesian Nonparametric Statistical Inference for Shock Models and Wear Processes.
1979-12-01
also note that the results in Section 2 do not depend on the support of F .) This shock model have been studied by Esary, Marshall and Proschan (1973...Barlow and Proschan (1975), among others. The analogy of the shock model in risk and acturial analysis has been given by BUhlmann (1970, Chapter 2... Mathematical Statistics, Vol. 4, pp. 894-906. Billingsley, P. (1968), CONVERGENCE OF PROBABILITY MEASURES, John Wiley, New York. BUhlmann, H. (1970
Zamani, Pouya
2017-08-01
Traditional ratio measures of efficiency, including feed conversion ratio (FCR), gross milk efficiency (GME), gross energy efficiency (GEE) and net energy efficiency (NEE) may have some statistical problems including high correlations with milk yield. Residual energy intake (REI) or residual feed intake (RFI) is another criterion, proposed to overcome the problems attributed to the traditional ratio criteria, but it does not account for production or intake levels. For example, the same REI value could be considerable for low producing and negligible for high producing cows. The aim of this study was to propose a new measure of efficiency to overcome the problems attributed to the previous criteria. A total of 1478 monthly records of 268 lactating Holstein cows were used for this study. In addition to FCR, GME, GEE, NEE and REI, a new criterion called proportional residual energy intake (PREI) was calculated as REI to net energy intake ratio and defined as proportion of net energy intake lost as REI. The PREI had an average of -0·02 and range of -0·36 to 0·27, meaning that the least efficient cow lost 0·27 of her net energy intake as REI, while the most efficient animal saved 0·36 of her net energy intake as less REI. Traditional ratio criteria (FCR, GME, GEE and NEE) had high correlations with milk and fat corrected milk yields (absolute values from 0·469 to 0·816), while the REI and PREI had low correlations (0·000 to 0·069) with milk production. The results showed that the traditional ratio criteria (FCR, GME, GEE and NEE) are highly influenced by production traits, while the REI and PREI are independent of production level. Moreover, the PREI adjusts the REI magnitude for intake level. It seems that the PREI could be considered as a worthwhile measure of efficiency for future studies.
Fuzzy inference system for evaluating and improving nuclear power plant operating performance
International Nuclear Information System (INIS)
Guimaraes, Antonio Cesar F.; Lapa, Celso Marcelo Franklin
2003-01-01
This paper presents a fuzzy inference system (FIS) as an approach to estimate Nuclear Power Plant (NPP) performance indicators. The performance indicators for this study are the energy availability factor (EAF) and the planned (PUF) and unplanned unavailability factor (UUF). These indicators are obtained from a non analytical combination among the same operational parameters. Such parameters are, for example, environment impacts, industrial safety, radiological protection, safety indicators, scram rate, thermal efficiency, and fuel reliability. This approach uses the concept of a pure fuzzy logic system where the fuzzy rule base consists of a collection of fuzzy IF-THEN rules. The fuzzy inference engine uses these fuzzy IF-THEN rules to determine a mapping from fuzzy sets in the input universe of discourse to fuzzy sets in the output universe of discourse based on fuzzy logic principles. The results demonstrated the potential of the fuzzy inference to generate a knowledge basis that correlate operations occurrences and NPP performance. The inference system became possible the development of the sensitivity studies, future operational condition previsions and may support the eventual corrections on operation of the plant
Kernel learning at the first level of inference.
Cawley, Gavin C; Talbot, Nicola L C
2014-05-01
Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense. Copyright © 2014 Elsevier Ltd. All rights reserved.
Reflections on fourteen cryptic issues concerning the nature of statistical inference
Kardaun, O.J.W.F.; Salomé, D.; Schaafsma, W; Steerneman, A.G.M.; Willems, J.C; Cox, D.R.
The present paper provides the original formulation and a joint response of a group of statistically trained scientists to fourteen cryptic issues for discussion, which were handed out to the public by Professor Dr. D.R. Cox after his Bernoulli Lecture 1997 at Groningen University.
Probability and Bayesian statistics
1987-01-01
This book contains selected and refereed contributions to the "Inter national Symposium on Probability and Bayesian Statistics" which was orga nized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985 the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the pa pers are published especially by the relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows growing importance of probability as coherent assessment of numerical values as degrees of believe in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating sub jective probability statements, abstract measure theoretical considerations, contributions to theoretical statistics an...
Goal inferences about robot behavior : goal inferences and human response behaviors
Broers, H.A.T.; Ham, J.R.C.; Broeders, R.; De Silva, P.; Okada, M.
2014-01-01
This explorative research focused on the goal inferences human observers draw based on a robot's behavior, and the extent to which those inferences predict people's behavior in response to that robot. Results show that different robot behaviors cause different response behavior from people.
Computationally efficient statistical differential equation modeling using homogenization
Hooten, Mevin B.; Garlick, Martha J.; Powell, James A.
2013-01-01
Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems are appearing in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.
Temporal and Statistical Information in Causal Structure Learning
McCormack, Teresa; Frosch, Caren; Patrick, Fiona; Lagnado, David
2015-01-01
Three experiments examined children's and adults' abilities to use statistical and temporal information to distinguish between common cause and causal chain structures. In Experiment 1, participants were provided with conditional probability information and/or temporal information and asked to infer the causal structure of a 3-variable mechanical…
Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction
Directory of Open Access Journals (Sweden)
Seong-Gon Kim
2011-06-01
Full Text Available Several techniques such as Neural Networks, Genetic Algorithms, Decision Trees and other statistical or heuristic methods have been used to approach the complex non-linear task of predicting Alpha-helicies, Beta-sheets and Turns of a proteins secondary structure in the past. This project introduces a new machine learning method by using an offline trained Multilayered Perceptrons (MLP as the likelihood models within a Bayesian Inference framework to predict secondary structures proteins. Varying window sizes are used to extract neighboring amino acid information and passed back and forth between the Neural Net models and the Bayesian Inference process until there is a convergence of the posterior secondary structure probability.
Learning Probabilistic Inference through Spike-Timing-Dependent Plasticity123
Pecevski, Dejan
2016-01-01
Abstract Numerous experimental data show that the brain is able to extract information from complex, uncertain, and often ambiguous experiences. Furthermore, it can use such learnt information for decision making through probabilistic inference. Several models have been proposed that aim at explaining how probabilistic inference could be performed by networks of neurons in the brain. We propose here a model that can also explain how such neural network could acquire the necessary information for that from examples. We show that spike-timing-dependent plasticity in combination with intrinsic plasticity generates in ensembles of pyramidal cells with lateral inhibition a fundamental building block for that: probabilistic associations between neurons that represent through their firing current values of random variables. Furthermore, by combining such adaptive network motifs in a recursive manner the resulting network is enabled to extract statistical information from complex input streams, and to build an internal model for the distribution p* that generates the examples it receives. This holds even if p* contains higher-order moments. The analysis of this learning process is supported by a rigorous theoretical foundation. Furthermore, we show that the network can use the learnt internal model immediately for prediction, decision making, and other types of probabilistic inference. PMID:27419214
Page, Robert; Satake, Eiki
2017-01-01
While interest in Bayesian statistics has been growing in statistics education, the treatment of the topic is still inadequate in both textbooks and the classroom. Because so many fields of study lead to careers that involve a decision-making process requiring an understanding of Bayesian methods, it is becoming increasingly clear that Bayesian…
Norris, Peter M.; da Silva, Arlindo M.
2018-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC. PMID:29618847
Norris, Peter M.; Da Silva, Arlindo M.
2016-01-01
A method is presented to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation. The gridcolumn model includes assumed probability density function (PDF) intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used in the current study are Moderate Resolution Imaging Spectroradiometer (MODIS) cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. The current study uses a skewed-triangle distribution for layer moisture. The article also includes a discussion of the Metropolis and multiple-try Metropolis versions of MCMC.
2017-09-01
application of statistical inference. Even when human forecasters leverage their professional experience, which is often gained through long periods of... application throughout statistics and Bayesian data analysis. The multivariate form of 2( , ) (e.g., Figure 12) is similarly analytically...data (i.e., no systematic manipulations with analytical functions), it is common in the statistical literature to apply mathematical transformations
Caticha, Ariel
2010-01-01
In this tutorial we review the essential arguments behing entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME), includes as special cases both MaxEn...
Directory of Open Access Journals (Sweden)
Ji Wei
2010-10-01
Full Text Available Abstract Background Microarray data discretization is a basic preprocess for many algorithms of gene regulatory network inference. Some common discretization methods in informatics are used to discretize microarray data. Selection of the discretization method is often arbitrary and no systematic comparison of different discretization has been conducted, in the context of gene regulatory network inference from time series gene expression data. Results In this study, we propose a new discretization method "bikmeans", and compare its performance with four other widely-used discretization methods using different datasets, modeling algorithms and number of intervals. Sensitivities, specificities and total accuracies were calculated and statistical analysis was carried out. Bikmeans method always gave high total accuracies. Conclusions Our results indicate that proper discretization methods can consistently improve gene regulatory network inference independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significant better total accuracies than other methods.
Counting statistics of many-particle quantum walks
Mayer, Klaus; Tichy, Malte C.; Mintert, Florian; Konrad, Thomas; Buchleitner, Andreas
2011-06-01
We study quantum walks of many noninteracting particles on a beam splitter array as a paradigmatic testing ground for the competition of single- and many-particle interference in a multimode system. We derive a general expression for multimode particle-number correlation functions, valid for bosons and fermions, and infer pronounced signatures of many-particle interferences in the counting statistics.
Counting statistics of many-particle quantum walks
International Nuclear Information System (INIS)
Mayer, Klaus; Tichy, Malte C.; Buchleitner, Andreas; Mintert, Florian; Konrad, Thomas
2011-01-01
We study quantum walks of many noninteracting particles on a beam splitter array as a paradigmatic testing ground for the competition of single- and many-particle interference in a multimode system. We derive a general expression for multimode particle-number correlation functions, valid for bosons and fermions, and infer pronounced signatures of many-particle interferences in the counting statistics.
Statistical mechanics of sparse generalization and graphical model selection
International Nuclear Information System (INIS)
Lage-Castellanos, Alejandro; Pagnani, Andrea; Weigt, Martin
2009-01-01
One of the crucial tasks in many inference problems is the extraction of an underlying sparse graphical model from a given number of high-dimensional measurements. In machine learning, this is frequently achieved using, as a penalty term, the L p norm of the model parameters, with p≤1 for efficient dilution. Here we propose a statistical mechanics analysis of the problem in the setting of perceptron memorization and generalization. Using a replica approach, we are able to evaluate the relative performance of naive dilution (obtained by learning without dilution, following by applying a threshold to the model parameters), L 1 dilution (which is frequently used in convex optimization) and L 0 dilution (which is optimal but computationally hard to implement). Whereas both L p diluted approaches clearly outperform the naive approach, we find a small region where L 0 works almost perfectly and strongly outperforms the simpler to implement L 1 dilution
A statistical model for interpreting computerized dynamic posturography data
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
Learning Convex Inference of Marginals
Domke, Justin
2012-01-01
Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main ...
Inferring the conservative causal core of gene regulatory networks
Directory of Open Access Journals (Sweden)
Emmert-Streib Frank
2010-09-01
Full Text Available Abstract Background Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. Results In this paper, we introduce a novel gene regulatory network inference (GRNI algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. Conclusions For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.
Inferring the conservative causal core of gene regulatory networks.
Altay, Gökmen; Emmert-Streib, Frank
2010-09-28
Inferring gene regulatory networks from large-scale expression data is an important problem that received much attention in recent years. These networks have the potential to gain insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically. In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations and demonstrate also for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it has also a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently. For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also result in superior results.
Models and Inference for Multivariate Spatial Extremes
Vettori, Sabrina
2017-12-07
The development of flexible and interpretable statistical methods is necessary in order to provide appropriate risk assessment measures for extreme events and natural disasters. In this thesis, we address this challenge by contributing to the developing research field of Extreme-Value Theory. We initially study the performance of existing parametric and non-parametric estimators of extremal dependence for multivariate maxima. As the dimensionality increases, non-parametric estimators are more flexible than parametric methods but present some loss in efficiency that we quantify under various scenarios. We introduce a statistical tool which imposes the required shape constraints on non-parametric estimators in high dimensions, significantly improving their performance. Furthermore, by embedding the tree-based max-stable nested logistic distribution in the Bayesian framework, we develop a statistical algorithm that identifies the most likely tree structures representing the data\\'s extremal dependence using the reversible jump Monte Carlo Markov Chain method. A mixture of these trees is then used for uncertainty assessment in prediction through Bayesian model averaging. The computational complexity of full likelihood inference is significantly decreased by deriving a recursive formula for the nested logistic model likelihood. The algorithm performance is verified through simulation experiments which also compare different likelihood procedures. Finally, we extend the nested logistic representation to the spatial framework in order to jointly model multivariate variables collected across a spatial region. This situation emerges often in environmental applications but is not often considered in the current literature. Simulation experiments show that the new class of multivariate max-stable processes is able to detect both the cross and inner spatial dependence of a number of extreme variables at a relatively low computational cost, thanks to its Bayesian hierarchical
12th Workshop on Stochastic Models, Statistics and Their Applications
Rafajłowicz, Ewaryst; Szajowski, Krzysztof
2015-01-01
This volume presents the latest advances and trends in stochastic models and related statistical procedures. Selected peer-reviewed contributions focus on statistical inference, quality control, change-point analysis and detection, empirical processes, time series analysis, survival analysis and reliability, statistics for stochastic processes, big data in technology and the sciences, statistical genetics, experiment design, and stochastic models in engineering. Stochastic models and related statistical procedures play an important part in furthering our understanding of the challenging problems currently arising in areas of application such as the natural sciences, information technology, engineering, image analysis, genetics, energy and finance, to name but a few. This collection arises from the 12th Workshop on Stochastic Models, Statistics and Their Applications, Wroclaw, Poland.
Gao, Ya; Cheng, Wenchi; Zhang, Hailin
2017-08-23
Energy harvesting, which offers a never-ending energy supply, has emerged as a prominent technology to prolong the lifetime and reduce costs for the battery-powered wireless sensor networks. However, how to improve the energy efficiency while guaranteeing the quality of service (QoS) for energy harvesting based wireless sensor networks is still an open problem. In this paper, we develop statistical delay-bounded QoS-driven power control policies to maximize the effective energy efficiency (EEE), which is defined as the spectrum efficiency under given specified QoS constraints per unit harvested energy, for energy harvesting based wireless sensor networks. For the battery-infinite wireless sensor networks, our developed QoS-driven power control policy converges to the Energy harvesting Water Filling (E-WF) scheme and the Energy harvesting Channel Inversion (E-CI) scheme under the very loose and stringent QoS constraints, respectively. For the battery-finite wireless sensor networks, our developed QoS-driven power control policy becomes the Truncated energy harvesting Water Filling (T-WF) scheme and the Truncated energy harvesting Channel Inversion (T-CI) scheme under the very loose and stringent QoS constraints, respectively. Furthermore, we evaluate the outage probabilities to theoretically analyze the performance of our developed QoS-driven power control policies. The obtained numerical results validate our analysis and show that our developed optimal power control policies can optimize the EEE over energy harvesting based wireless sensor networks.
Statistical methods for spatio-temporal systems
Finkenstadt, Barbel
2006-01-01
Statistical Methods for Spatio-Temporal Systems presents current statistical research issues on spatio-temporal data modeling and will promote advances in research and a greater understanding between the mechanistic and the statistical modeling communities.Contributed by leading researchers in the field, each self-contained chapter starts with an introduction of the topic and progresses to recent research results. Presenting specific examples of epidemic data of bovine tuberculosis, gastroenteric disease, and the U.K. foot-and-mouth outbreak, the first chapter uses stochastic models, such as point process models, to provide the probabilistic backbone that facilitates statistical inference from data. The next chapter discusses the critical issue of modeling random growth objects in diverse biological systems, such as bacteria colonies, tumors, and plant populations. The subsequent chapter examines data transformation tools using examples from ecology and air quality data, followed by a chapter on space-time co...
Statistical x-ray computed tomography imaging from photon-starved measurements
Chang, Zhiqian; Zhang, Ruoqiao; Thibault, Jean-Baptiste; Sauer, Ken; Bouman, Charles
2013-03-01
Dose reduction in clinical X-ray computed tomography (CT) causes low signal-to-noise ratio (SNR) in photonsparse situations. Statistical iterative reconstruction algorithms have the advantage of retaining image quality while reducing input dosage, but they meet their limits of practicality when significant portions of the sinogram near photon starvation. The corruption of electronic noise leads to measured photon counts taking on negative values, posing a problem for the log() operation in preprocessing of data. In this paper, we propose two categories of projection correction methods: an adaptive denoising filter and Bayesian inference. The denoising filter is easy to implement and preserves local statistics, but it introduces correlation between channels and may affect image resolution. Bayesian inference is a point-wise estimation based on measurements and prior information. Both approaches help improve diagnostic image quality at dramatically reduced dosage.
Directory of Open Access Journals (Sweden)
Yinyin Yuan
Full Text Available Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of the samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a genes role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/.
International Nuclear Information System (INIS)
Dai, Yu; Lu, Peng
2015-01-01
In evolutionary games, the temptation mechanism reduces cooperation percentage while the reputation mechanism promotes it. Inferring reputation theory proposes that agent's imitating neighbors with the highest reputation takes place with a probability. Although reputation promotes cooperation, when and how it enhances cooperation is still a question. This paper investigates the condition where the inferring reputation probability promotes cooperation. Hence, the effects of reputation and temptation on cooperation are explored under the spatial prisoners’ dilemma game, utilizing the methods of simulation and statistical analysis. Results show that temptation reduces cooperation unconditionally while reputation promotes it conditionally, i.e. reputation countervails temptation conditionally. When the inferring reputation probability is less than 0.5, reputation promotes cooperation substantially and thus countervails temptation. However, when the inferring reputation probability is larger than 0.5, its contribution to cooperation is relatively weak and cannot prevent temptation from undermining cooperation. Reputation even decreases cooperation together with temptation when the probability is higher than 0.8. It should be noticed that inferring reputation does not always succeed to countervail temptation and there is a specific interval for it to promote cooperation
Bayesian inference from count data using discrete uniform priors.
Directory of Open Access Journals (Sweden)
Federico Comoglio
Full Text Available We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing an homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
Likelihood inference for a fractionally cointegrated vector autoregressive model
DEFF Research Database (Denmark)
Johansen, Søren; Ørregård Nielsen, Morten
2012-01-01
such that the process X_{t} is fractional of order d and cofractional of order d-b; that is, there exist vectors ß for which ß'X_{t} is fractional of order d-b, and no other fractionality order is possible. We define the statistical model by 0inference when the true values satisfy b0¿1/2 and d0-b0......We consider model based inference in a fractionally cointegrated (or cofractional) vector autoregressive model with a restricted constant term, ¿, based on the Gaussian likelihood conditional on initial values. The model nests the I(d) VAR model. We give conditions on the parameters...... process in the parameters when errors are i.i.d. with suitable moment conditions and initial values are bounded. When the limit is deterministic this implies uniform convergence in probability of the conditional likelihood function. If the true value b0>1/2, we prove that the limit distribution of (ß...
Mistaking geography for biology: inferring processes from species distributions.
Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I
2014-10-01
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Mixed integer linear programming for maximum-parsimony phylogeny inference.
Sridhar, Srinath; Lam, Fumei; Blelloch, Guy E; Ravi, R; Schwartz, Russell
2008-01-01
Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to be able to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we present two integer linear programming (ILP) formulations to find the most parsimonious phylogenetic tree from a set of binary variation data. One method uses a flow-based formulation that can produce exponential numbers of variables and constraints in the worst case. The method has, however, proven extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods, solving several large mtDNA and Y-chromosome instances within a few seconds and giving provably optimal results in times competitive with fast heuristics than cannot guarantee optimality. An alternative formulation establishes that the problem can be solved with a polynomial-sized ILP. We further present a web server developed based on the exponential-sized ILP that performs fast maximum parsimony inferences and serves as a front end to a database of precomputed phylogenies spanning the human genome.
Practical Statistics for Particle Physicists
Lista, Luca
2017-01-01
These three lectures provide an introduction to the main concepts of statistical data analysis useful for precision measurements and searches for new signals in High Energy Physics. The frequentist and Bayesian approaches to probability theory will introduced and, for both approaches, inference methods will be presented. Hypothesis tests will be discussed, then significance and upper limit evaluation will be presented with an overview of the modern and most advanced techniques adopted for data analysis at the Large Hadron Collider.
International Nuclear Information System (INIS)
Nemnes, G A; Anghel, D V
2010-01-01
We present a stochastic method for the simulation of the time evolution in systems which obey generalized statistics, namely fractional exclusion statistics and Gentile's statistics. The transition rates are derived in the framework of canonical ensembles. This approach introduces a tool for describing interacting fermionic and bosonic systems in non-equilibrium as ideal FES systems, in a computationally efficient manner. The two types of statistics are analyzed comparatively, indicating their intrinsic thermodynamic differences and revealing key aspects related to the species size
Inference for Ecological Dynamical Systems: A Case Study of Two Endemic Diseases
Directory of Open Access Journals (Sweden)
Daniel A. Vasco
2012-01-01
Full Text Available A Bayesian Markov chain Monte Carlo method is used to infer parameters for an open stochastic epidemiological model: the Markovian susceptible-infected-recovered (SIR model, which is suitable for modeling and simulating recurrent epidemics. This allows exploring two major problems of inference appearing in many mechanistic population models. First, trajectories of these processes are often only partly observed. For example, during an epidemic the transmission process is only partly observable: one cannot record infection times. Therefore, one only records cases (infections as the observations. As a result some means of imputing or reconstructing individuals in the susceptible cases class must be accomplished. Second, the official reporting of observations (cases in epidemiology is typically done not as they are actually recorded but at some temporal interval over which they have been aggregated. To address these issues, this paper investigates the following problems. Parameter inference for a perfectly sampled open Markovian SIR is first considered. Next inference for an imperfectly observed sample path of the system is studied. Although this second problem has been solved for the case of closed epidemics, it has proven quite difficult for the case of open recurrent epidemics. Lastly, application of the statistical theory is made to measles and pertussis epidemic time series data from 60 UK cities.