Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka
2016-10-01
In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased.
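The "infinite summation of normal distributions" mentioned in the abstract above can be illustrated with a truncated stick-breaking construction. This is a minimal sketch, assuming a Dirichlet process prior; the abstract does not name the prior it uses. The function names, the truncation level, and the base distribution for the cluster means and standard deviations are all hypothetical choices made for illustration.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Truncated stick-breaking weights for concentration parameter alpha."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(num_components - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # leftover mass goes to the last component
    return weights


def sample_normal_mixture(n, alpha=1.0, num_components=50, seed=0):
    """Draw n values from a truncated mixture of normals with random weights."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, num_components, rng)
    # Hypothetical base distribution: mean ~ N(0, 10^2), sd ~ Uniform(0.5, 2).
    params = [(rng.gauss(0.0, 10.0), rng.uniform(0.5, 2.0))
              for _ in range(num_components)]
    # Each observation first picks a cluster, then draws from that cluster's normal.
    labels = rng.choices(range(num_components), weights=weights, k=n)
    values = [rng.gauss(*params[k]) for k in labels]
    return weights, labels, values
```

Observations that share a label share a mean and standard deviation, which is the sense in which the data are subdivided into clusters with identical statistical parameters.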
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and the regional or local features. A transfer function approach to SD involves learning a regression model that relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. -Eugenia Stoimenova, Journal of Applied Statistics, June 2012. … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. -Biometrics, 67, September 2011. This excellently presente
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition: "This book should be an essential part of the personal library of every practicing statistician." -Technometrics. Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
新家, 健精
2013-01-01
© 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary; Definition of the Subject and Introduction; The Bayesian Statistical Paradigm; Three Examples; Comparison with the Frequentist Statistical Paradigm; Future Directions; Bibliography.
Nonparametric Bayesian Modeling of Complex Networks
DEFF Research Database (Denmark)
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-02-01
Nonparametric statistics is one of the five points of growth of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into areas, based on the existing practice of scientific activity in nonparametric statistics, and to classify investigations of nonparametric statistical methods. Nonparametric statistics makes it possible to draw statistical inferences, in particular to estimate characteristics of a distribution and to test statistical hypotheses, without the (as a rule weakly justified) assumption that the distribution function of the sample belongs to a particular parametric family. An example is the widespread belief that statistical data often follow the normal distribution. Meanwhile, analysis of observational results, in particular of measurement errors, always leads to the same conclusion: in most cases the actual distribution differs significantly from the normal one. Uncritical use of the normality hypothesis often leads to significant errors, for example in the rejection of outlying observations (outliers), in statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, which impose only weak requirements on the distribution functions of the observations; usually only their continuity is assumed. On the basis of a generalization of numerous studies, it can be stated that nonparametric methods can by now solve almost the same range of problems as were previously solved by parametric methods. Statements in the literature that nonparametric methods have less power, or require larger sample sizes, than parametric methods are incorrect. Note that in nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems.
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Bayesian nonparametric duration model with censorship
Directory of Open Access Journals (Sweden)
Joseph Hakizamungu
2007-10-01
This paper is concerned with nonparametric i.i.d. duration models with censored observations. We establish, by a simple and unified approach, the general structure of a Bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
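For orientation, the simplest special case of a Dirichlet prior on a survival function can be sketched as follows. This sketch assumes there is no censoring, so it does not reproduce the estimator studied in the paper. Under a Dirichlet process prior with total mass alpha and prior guess S0, the posterior mean of S(t) is (alpha * S0(t) + number of durations exceeding t) / (alpha + n). The function names and the exponential prior guess are hypothetical.

```python
import math


def dp_posterior_mean_survival(t, durations, alpha, prior_survival):
    """Posterior mean of S(t) under a Dirichlet process prior, uncensored data only."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(durations)
    exceed = sum(1 for x in durations if x > t)  # observations surviving past t
    return (alpha * prior_survival(t) + exceed) / (alpha + n)


def exponential_survival(t, rate=1.0):
    """Hypothetical prior guess S0(t) = exp(-rate * t) for t > 0."""
    return math.exp(-rate * t) if t > 0 else 1.0
```

The estimate is a weighted average of the prior guess and the empirical survival function, with the weight on the data growing as n increases relative to alpha.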
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
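The classical (unsmoothed) Good-Turing estimator referred to in the abstract above can be sketched in a few lines. With n observations and m_r species observed exactly r times, the estimated probability that the next observation belongs to a species seen l times so far is (l + 1) * m_(l+1) / n; setting l = 0 gives the probability of discovering a new species, m_1 / n. The function name is hypothetical, and the smoothing discussed in the paper is not included.

```python
from collections import Counter


def good_turing_probability(observations, l=0):
    """Good-Turing estimate that the next draw is a species seen exactly l times.

    l = 0 gives the probability of discovering a previously unseen species.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    species_counts = Counter(observations)
    # m_{l+1}: number of species observed exactly l + 1 times
    m_next = sum(1 for c in species_counts.values() if c == l + 1)
    return (l + 1) * m_next / n
```

For the sample a, a, b, c, c, c, d (n = 7), two species are singletons, so the estimated probability of a new species is 2/7.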
2016-05-31
Final Report: Technical Topic 3.2.2.d Bayesian and Non-parametric Statistics: Integration of Neural... Distribution unlimited. Dates on the report form: 31-05-2016, 15-Apr-2014, 14-Jan-2015.
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
Jin, Qian; He, Li-Jun; Zhang, Ai-Bing
2012-01-01
In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species has not been established so far, although the likelihood ratio test and the Bayesian approach have been proposed to address this issue from a statistical point of view. The TDR (Two Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that the TDR measure is very robust in assessing species membership in DNA barcoding. In contrast to the likelihood ratio test and the Bayesian approach, the TDR method stands out due to simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.
A Bayesian nonparametric method for prediction in EST analysis
Directory of Open Access Journals (Sweden)
Prünster Igor
2007-09-01
Background: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: (a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; (b) the number of new unique genes to be observed in a future sample; (c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
Nonparametric statistics for social and behavioral sciences
Kraska-Miller, M
2013-01-01
Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency
Introduction to Bayesian statistics
Bolstad, William M
2017-01-01
There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly-added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The author continues to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportion, Poisson, normal mean, and simple linear regression. In addition, newly-developing topics in the field are presented in four new chapters: Bayesian inference with unknown mean and variance; Bayesian inference for Multivariate Normal mean vector; Bayesian inference for Multiple Linear Regression Model; and Computati...
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to Euclidean geometry but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element is fixed a priori, often through expert judgment. An example of such an adaptive element is the radial basis function network (RBFN), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become ineffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Nonparametric Bayesian drift estimation for multidimensional stochastic differential equations
Gugushvili, S.; Spreij, P.
2014-01-01
We consider nonparametric Bayesian estimation of the drift coefficient of a multidimensional stochastic differential equation from discrete-time observations on the solution of this equation. Under suitable regularity conditions, we establish posterior consistency in this context.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Energy Technology Data Exchange (ETDEWEB)
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, whic
Understanding Computational Bayesian Statistics
Bolstad, William M
2011-01-01
A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic
Bayesian statistics an introduction
Lee, Peter M
2012-01-01
Bayesian Statistics is the school of thought that combines prior beliefs with the likelihood of a hypothesis to arrive at posterior beliefs. The first edition of Peter Lee’s book appeared in 1989, but the subject has moved ever onwards, with increasing emphasis on Monte Carlo based techniques. This new fourth edition looks at recent techniques such as variational methods, Bayesian importance sampling, approximate Bayesian computation and Reversible Jump Markov Chain Monte Carlo (RJMCMC), providing a concise account of the way in which the Bayesian approach to statistics develops as wel
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
Non-parametric Bayesian inference for inhomogeneous Markov point processes
DEFF Research Database (Denmark)
Berthelsen, Kasper Klitgaard; Møller, Jesper
With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
Nonparametric statistical structuring of knowledge systems using binary feature matches
DEFF Research Database (Denmark)
Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue
2014-01-01
Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary......
Probability and Bayesian statistics
1987-01-01
This book contains selected and refereed contributions to the "International Symposium on Probability and Bayesian Statistics", which was organized to celebrate the 80th birthday of Professor Bruno de Finetti at his birthplace Innsbruck in Austria. Since Professor de Finetti died in 1985, the symposium was dedicated to the memory of Bruno de Finetti and took place at Igls near Innsbruck from 23 to 26 September 1986. Some of the papers are published especially because of their relationship to Bruno de Finetti's scientific work. The evolution of stochastics shows the growing importance of probability as a coherent assessment of numerical values as degrees of belief in certain events. This is the basis for Bayesian inference in the sense of modern statistics. The contributions in this volume cover a broad spectrum ranging from foundations of probability across psychological aspects of formulating subjective probability statements, abstract measure theoretical considerations, contributions to theoretical statistics an...
Non-Parametric Bayesian Areal Linguistics
Daumé, Hal
2009-01-01
We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
Saad, Walid; Poor, H Vincent; Başar, Tamer; Song, Ju Bin
2012-01-01
This paper introduces a novel approach that enables a number of cognitive radio devices that are observing the availability pattern of a number of primary users (PUs) to cooperate and use Bayesian nonparametric techniques to estimate the distributions of the PUs' activity pattern, assumed to be completely unknown. In the proposed model, each cognitive node may have its own individual view on each PU's distribution and, hence, seeks to find partners having a correlated perception. To address this problem, a coalitional game is formulated between the cognitive devices and an algorithm for cooperative coalition formation is proposed. It is shown that the proposed coalition formation algorithm allows the cognitive nodes that are experiencing a similar behavior from some PUs to self-organize into disjoint, independent coalitions. Inside each coalition, the cooperative cognitive nodes use a combination of Bayesian nonparametric models such as the Dirichlet process and statistical goodness of fit techniques ...
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: to introduce when nonparametric approaches to data analysis are appropriate; to introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test; and to introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set. The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Prior processes and their applications nonparametric Bayesian estimation
Phadia, Eswar G
2016-01-01
This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...
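The "countable mixture representation of the Dirichlet process" mentioned above is commonly written as a stick-breaking construction. The following is a minimal sketch of that construction, not code from the book: the function name, the truncation at a fixed number of atoms, and the parameter values are all assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Draw the first `num_atoms` stick-breaking weights of a Dirichlet process.

    Hypothetical helper for illustration: V_k ~ Beta(1, alpha) and
    w_k = V_k * prod_{j<k} (1 - V_j). Returns the weights and the length
    of the stick that is still unassigned after truncation.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_atoms):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_atoms=50, rng=rng)
print(f"largest weight: {max(weights):.3f}, unassigned mass: {leftover:.2e}")
```

Smaller values of `alpha` concentrate the mass on a few atoms, while larger values spread it over many; the truncation level only needs to be large enough that the unassigned mass is negligible.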
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on Bayes optimal selection and a method based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show that the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Directory of Open Access Journals (Sweden)
Alejandro Jara
2011-04-01
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Bayesian Methods for Statistical Analysis
Puza, Borek
2015-01-01
Bayesian methods for statistical analysis is a book on statistical methods for analysing a wide variety of data. The book consists of 12 chapters, starting with basic concepts and covering numerous topics, including Bayesian estimation, decision theory, prediction, hypothesis testing, hierarchical models, Markov chain Monte Carlo methods, finite population inference, biased sampling and nonignorable nonresponse. The book contains many exercises, all with worked solutions, including complete c...
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
2010-04-01
[Extracted table captions and reference fragments only: PSNR results of KSVD and BPFA at noise level σ on the test images C.man, House, Peppers, Lena, Barbara, Boats, F.print, Couple and Hill; BPFA image interpolation PSNR results by data ratio, using patch size 8×8; BPFA RGB image interpolation PSNR results, using patch size 7×7. Cited in the fragment: T. Ferguson, A Bayesian analysis of some nonparametric problems, Annals of Statistics, 1:209–230.]
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Introduction to Bayesian statistics
Koch, Karl-Rudolf
2007-01-01
This book presents Bayes' theorem, the estimation of unknown parameters, the determination of confidence regions and the derivation of tests of hypotheses for the unknown parameters. It does so in a simple manner that is easy to comprehend. The book compares traditional and Bayesian methods with the rules of probability presented in a logical way allowing an intuitive understanding of random variables and their probability distributions to be formed.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
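The generative idea behind the SBM (nodes divided into groups, edge placement conditioned on group memberships) can be sketched in a few lines. Note that this is the canonical Bernoulli SBM for an undirected simple graph, not the microcanonical variant the paper studies, and it does no inference at all. The function name and the example probabilities are assumptions chosen for illustration.

```python
import random


def sample_sbm(block_of, edge_prob, rng):
    """Sample an undirected simple graph from a Bernoulli stochastic block model.

    Hypothetical helper for illustration.
    block_of[i]     -- group label of node i
    edge_prob[r][s] -- probability of an edge between a node in group r
                       and a node in group s (assumed symmetric)
    Returns a list of edges (i, j) with i < j.
    """
    n = len(block_of)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge depends on the nodes only through their group labels.
            if rng.random() < edge_prob[block_of[i]][block_of[j]]:
                edges.append((i, j))
    return edges


# Two assortative communities of five nodes each (illustrative values).
block_of = [0] * 5 + [1] * 5
edge_prob = [[0.8, 0.05],
             [0.05, 0.8]]
edges = sample_sbm(block_of, edge_prob, random.Random(42))
print(len(edges), "edges sampled")
```

Inference runs in the opposite direction: given only the edge list, recover the group labels and, in the nonparametric setting described above, the number of groups as well.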
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
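For readers unfamiliar with the Chinese restaurant process that the paper extends, here is a minimal sketch of the standard, exchangeable version. It contains none of the phylogenetic dependency structure proposed by the authors; the function name and the concentration value are assumptions for illustration.

```python
import random


def sample_crp_partition(num_customers, alpha, rng):
    """Sample cluster assignments from a standard Chinese restaurant process.

    Hypothetical helper for illustration. Customer n (0-based) joins an
    existing table k with probability count_k / (n + alpha) and opens a
    new table with probability alpha / (n + alpha).
    """
    table_counts = []  # table_counts[k] = customers seated at table k
    assignments = []
    for n in range(num_customers):
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        chosen = len(table_counts)  # default: open a new table
        for k, count in enumerate(table_counts):
            cumulative += count
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_counts):
            table_counts.append(1)
        else:
            table_counts[chosen] += 1
        assignments.append(chosen)
    return assignments, table_counts


assignments, table_counts = sample_crp_partition(100, alpha=1.0, rng=random.Random(7))
print(len(table_counts), "clusters; sizes:", sorted(table_counts, reverse=True))
```

The number of clusters is not fixed in advance; it grows slowly with the number of items, which is what makes the process a natural prior when the number of antigenic groups is unknown.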
Nonparametric Bayesian Clustering of Structural Whole Brain Connectivity in Full Image Resolution
DEFF Research Database (Denmark)
Ambrosen, Karen Marie Sandø; Albers, Kristoffer Jon; Dyrby, Tim B.
2014-01-01
Diffusion magnetic resonance imaging enables measuring the structural connectivity of the human brain at a high spatial resolution. Local noisy connectivity estimates can be derived using tractography approaches, and statistical models are necessary to quantify the brain's salient structural organization. However, statistically modeling these massive structural connectivity datasets is a computationally challenging task. We develop a high-performance inference procedure for the infinite relational model (a prominent non-parametric Bayesian model for clustering networks into structurally similar groups) that defines structural units at the resolution of statistical support. We apply the model to a network of structural brain connectivity in full image resolution with more than one hundred thousand regions (voxels in the gray-white matter boundary) and around one hundred million connections...
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Borsboom, D.; Haig, B.D.
2013-01-01
Unlike most other statistical frameworks, Bayesian statistical inference is wedded to a particular approach in the philosophy of science (see Howson & Urbach, 2006); this approach is called Bayesianism. Rather than being concerned with model fitting, this position in the philosophy of science primar
Methodology in robust and nonparametric statistics
Jurecková, Jana; Picek, Jan
2012-01-01
Introduction and Synopsis: Introduction; Synopsis. Preliminaries: Introduction; Inference in Linear Models; Robustness Concepts; Robust and Minimax Estimation of Location; Clippings from Probability and Asymptotic Theory; Problems. Robust Estimation of Location and Regression: Introduction; M-Estimators; L-Estimators; R-Estimators; Minimum Distance and Pitman Estimators; Differentiable Statistical Functions; Problems. Asymptotic Representations for L-Estimators
DEFF Research Database (Denmark)
Ramirez, José Rangel; Sørensen, John Dalsgaard
2011-01-01
This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used for direct updating of the stochastic variables through a non-parametric Bayesian updating approach and can be integrated into the reliability analysis by a third-order polynomial chaos expansion approximation. Although classical Bayesian updating approaches are often used because of their parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating with a non-conjugating formulation. The results in this paper show the influence on the time-dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence on the reliability of the number of updated parameters is illustrated.
Nonparametric statistics a step-by-step approach
Corder, Gregory W
2014-01-01
"…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory. It also deserves a place in libraries of all institutions where introductory statistics courses are taught." -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: new coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical power; SPSS® (Version 21) software and updated screen ca
Non-parametric Bayesian graph models reveal community structure in resting state fMRI
DEFF Research Database (Denmark)
Andersen, Kasper Winther; Madsen, Kristoffer H.; Siebner, Hartwig Roman
2014-01-01
Modeling of resting state functional magnetic resonance imaging (rs-fMRI) data using network models is of increasing interest. It is often desirable to group nodes into clusters to interpret the communication patterns between nodes. In this study we consider three different nonparametric Bayesian...
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
Bayesian Model Selection and Statistical Modeling
Ando, Tomohiro
2010-01-01
Bayesian model selection is a fundamental part of the Bayesian statistical modeling process. The quality of these solutions usually depends on the goodness of the constructed Bayesian model. Realizing how crucial this issue is, many researchers and practitioners have been extensively investigating the Bayesian model selection problem. This book provides comprehensive explanations of the concepts and derivations of the Bayesian approach for model selection and related criteria, including the Bayes factor, the Bayesian information criterion (BIC), the generalized BIC, and the pseudo marginal lik
Using Mathematica to build Non-parametric Statistical Tables
Directory of Open Access Journals (Sweden)
Gloria Perez Sainz de Rozas
2003-01-01
In this paper, I present computational procedures to obtain statistical tables: the tables of the asymptotic distribution and the exact distribution of the Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs statistic R, the table of the distribution of the Wilcoxon signed-rank statistic W+, and the table of the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9, under Windows 98. I think that this is an interesting question because many statistical packages give the asymptotic significance level in statistical tests, and with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities for non-parametric distributions. I have used Mathematica to make these calculations because one can use symbolic language to solve recursion relations. It is very easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions with any precision, not only for the standard parameters most used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: binomial, Poisson, hypergeometric, normal, chi-square (χ²), Student's t, Snedecor's F, geometric, gamma and beta.
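As an illustration of the kind of recursion such tables rely on, the sketch below computes the exact null distribution of the Wilcoxon signed-rank statistic W+ for n untied observations. It is written in Python rather than Mathematica and is not the author's code; the function names are hypothetical. It uses the standard fact that, under the null hypothesis, every subset of the ranks 1..n is equally likely to be the set of positive ranks.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """Return counts where counts[w] = number of subsets of {1..n} summing to w.

    Hypothetical helper for illustration. Adding rank r to the available
    ranks gives the recursion c_r(w) = c_{r-1}(w) + c_{r-1}(w - r).
    """
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once.
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def signed_rank_cdf(n, w):
    """Exact left-tail probability P(W+ <= w) under the null, as a fraction."""
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[: w + 1]), 2 ** n)


print(signed_rank_counts(3))   # [1, 1, 1, 2, 1, 1, 1]
print(signed_rank_cdf(3, 0))   # 1/8
```

Using exact rational arithmetic avoids the rounding and transcription problems the abstract mentions; right-tail probabilities follow from the symmetry of the distribution about n(n+1)/4.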
Categorical and nonparametric data analysis choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
12th Brazilian Meeting on Bayesian Statistics
Louzada, Francisco; Rifo, Laura; Stern, Julio; Lauretto, Marcelo
2015-01-01
Through refereed papers, this volume focuses on the foundations of the Bayesian paradigm; their comparison to objectivistic or frequentist Statistics counterparts; and the appropriate application of Bayesian foundations. This research in Bayesian Statistics is applicable to data analysis in biostatistics, clinical trials, law, engineering, and the social sciences. EBEB, the Brazilian Meeting on Bayesian Statistics, is held every two years by the ISBrA, the International Society for Bayesian Analysis, one of the most active chapters of the ISBA. The 12th meeting took place March 10-14, 2014 in Atibaia. Interest in foundations of inductive Statistics has grown recently in accordance with the increasing availability of Bayesian methodological alternatives. Scientists need to deal with the ever more difficult choice of the optimal method to apply to their problem. This volume shows how Bayes can be the answer. The examination and discussion on the foundations work towards the goal of proper application of Bayesia...
Bayesian Nonparametric Mixture Estimation for Time-Indexed Functional Data in R
Directory of Open Access Journals (Sweden)
Terrance D. Savitsky
2016-08-01
We present growfunctions for R that offers Bayesian nonparametric estimation models for analysis of dependent, noisy time series data indexed by a collection of domains. This data structure arises from combining periodically published government survey statistics, such as are reported in the Current Population Study (CPS). The CPS publishes monthly, by-state estimates of employment levels, where each state expresses a noisy time series. Published state-level estimates from the CPS are composed from household survey responses in a model-free manner and express high levels of volatility due to insufficient sample sizes. Existing software solutions borrow information over a modeled time-based dependence to extract a de-noised time series for each domain. These solutions, however, ignore the dependence among the domains that may be additionally leveraged to improve estimation efficiency. The growfunctions package offers two fully nonparametric mixture models that simultaneously estimate both a time and domain-indexed dependence structure for a collection of time series: (1) A Gaussian process (GP) construction, which is parameterized through the covariance matrix, estimates a latent function for each domain. The covariance parameters of the latent functions are indexed by domain under a Dirichlet process prior that permits estimation of the dependence among functions across the domains; (2) An intrinsic Gaussian Markov random field prior construction provides an alternative to the GP that expresses different computation and estimation properties. In addition to performing denoised estimation of latent functions from published domain estimates, growfunctions allows estimation of collections of functions for observation units (e.g., households), rather than aggregated domains, by accounting for an informative sampling design under which the probabilities for inclusion of observation units are related to the response variable. growfunctions includes plot
MAP estimators and their consistency in Bayesian nonparametric inverse problems
Dashti, M.; Law, K. J. H.; Stuart, A. M.; Voss, J.
2013-09-01
We consider the inverse problem of estimating an unknown function u from noisy measurements y of a known, possibly nonlinear, map G applied to u. We adopt a Bayesian approach to the problem and work in a setting where the prior measure is specified as a Gaussian random field μ0. We work under a natural set of conditions on the likelihood which implies the existence of a well-posed posterior measure, μy. Under these conditions, we show that the maximum a posteriori (MAP) estimator is well defined as the minimizer of an Onsager-Machlup functional defined on the Cameron-Martin space of the prior; thus, we link a problem in probability with a problem in the calculus of variations. We then consider the case where the observational noise vanishes and establish a form of Bayesian posterior consistency for the MAP estimator. We also prove a similar result for the case where the observation of G(u) can be repeated as many times as desired with independent identically distributed noise. The theory is illustrated with examples from an inverse problem for the Navier-Stokes equation, motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions, motivated by problems arising in molecular dynamics.
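For orientation, the functional referred to above can be written out in a common special case. The display below is a sketch under stated assumptions, not a formula quoted from the abstract: it assumes finite-dimensional data with additive Gaussian noise of covariance $\Gamma$ and a centred Gaussian prior $\mu_0$ with Cameron-Martin space $(E, \lVert\cdot\rVert_E)$; the symbols $\eta$, $\Gamma$, $\Phi$, $I$ and $E$ are introduced here for illustration.

$$
y = G(u) + \eta, \qquad \eta \sim N(0, \Gamma), \qquad
\Phi(u; y) = \tfrac{1}{2}\,\bigl\lvert \Gamma^{-1/2}\bigl(y - G(u)\bigr) \bigr\rvert^{2},
$$

$$
I(u) = \Phi(u; y) + \tfrac{1}{2}\,\lVert u \rVert_{E}^{2}, \quad u \in E,
\qquad
u_{\mathrm{MAP}} \in \operatorname*{arg\,min}_{u \in E} I(u).
$$

In this form the first term measures data misfit and the second is the prior penalty, which is how the probabilistic MAP problem becomes a problem in the calculus of variations.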
MAP estimators and their consistency in Bayesian nonparametric inverse problems
Dashti, M.
2013-09-01
We consider the inverse problem of estimating an unknown function u from noisy measurements y of a known, possibly nonlinear, map applied to u. We adopt a Bayesian approach to the problem and work in a setting where the prior measure is specified as a Gaussian random field μ0. We work under a natural set of conditions on the likelihood which implies the existence of a well-posed posterior measure, μy. Under these conditions, we show that the maximum a posteriori (MAP) estimator is well defined as the minimizer of an Onsager-Machlup functional defined on the Cameron-Martin space of the prior; thus, we link a problem in probability with a problem in the calculus of variations. We then consider the case where the observational noise vanishes and establish a form of Bayesian posterior consistency for the MAP estimator. We also prove a similar result for the case where the observation of can be repeated as many times as desired with independent identically distributed noise. The theory is illustrated with examples from an inverse problem for the Navier-Stokes equation, motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions, motivated by problems arising in molecular dynamics. © 2013 IOP Publishing Ltd.
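For orientation, the variational characterization described in this abstract can be written out as follows. This is a sketch under stated assumptions, not a quotation from the paper: Φ denotes the negative log-likelihood (potential), E the Cameron-Martin space of the prior μ0, and the additive Gaussian noise model with covariance Γ in the last line is one common special case chosen for illustration.

```latex
% Posterior defined through its density with respect to the Gaussian prior
\frac{d\mu^{y}}{d\mu_{0}}(u) \propto \exp\bigl(-\Phi(u; y)\bigr)

% Onsager-Machlup functional whose minimizers are the MAP estimators
I(u) =
\begin{cases}
  \Phi(u; y) + \tfrac{1}{2}\,\lVert u \rVert_{E}^{2}, & u \in E,\\[2pt]
  +\infty, & \text{otherwise.}
\end{cases}

% Illustrative special case: y = G(u) + \eta, \quad \eta \sim N(0, \Gamma)
\Phi(u; y) = \tfrac{1}{2}\,\bigl\lvert \Gamma^{-1/2}\bigl(y - G(u)\bigr) \bigr\rvert^{2}
```

In this special case, minimizing I(u) has the form of a Tikhonov-regularized least-squares problem, which is the link between probability and the calculus of variations that the abstract refers to.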
A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems
Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2017-06-01
We propose a Bayesian nonparametric mixture model, based on Markov chain Monte Carlo methods, for the reconstruction and prediction of discretized stochastic dynamical systems from observed time series data. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low dimensional stochastic models when the size of the observed time series is small and the noise process is (perhaps) non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities is applied to the additive errors. Our method is parsimonious compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, flexible and general. Simulations based on synthetic time series are presented.
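The parsimony claim can be illustrated by comparing how mixture weights are built. The sketch below assumes the usual form of geometric stick-breaking weights, w_j = λ(1 − λ)^(j−1), which depend on a single parameter λ, and contrasts it with Dirichlet process stick-breaking, where every weight needs its own Beta draw. Function names, the truncation level, and the parameter values are hypothetical and are not taken from the authors' implementation.

```python
import random


def gsb_weights(lam, n_terms):
    """Truncated geometric stick-breaking weights w_j = lam * (1 - lam)**(j - 1)."""
    return [lam * (1.0 - lam) ** j for j in range(n_terms)]


def dp_stick_breaking_weights(alpha, n_terms, rng):
    """Truncated Dirichlet process stick-breaking weights with v_j ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(n_terms):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)                     # fixed seed, illustrative only
gsb = gsb_weights(lam=0.3, n_terms=50)     # deterministic given lam, always decreasing
dp = dp_stick_breaking_weights(alpha=1.0, n_terms=50, rng=rng)  # random, not ordered
```

The geometric weights are fully determined by λ, so a sampler only has to update one weight parameter instead of one per component.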
Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times.
Xu, Yanxun; Müller, Peter; Wahed, Abdus S; Thall, Peter F
2016-01-01
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
1st Conference of the International Society for Nonparametric Statistics
Lahiri, S; Politis, Dimitris
2014-01-01
This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R. Beran, P. Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. Subba Rao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world. The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...
Bayesian Statistics for Biological Data: Pedigree Analysis
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian, or "classical," methods of probability calculation give different answers. First-year college students of biology can be introduced to Bayesian statistics.
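A small worked example may help show the kind of calculation involved. It is a hypothetical textbook-style case, not one taken from the article: a woman whose mother is a known carrier of an X-linked recessive allele has prior carrier probability 1/2, each son of a carrier is unaffected with probability 1/2, and new mutations are ignored. The function name is an assumption made for illustration.

```python
from fractions import Fraction


def carrier_posterior(prior_carrier, n_unaffected_sons):
    """Posterior probability of being a carrier after observing unaffected sons.

    Assumes each son of a carrier is unaffected with probability 1/2 and
    each son of a non-carrier is unaffected with probability 1.
    """
    likelihood_carrier = Fraction(1, 2) ** n_unaffected_sons
    likelihood_noncarrier = Fraction(1)
    numerator = prior_carrier * likelihood_carrier
    evidence = numerator + (1 - prior_carrier) * likelihood_noncarrier
    return numerator / evidence


# Two unaffected sons lower the carrier probability from 1/2 to 1/5.
posterior = carrier_posterior(Fraction(1, 2), 2)
```

A calculation that ignores the sons would keep the probability at 1/2; Bayes' formula updates it to 1/(2^n + 1) for n unaffected sons.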
Non-Parametric Bayesian State Space Estimator for Negative Information
Directory of Open Access Journals (Sweden)
Guillaume de Chambrier
2017-09-01
Simultaneous Localization and Mapping (SLAM) is concerned with the development of filters to accurately and efficiently infer the state parameters (position, orientation, etc.) of an agent and aspects of its environment, commonly referred to as the map. A mapping system is necessary for the agent to achieve situatedness, which is a precondition for planning and reasoning. In this work, we consider an agent who is given the task of finding a set of objects. The agent has limited perception and can only sense the presence of objects if a direct contact is made; as a result, most of the sensing is negative information. In the absence of recurrent sightings or direct measurements of objects, there are no correlations from the measurement errors that can be exploited. This renders SLAM estimators for which this fact is their backbone, such as EKF-SLAM, ineffective. In addition, for our setting, no assumptions are taken with respect to the marginals (beliefs) of both the agent and objects (map). From the loose assumptions we stipulate regarding the marginals and measurements, we adopt a histogram parametrization. We introduce a Bayesian State Space Estimator (BSSE), which we name Measurement Likelihood Memory Filter (MLMF), in which the values of the joint distribution are not parametrized but instead we directly apply changes from the measurement integration step to the marginals. This is achieved by keeping track of the history of likelihood functions’ parameters. We demonstrate that the MLMF gives the same filtered marginals as a histogram filter and show two implementations: MLMF and scalable-MLMF that both have a linear space complexity. The original MLMF retains an exponential time complexity (although an order of magnitude smaller than the histogram filter) while the scalable-MLMF introduces an independence assumption so as to have a linear time complexity. We further quantitatively demonstrate the scalability of our algorithm with 25 beliefs having up to
Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan
2017-02-15
A comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple-omics in tumors. In this study, we aim to leverage complementary information encoded in each of the omics data to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g., P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has important implications that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/ . Contact: hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.
Bayesian Models: A Statistical Primer for Ecologists
Hobbs, N Thompson
2015-01-01
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probabili
A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.
Shalchyan, Vahid; Farina, Dario
2014-02-15
Neural spikes from multiple neurons recorded in a multi-unit signal are usually separated by clustering. Drifts in the position of the recording electrode relative to the neurons over time cause gradual changes in the position and shapes of the clusters, challenging the clustering task. By dividing the data into short time intervals, Bayesian tracking of the clusters based on a Gaussian cluster model has been previously proposed. However, the Gaussian cluster model is often not verified for neural spikes. We present a Bayesian clustering approach that makes no assumptions on the distribution of the clusters and uses kernel-based density estimation of the clusters in every time interval as a prior for Bayesian classification of the data in the subsequent time interval. The proposed method was tested and compared to a Gaussian model-based approach for cluster tracking by using both simulated and experimental datasets. The results showed that the proposed non-parametric kernel-based density estimation of the clusters outperformed the sequential Gaussian model fitting in both simulated and experimental data tests. Using non-parametric kernel density-based clustering that makes no assumptions on the distribution of the clusters enhances the ability of tracking cluster non-stationarity over time with respect to the Gaussian cluster modeling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
Nonparametric Bayesian Sparse Factor Models with application to Gene Expression modelling
Knowles, David
2010-01-01
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed where observed data Y is modeled as a linear superposition, G, of a potentially infinite number of hidden factors, X. The Indian Buffet Process (IBP) is used as a prior on G to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated datasets based on a known sparse connectivity matrix for E. coli, and on three biological datasets of increasing complexity.
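To make the role of the IBP prior concrete, the sketch below draws a binary feature matrix from the standard "customers and dishes" construction of the Indian Buffet Process: customer i takes each existing dish k with probability m_k / i and then tries Poisson(α / i) new dishes. It only illustrates how sparsity and an unbounded number of features arise. The function names and parameter values are assumptions, and this is not the inference scheme used in the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_rows, alpha, rng):
    """Draw a binary matrix (list of rows) from the IBP with mass parameter alpha."""
    dish_counts = []   # m_k: number of earlier customers who took dish k
    rows = []
    for i in range(1, n_rows + 1):
        # Revisit existing dishes in proportion to their popularity.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # Open a Poisson(alpha / i) number of brand-new dishes.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    n_features = len(dish_counts)
    # Pad earlier rows with zeros for dishes introduced later.
    return [r + [0] * (n_features - len(r)) for r in rows]


Z = sample_ibp(n_rows=10, alpha=2.0, rng=random.Random(1))
```

The number of columns of Z is random and grows slowly with the number of rows, which is how the number of latent features can be inferred rather than fixed in advance.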
Directory of Open Access Journals (Sweden)
Yue Fan
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result. PMID:28133490
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Philosophy and the practice of Bayesian statistics.
Gelman, Andrew; Shalizi, Cosma Rohilla
2013-02-01
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
A Bayesian non-parametric Potts model with application to pre-surgical FMRI data.
Johnson, Timothy D; Liu, Zhuqing; Bartsch, Andreas J; Nichols, Thomas E
2013-08-01
The Potts model has enjoyed much success as a prior model for image segmentation. Given the individual classes in the model, the data are typically modeled as Gaussian random variates or as random variates from some other parametric distribution. In this article, we present a non-parametric Potts model and apply it to a functional magnetic resonance imaging study for the pre-surgical assessment of peritumoral brain activation. In our model, we assume that the Z-score image from a patient can be segmented into activated, deactivated, and null classes, or states. Conditional on the class, or state, the Z-scores are assumed to come from some generic distribution which we model non-parametrically using a mixture of Dirichlet process priors within the Bayesian framework. The posterior distribution of the model parameters is estimated with a Markov chain Monte Carlo algorithm, and Bayesian decision theory is used to make the final classifications. Our Potts prior model includes two parameters, the standard spatial regularization parameter and a parameter that can be interpreted as the a priori probability that each voxel belongs to the null, or background state, conditional on the lack of spatial regularization. We assume that both of these parameters are unknown, and jointly estimate them along with other model parameters. We show through simulation studies that our model performs on par, in terms of posterior expected loss, with parametric Potts models when the parametric model is correctly specified and outperforms parametric models when the parametric model is misspecified.
Rotondi, R.
2009-04-01
According to the unified scaling theory, the probability distribution function of the recurrence time T is a scaled version of a base function, and the average value of T can be used as a scale parameter for the distribution. The base function must belong to the scale family of distributions: tested on different catalogues and for different scale levels, for Corral (2005) the (truncated) generalized gamma distribution is the best model, for German (2006) the Weibull distribution. The scaling approach should overcome the difficulty of estimating distribution functions over small areas, but theoretical limitations and partial instability of the estimated distributions have been pointed out in the literature. Our aim is to analyze the recurrence time of strong earthquakes that occurred in the Italian territory. To satisfy the hypotheses of independence and identical distribution, we have evaluated the times between events that occurred in each area of the Database of Individual Seismogenic Sources and then gathered them by eight tectonically coherent regions, each of them dominated by a well characterized geodynamic process. To solve problems such as paucity of data, presence of outliers, and uncertainty in the choice of the functional expression for the distribution of T, we have followed a nonparametric approach (Rotondi (2009)) in which: (a) the maximum flexibility is obtained by assuming that the probability distribution is a random function belonging to a large function space, distributed as a stochastic process; (b) the nonparametric estimation method is robust when the data contain outliers; (c) the Bayesian methodology allows different information sources to be exploited, so that the model fit may be good even for scarce samples. We have compared the hazard rates evaluated through the parametric and nonparametric approaches. References Corral A. (2005). Mixing of rescaled data and Bayesian inference for earthquake recurrence times, Nonlin. Proces. Geophys., 12, 89
Energy Technology Data Exchange (ETDEWEB)
Koutsourelakis, P
2007-05-03
Clustering represents one of the most common statistical procedures and a standard tool for pattern discovery and dimension reduction. Most often the objects to be clustered are described by a set of measurements or observables, e.g. the coordinates of the vectors or the attributes of people. In many cases, however, the available observations appear in the form of links or connections (e.g. communication or transaction networks). This data contains valuable information that can in general be exploited in order to discover groups and better understand the structure of the dataset. Since in most real-world datasets several of these links are missing, it is also useful to develop procedures that can predict those unobserved connections. In this report we address the problem of unsupervised group discovery in relational datasets. A fundamental issue in all clustering problems is that the actual number of clusters is unknown a priori. In most cases this is addressed by running the model several times, assuming a different number of clusters each time, and selecting the value that provides the best fit based on some criterion (i.e. the Bayes factor in the case of Bayesian techniques). It would be preferable to develop techniques in which the number of clusters is essentially learned from the data along with the rest of the model parameters. For that purpose, we adopt a nonparametric Bayesian framework which provides a very flexible modeling environment in which the size of the model, i.e. the number of clusters, can adapt to the available data and readily accommodate outliers. The latter is particularly important since several groups of interest might consist of a small number of members and would most likely be smeared out by traditional modeling techniques. Finally, the proposed framework combines all the advantages of standard Bayesian techniques such as integration of prior knowledge in a principled manner, seamless accommodation of missing data
Bayesian Inference in Statistical Analysis
Box, George E P
2011-01-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Rob
Xu, Zhiqiang
2017-02-16
Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed, that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.
Gugushvili, S.; Spreij, P.
2016-01-01
We consider the problem of non-parametric estimation of the deterministic dispersion coefficient of a linear stochastic differential equation based on discrete time observations on its solution. We take a Bayesian approach to the problem and under suitable regularity assumptions derive the posterior
Morton, Kenneth D., Jr.; Torrione, Peter A.; Collins, Leslie
2010-04-01
Time domain ground penetrating radar (GPR) has been shown to be a powerful sensing phenomenology for detecting buried objects such as landmines. Landmine detection with GPR data typically utilizes a feature-based pattern classification algorithm to discriminate buried landmines from other sub-surface objects. In high-fidelity GPR, the time-frequency characteristics of a landmine response should be indicative of the physical construction and material composition of the landmine and could therefore be useful for discrimination from other non-threatening sub-surface objects. In this research we propose modeling landmine time-domain responses with a nonparametric Bayesian time-series model and we perform clustering of these time-series models with a hierarchical nonparametric Bayesian model. Each time-series is modeled as a hidden Markov model (HMM) with autoregressive (AR) state densities. The proposed nonparametric Bayesian prior allows for automated learning of the number of states in the HMM as well as the AR order within each state density. This creates a flexible time-series model with complexity determined by the data. Furthermore, a hierarchical non-parametric Bayesian prior is used to group landmine responses with similar HMM model parameters, thus learning the number of distinct landmine response models within a data set. Model inference is accomplished using a fast variational mean field approximation that can be implemented for on-line learning.
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Palacios, Julia A; Minin, Vladimir N
2013-03-01
Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human influenza A viruses. In both cases, we recover more of the believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
Approximate Bayesian computation with functional statistics.
Soubeyrand, Samuel; Carpentier, Florence; Guiton, François; Klein, Etienne K
2013-03-26
Functional statistics are commonly used to characterize spatial patterns in general and spatial genetic structures in population genetics in particular. Such functional statistics also enable the estimation of parameters of spatially explicit (and genetic) models. Recently, Approximate Bayesian Computation (ABC) has been proposed to estimate model parameters from functional statistics. However, applying ABC with functional statistics may be cumbersome because of the high dimension of the set of statistics and the dependences among them. To tackle this difficulty, we propose an ABC procedure which relies on an optimized weighted distance between observed and simulated functional statistics. We applied this procedure to a simple step model, a spatial point process characterized by its pair correlation function and a pollen dispersal model characterized by genetic differentiation as a function of distance. These applications showed how the optimized weighted distance improved estimation accuracy. In the discussion, we consider the application of the proposed ABC procedure to functional statistics characterizing non-spatial processes.
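The general shape of such a procedure can be sketched with plain rejection ABC and a weighted Euclidean distance between summary statistics. Everything in this toy example is an assumption made for illustration: the Gaussian simulator, the two summaries (mean and median), the fixed weights, and the function names. In particular, the weights here are set by hand, whereas the abstract describes optimizing them.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summaries(data):
    """Vector of summary statistics (stand-in for a functional statistic)."""
    return (statistics.mean(data), statistics.median(data))


def weighted_distance(s_obs, s_sim, weights):
    """Weighted Euclidean distance between two summary vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_obs, s_sim)) ** 0.5


def abc_rejection(observed, prior_draw, weights, n_draws, keep_fraction, rng):
    """Keep the prior draws whose simulated summaries are closest to the observed ones."""
    s_obs = summaries(observed)
    scored = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        s_sim = summaries(simulate(theta, len(observed), rng))
        scored.append((weighted_distance(s_obs, s_sim, weights), theta))
    scored.sort()
    n_keep = max(1, int(keep_fraction * n_draws))
    return [theta for _, theta in scored[:n_keep]]


rng = random.Random(42)
observed = simulate(2.0, 100, rng)          # synthetic data with true theta = 2
accepted = abc_rejection(
    observed,
    prior_draw=lambda r: r.uniform(-5.0, 5.0),
    weights=(1.0, 0.5),                     # hand-picked, not optimized
    n_draws=2000,
    keep_fraction=0.05,
    rng=rng,
)
posterior_mean = statistics.mean(accepted)
```

With a functional statistic, the summary vector would hold the function evaluated on a grid, and the weights would decide how much each grid point contributes to the distance.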
Bayesian approach to inverse statistical mechanics.
Habeck, Michael
2014-05-01
Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.
Bayesian Statistics in Software Engineering: Practical Guide and Case Studies
Furia, Carlo A.
2016-01-01
Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics has dominated data analysis in the past; but Bayesian statistics is making a comeback at the forefront of science. In this paper, we give a practical overview of Bayesian statistics and illustrate its main advantages over frequentist statistics for the kinds of analyses that are common in empirical software engineering, where frequentist statistics still is standard. We...
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normal. However, a parametric test can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
Bayesian Cosmological inference beyond statistical isotropy
Souradeep, Tarun; Das, Santanu; Wandelt, Benjamin
2016-10-01
With the advent of rich data sets, the computationally challenging task of inference in cosmology has relied on stochastic sampling methods. First, I review the widely used MCMC approach to inferring cosmological parameters and present an adaptive, improved implementation, SCoPE, developed by our group. Next, I present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method with a principled approach to assessing violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. The general, principled approach to Bayesian inference of the covariance structure in a random field on a sphere presented here has huge potential for application to many other aspects of cosmology and astronomy, as well as to more distant areas of research such as geosciences and climate modelling.
A Nonparametric Bayesian Approach to Seismic Hazard Modeling Using the ETAS Framework
Ross, G.
2015-12-01
The epidemic-type aftershock sequence (ETAS) model is one of the most popular tools for modeling seismicity and quantifying risk in earthquake-prone regions. Under the ETAS model, the occurrence times of earthquakes are treated as a self-exciting Poisson process where each earthquake briefly increases the probability of subsequent earthquakes occurring soon afterwards, which captures the fact that large mainshocks tend to produce long sequences of aftershocks. A triggering kernel controls the amount by which the probability increases based on the magnitude of each earthquake, and the rate at which it then decays over time. This triggering kernel is usually chosen heuristically, to match the parametric form of the modified Omori law for aftershock decay. However, recent work has questioned whether this is an appropriate choice. Since the choice of kernel has a large impact on the predictions made by the ETAS model, avoiding misspecification is crucially important. We present a novel nonparametric version of ETAS which avoids making parametric assumptions, and instead learns the correct specification from the data itself. Our approach is based on the Dirichlet process, a modern class of Bayesian prior distributions that allows for efficient inference over an infinite-dimensional space of functions. We show how our nonparametric ETAS model can be fit to data, and present results demonstrating that the fit is greatly improved compared to the standard parametric specification. Additionally, we explain how our model can be used to perform probabilistic declustering of earthquake catalogs, to classify earthquakes as being either aftershocks or mainshocks, and to learn the causal relations between pairs of earthquakes.
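For orientation, the standard parametric specification that this abstract contrasts with can be sketched as a conditional intensity: a background rate plus, for each past event, a magnitude-dependent productivity multiplied by a modified Omori decay. The sketch below is only an illustration of that parametric baseline, not the nonparametric model of the abstract; the function names and every parameter value (`mu`, `K`, `alpha`, `m0`, `c`, `p`) are assumptions chosen for the example.

```python
import math

def omori_kernel(dt, c=0.01, p=1.2):
    """Normalized modified Omori density (p - 1) * c**(p - 1) / (dt + c)**p, for p > 1."""
    return (p - 1.0) * c ** (p - 1.0) / (dt + c) ** p

def productivity(mag, K=0.2, alpha=1.0, m0=3.0):
    """Expected number of direct aftershocks of an event of magnitude mag (illustrative values)."""
    return K * math.exp(alpha * (mag - m0))

def etas_intensity(t, times, mags, mu=0.1):
    """Conditional intensity at time t given past event times and magnitudes."""
    rate = mu  # background (mainshock) rate
    for t_i, m_i in zip(times, mags):
        if t_i < t:  # only earlier events can trigger
            rate += productivity(m_i) * omori_kernel(t - t_i)
    return rate

# Hypothetical catalog: one magnitude-5 event at t = 1.0 (time units arbitrary).
catalog_times, catalog_mags = [1.0], [5.0]
for t in (0.5, 1.1, 2.0):
    print(t, etas_intensity(t, catalog_times, catalog_mags))
```

Before the event the intensity equals the background rate; just after it the intensity jumps and then decays as a power law, which is the behavior the triggering kernel is meant to encode.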
Connolly, Brian; Cohen, K Bretonnel; Santel, Daniel; Bayram, Ulya; Pestian, John
2017-08-07
Probabilistic assessments of clinical care are essential for quality care. Yet machine learning, which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output with a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. This novel non-parametric Bayesian approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well-understood scores. The method produces calibrations that are comparable to the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate classes decreases, our approach yields more granular and dynamic calibrated probabilities compared to the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. The method allows both clinicians and patients to have a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets and a clinical data set, to which diverse ML methods are applied. Trivially extending the method to (non-ML) clinical scores is also discussed.
Examples of the Application of Nonparametric Information Geometry to Statistical Physics
Directory of Open Access Journals (Sweden)
Giovanni Pistone
2013-09-01
We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.
Macaro, Christian; Prado, Raquel
2014-01-01
We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated with the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Directory of Open Access Journals (Sweden)
Antonio Canale
2017-06-01
msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors such as random densities and numbers generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
Modular autopilot design and development featuring Bayesian non-parametric adaptive control
Stockton, Jacob
Over the last few decades, Unmanned Aircraft Systems, or UAS, have become a critical part of the defense of our nation and the growth of the aerospace sector. UAS have great potential for the agricultural industry, first response, and ecological monitoring. However, the wide range of applications requires many mission-specific vehicle platforms. These platforms must operate reliably in a range of environments, and in the presence of significant uncertainties. The accepted practice for enabling autonomously flying UAS today relies on extensive manual tuning of the UAS autopilot parameters, or time-consuming approximate modeling of the dynamics of the UAS. These methods may lead to overly conservative controllers or excessive development times. A comprehensive approach to the development of an adaptive, airframe-independent controller is presented. The control algorithm leverages a nonparametric, Bayesian approach to adaptation, and is used as a cornerstone for the development of a new modular autopilot. Promising simulation results are presented for the adaptive controller, as well as flight test results for the modular autopilot.
Das, Moumita; Bhattacharya, Sourabh
2014-01-01
In this paper, using kernel convolution of order based dependent Dirichlet process (Griffin & Steel (2006)) we construct a nonstationary, nonseparable, nonparametric space-time process, which, as we show, satisfies desirable properties, and includes the stationary, separable, parametric processes as special cases. We also investigate the smoothness properties of our proposed model. Since our model entails an infinite random series, for Bayesian model fitting purpose we must either truncate th...
National Research Council Canada - National Science Library
Arbel, Julyan; King, Catherine K; Raymond, Ben; Winsley, Tristrom; Mengersen, Kerrie L
2015-01-01
...‐species toxicity tests. In this study, we apply a Bayesian nonparametric model to a soil microbial data set acquired across a hydrocarbon contamination gradient at the site of a fuel spill in Antarctica...
STATISTICAL BAYESIAN ANALYSIS OF EXPERIMENTAL DATA.
Directory of Open Access Journals (Sweden)
AHLAM LABDAOUI
2012-12-01
The Bayesian researcher should know the basic ideas underlying Bayesian methodology and the computational tools used in modern Bayesian econometrics. Some of the most important methods of posterior simulation are Monte Carlo integration, importance sampling, Gibbs sampling and the Metropolis-Hastings algorithm. The Bayesian should also be able to put the theory and computational tools together in the context of substantive empirical problems. We focus primarily on recent developments in Bayesian computation. Then we focus on particular models. Inevitably, we combine theory and computation in the context of particular models. Although we have tried to be reasonably complete in terms of covering the basic ideas of Bayesian theory and the computational tools most commonly used by the Bayesian, there is no way we can cover all the classes of models used in econometrics. We propose to the user of analysis of variance and linear regression model.
Bayesian credible interval construction for Poisson statistics
Institute of Scientific and Technical Information of China (English)
ZHU Yong-Sheng
2008-01-01
The construction of the Bayesian credible (confidence) interval for a Poisson observable including both the signal and background, with and without systematic uncertainties, is presented. Introducing the conditional probability satisfying the requirement that the background is not larger than the observed events to construct the Bayesian credible interval is also discussed. A Fortran routine, BPOCI, has been developed to implement the calculation.
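As a rough illustration of the kind of calculation described in the abstract above, the sketch below computes a Bayesian upper limit on a Poisson signal in the simplest setting: a flat prior on the signal, an exactly known background, and no systematic uncertainties. Those three choices are assumptions made for the example, and the code is not the BPOCI routine. Under them, the posterior probability that the signal exceeds s equals the ratio of two Poisson cumulative probabilities, which is solved for s by bisection.

```python
import math

def poisson_cdf(n, mean):
    """P(X <= n) for X ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total

def posterior_tail(s, n, b):
    """Posterior P(signal > s | n observed, background b) under a flat prior on s >= 0."""
    return poisson_cdf(n, s + b) / poisson_cdf(n, b)

def upper_limit(n, b, cl=0.90, tol=1e-9):
    """Signal value s_up such that the posterior probability of s <= s_up equals cl."""
    lo, hi = 0.0, 1.0
    while posterior_tail(hi, n, b) > 1.0 - cl:  # expand until the limit is bracketed
        hi *= 2.0
    while hi - lo > tol:  # bisection on the monotonically decreasing tail probability
        mid = 0.5 * (lo + hi)
        if posterior_tail(mid, n, b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts: 5 observed events with expected background 2.0.
print(upper_limit(5, 2.0))
```

With zero observed events the limit does not depend on the background in this setting, which is a convenient sanity check on the implementation.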
MIRELA SECARĂ
2008-01-01
Tourism represents an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary touristic capitalization of the Romanian seaside. In order to statistically analyze hydro tourism on the Romanian seaside, we have applied non-parametric methods for measuring and interpreting the existing statistical connections within seaside hydro tourism. The major objective of this research is hydro tourism re-establishment on Romanian ...
An introduction to Bayesian statistics in health psychology.
Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske
2017-09-01
The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.
Bayesian Information Criterion as an Alternative way of Statistical Inference
Directory of Open Access Journals (Sweden)
Nadejda Yu. Gubanova
2012-05-01
The article treats the Bayesian information criterion (BIC) as an alternative to traditional methods of statistical inference based on NHST. The comparison of ANOVA and BIC results for a psychological experiment is discussed.
Bayesian Information Criterion as an Alternative way of Statistical Inference
Nadejda Yu. Gubanova; Simon Zh. Simavoryan
2012-01-01
The article treats the Bayesian information criterion (BIC) as an alternative to traditional methods of statistical inference based on NHST. The comparison of ANOVA and BIC results for a psychological experiment is discussed.
Non-parametric Bayesian human motion recognition using a single MEMS tri-axial accelerometer.
Ahmed, M Ejaz; Song, Ju Bin
2012-09-27
In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to the kernel density estimation of gross domestic product (GDP) growth rates among Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
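The Nadaraya-Watson estimator mentioned in this abstract is a kernel-weighted average of the observed responses. The sketch below shows it for a single continuous regressor with a Gaussian kernel and a fixed, user-supplied bandwidth `h`. Both simplifications are assumptions made for illustration: the paper treats mixed continuous and discrete regressors and samples the bandwidths from a posterior, neither of which is attempted here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, with weights centered at x0 and bandwidth h > 0."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)  # may underflow to zero if x0 is far from every x relative to h
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical data lying on a straight line.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]
print(nadaraya_watson(1.0, xs, ys, h=0.5))
```

A small bandwidth makes the estimate follow nearby observations closely, while a large one pulls it toward the overall mean of the responses, which is why the choice of bandwidth is the central estimation problem.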
An exact predictive recursion for Bayesian nonparametric analysis of incomplete data
Garibaldi, Ubaldo; Viarengo, Paolo
2010-01-01
This paper presents a new derivation of nonparametric distribution estimation with right-censored data. It is based on an extension of the predictive inferences to compound evidence. The estimate is recursive and exact, and no stochastic approximation is needed: it simply requires that the censored data are processed in decreasing order. Only in this case the recursion provides exact posterior predictive distributions for subsequent samples under a Dirichlet process prior. The resulting estim...
Introduction to applied Bayesian statistics and estimation for social scientists
Lynch, Scott M
2007-01-01
""Introduction to Applied Bayesian Statistics and Estimation for Social Scientists"" covers the complete process of Bayesian statistical analysis in great detail from the development of a model through the process of making statistical inference. The key feature of this book is that it covers models that are most commonly used in social science research - including the linear regression model, generalized linear models, hierarchical models, and multivariate regression models - and it thoroughly develops each real-data example in painstaking detail.The first part of the book provides a detailed
Non-parametric Estimation approach in statistical investigation of nuclear spectra
Jafarizadeh, M A; Sabri, H; Maleki, B Rashidian
2011-01-01
In this paper, Kernel Density Estimation (KDE), a non-parametric estimation method, is used to investigate the statistical properties of nuclear spectra. The deviation toward regular or chaotic dynamics is exhibited by closer distances to the Poisson or Wigner limits, respectively, as evaluated by the Kullback-Leibler Divergence (KLD) measure. The spectral statistics of different sequences prepared from nuclei corresponding to the three dynamical symmetry limits of the Interacting Boson Model (IBM), of oblate and prolate nuclei, and the pairing effect on nuclear level statistics are analyzed (with pure experimental data). The KDE-based estimated density function confirms previous predictions with minimum uncertainty (evaluated with the Integrated Absolute Error (IAE)) in comparison to the Maximum Likelihood (ML)-based method. Also, an increase in the degree of regularity of the spectra due to the pairing effect is revealed.
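The comparison described above can be sketched on synthetic data: estimate the nearest-neighbor spacing density with a Gaussian KDE, then measure its Kullback-Leibler divergence from the Poisson limit, exp(-s), and from the Wigner limit, (π/2) s exp(-π s²/4). The simulated spacings, the rule-of-thumb bandwidth, and the integration grid are all assumptions made for this illustration and are not taken from the paper.

```python
import math
import random
import statistics

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(s):
        return norm * sum(math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for x in samples)
    return density

def poisson_limit(s):
    """Spacing density for regular dynamics."""
    return math.exp(-s)

def wigner_limit(s):
    """Spacing density for chaotic dynamics (Wigner surmise)."""
    return 0.5 * math.pi * s * math.exp(-0.25 * math.pi * s * s)

def kl_divergence(p, q, lo=0.01, hi=6.0, steps=300):
    """Midpoint-rule approximation of the integral of p * log(p / q) over [lo, hi]."""
    ds = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * ds
        ps, qs = p(s), q(s)
        if ps > 0.0:  # skip points where the estimate underflows to zero
            total += ps * math.log(ps / qs) * ds
    return total

def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth 1.06 * sigma * n**(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)

rng = random.Random(0)
# Synthetic spacings: exponential (Poisson limit) and Wigner-distributed via inverse CDF.
regular = [rng.expovariate(1.0) for _ in range(500)]
chaotic = [math.sqrt(-4.0 * math.log(1.0 - rng.random()) / math.pi) for _ in range(500)]

for name, spacings in (("regular", regular), ("chaotic", chaotic)):
    kde = gaussian_kde(spacings, rule_of_thumb_bandwidth(spacings))
    print(name, kl_divergence(kde, poisson_limit), kl_divergence(kde, wigner_limit))
```

The smaller of the two divergences indicates which limit a sequence is closer to. A Gaussian kernel leaks some mass below zero for spacing data, so a boundary-corrected estimator would be a natural refinement.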
Directory of Open Access Journals (Sweden)
Urbi Garay
2016-03-01
We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return) and betas (to a choice set of explanatory factors) in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year) than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S
2005-10-01
In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Furthermore, our method, which does not require any training, performs as well as methods based on training.
Understanding data better with Bayesian and global statistical methods
Press, W H
1996-01-01
To understand their data better, astronomers need to use statistical tools that are more advanced than traditional "freshman lab" statistics. As an illustration, the problem of combining apparently incompatible measurements of a quantity is presented from both the traditional and a more sophisticated Bayesian perspective. Explicit formulas are given for both treatments. Results are shown for the value of the Hubble Constant, and a 95% confidence interval of 66 < H0 < 82 (km/s/Mpc) is obtained.
Bayesian Semi- and Non-Parametric Models for Longitudinal Data with Multiple Membership Effects in R
Directory of Open Access Journals (Sweden)
Terrance Savitsky
2014-03-01
We introduce growcurves, an R package that performs analysis of repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each subject through the subject's participation in a set of multiple elements that characterize the intervention. In our motivating study design, under which subjects receive a group cognitive behavioral therapy (CBT) treatment, an element is a group CBT session and each subject attends multiple sessions that, together, comprise the treatment. The sets of elements, or group CBT sessions, attended by subjects will partly overlap with some of those from other subjects to induce a dependence in their responses. The growcurves package offers two alternative sets of hierarchical models: 1. Separate terms are specified for multivariate subject and MM element random effects, where the subject effects are modeled under a Dirichlet process prior to produce a semi-parametric construction; 2. A single term is employed to model joint subject-by-MM effects. A fully non-parametric dependent Dirichlet process formulation allows exploration of differences in subject responses across different MM elements. This model allows for borrowing information among subjects who express similar longitudinal trajectories for flexible estimation. growcurves deploys estimation functions to perform posterior sampling under a suite of prior options. An accompanying set of plot functions allows the user to readily extract by-subject growth curves. The design approach intends to anticipate inferential goals with tools that fully extract information from repeated measures data. Computational efficiency is achieved by performing the sampling for estimation functions using compiled C++ code.
Some Bayesian statistical techniques useful in estimating frequency and density
Johnson, D.H.
1977-01-01
This paper presents some elementary applications of Bayesian statistics to problems faced by wildlife biologists. Bayesian confidence limits for frequency of occurrence are shown to be generally superior to classical confidence limits. Population density can be estimated from frequency data if the species is sparsely distributed relative to the size of the sample plot. For other situations, limits are developed based on the normal distribution and prior knowledge that the density is non-negative, which ensures that the lower confidence limit is non-negative. Conditions are described under which Bayesian confidence limits are superior to those calculated with classical methods; examples are also given on how prior knowledge of the density can be used to sharpen inferences drawn from a new sample.
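One common way to obtain Bayesian limits for a frequency of occurrence is to combine a binomial likelihood with a uniform prior, which gives a Beta posterior whose quantiles serve as the limits. The sketch below does this numerically on a grid. The uniform prior, the equal-tailed interval, and the example counts are assumptions for illustration and may differ from the priors and limits used in the paper.

```python
def beta_posterior_interval(occupied, plots, level=0.95, grid=20000):
    """Equal-tailed credible interval for an occurrence frequency under a uniform prior.

    The posterior is proportional to f**occupied * (1 - f)**(plots - occupied),
    i.e. Beta(occupied + 1, plots - occupied + 1), integrated here by a midpoint sum.
    """
    empty = plots - occupied
    points = [(i + 0.5) / grid for i in range(grid)]
    dens = [f ** occupied * (1.0 - f) ** empty for f in points]
    total = sum(dens)
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    cum = 0.0
    lower = upper = None
    for f, d in zip(points, dens):
        cum += d / total  # running posterior CDF
        if lower is None and cum >= lower_q:
            lower = f
        if upper is None and cum >= upper_q:
            upper = f
            break
    return lower, upper

# Hypothetical survey: the species was found on 3 of 20 sample plots.
print(beta_posterior_interval(3, 20))
```

Because the posterior lives on the interval from 0 to 1, both limits are automatically valid frequencies, even when the species is found on none of the plots.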
Lennox, Kristin P; Dahl, David B; Vannucci, Marina; Tsai, Jerry W
2009-06-01
Interest in predicting protein backbone conformational angles has prompted the development of modeling and inference procedures for bivariate angular distributions. We present a Bayesian approach to density estimation for bivariate angular data that uses a Dirichlet process mixture model and a bivariate von Mises distribution. We derive the necessary full conditional distributions to fit the model, as well as the details for sampling from the posterior predictive distribution. We show how our density estimation method makes it possible to improve current approaches for protein structure prediction by comparing the performance of the so-called "whole" and "half" position distributions. Current methods in the field are based on whole position distributions, as density estimation for the half positions requires techniques, such as ours, that can provide good estimates for small datasets. With our method we are able to demonstrate that half position data provides a better approximation for the distribution of conformational angles at a given sequence position, therefore providing increased efficiency and accuracy in structure prediction.
Zoffoli, Luca; Ditroilo, Massimiliano; Federici, Ario; Lucertini, Francesco
2017-09-09
This study used surface electromyography (EMG) to investigate the regions and patterns of activity of the external oblique (EO), erector spinae longissimus (ES), multifidus (MU) and rectus abdominis (RA) muscles during walking (W) and pole walking (PW) performed at different speeds and grades. Eighteen healthy adults undertook W and PW on a motorized treadmill at 60% and 100% of their walk-to-run preferred transition speed at 0% and 7% treadmill grade. The Teager-Kaiser energy operator was employed to improve the muscle activity detection and statistical non-parametric mapping based on paired t-tests was used to highlight statistical differences in the EMG patterns corresponding to different trials. The activation amplitude of all trunk muscles increased at high speed, while no differences were recorded at 7% treadmill grade. ES and MU appeared to support the upper body at the heel-strike during both W and PW, with the latter resulting in elevated recruitment of EO and RA as required to control for the longer stride and the push of the pole. Accordingly, the greater activity of the abdominal muscles and the comparable intervention of the spine extensors supports the use of poles by walkers seeking higher engagement of the lower trunk region.
COLOR IMAGE RETRIEVAL BASED ON NON-PARAMETRIC STATISTICAL TESTS OF HYPOTHESIS
Directory of Open Access Journals (Sweden)
R. Shekhar
2016-09-01
A novel method for color image retrieval, based on statistical non-parametric tests such as the two-sample Wald test for equality of variance and the Mann-Whitney U test, is proposed in this paper. The proposed method tests the deviation, i.e. the distance in terms of variance between the query and target images; if the images pass the test, it proceeds to test the spectrum of energy, i.e. the distance between the mean values of the two images; otherwise, the test is dropped. If the query and target images pass both tests, it is inferred that the two images belong to the same class, i.e. the images are the same; otherwise, it is assumed that the images belong to different classes, i.e. the images are different. The proposed method is robust to scaling and rotation, since it adjusts itself and treats either the query image or the target image as a sample of the other.
Use of Bayesian statistical approach in diagnosing secondary hypertension.
Krzych, Lukasz Jerzy
2008-03-01
Bayes's theorem is predominantly used in diagnosing based on the results of various diagnostic tests. This statistical approach is intuitive in differential diagnosis as it explicitly takes into consideration data from medical history, physical examination, laboratory findings and imaging. Bayes's theorem states that the probability of disease occurrence (or occurrence of another outcome) after new information is obtained, called the a posteriori probability, depends directly on the a priori probability and the value of the likelihood ratio associated with a given test result. This paper describes basic Bayesian analysis in relation to the diagnosis of two types of secondary hypertension: primary aldosteronism and pheochromocytoma. This choice is based on two facts: primary aldosteronism is believed to be the most common and the most commonly detected cause of symptomatic hypertension, and pheochromocytoma is thought to have rapid progress and a stormy clinical course. This article aims to draw physicians' attention to Bayesian analysis, to increase their knowledge of it, and to describe its use in everyday clinical decision making. On the basis of this theorem's foundations, the discussion of differential diagnosis between physicians, their patients, and medical students should also improve. When Bayesian analysis is used in practice, however, one should be aware of its limitations concerning diagnostic test application, limited knowledge of diagnostic test accuracy, and insecure or faulty a priori probability estimates.
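The relationship described in this abstract, in which the a posteriori probability depends on the a priori probability and the likelihood ratio of a test result, can be illustrated with the odds form of Bayes's theorem. The following is a minimal sketch only; the function names and the numeric values (a 10% pre-test probability, 90% sensitivity and specificity) are hypothetical and are not taken from the paper or from any clinical source.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Likelihood ratio of a positive test result: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """A posteriori probability via the odds form of Bayes's theorem.

    posterior odds = prior odds * likelihood ratio
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical illustration values, not clinical data.
lr_plus = positive_likelihood_ratio(sensitivity=0.90, specificity=0.90)
post = posterior_probability(prior=0.10, likelihood_ratio=lr_plus)
print(f"LR+ = {lr_plus:.2f}, post-test probability = {post:.2f}")
```

With these assumed inputs the likelihood ratio is about 9, the prior odds are 1:9, and the posterior odds are therefore about 1:1, which shows how strongly a low a priori probability limits what a single positive result can establish.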
Directory of Open Access Journals (Sweden)
Zhang Xiaohua
2003-11-01
In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed: testing single loci, or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, are becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e. it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.
Bayesians versus frequentists a philosophical debate on statistical reasoning
Vallverdú, Jordi
2016-01-01
This book analyzes the origins of statistical thinking as well as its related philosophical questions, such as causality, determinism or chance. Bayesian and frequentist approaches are subjected to a historical, cognitive and epistemological analysis, making it possible to not only compare the two competing theories, but to also find a potential solution. The work pursues a naturalistic approach, proceeding from the existence of numerosity in natural environments to the existence of contemporary formulas and methodologies to heuristic pragmatism, a concept introduced in the book’s final section. This monograph will be of interest to philosophers and historians of science and students in related fields. Despite the mathematical nature of the topic, no statistical background is required, making the book a valuable read for anyone interested in the history of statistics and human cognition.
Bayesian statistic methods and their application in probabilistic simulation models
Directory of Open Access Journals (Sweden)
Sergio Iannazzo
2007-03-01
Bayesian statistical methods are meeting with rapidly growing interest and acceptance in the field of health economics. The reasons for this success probably lie in the theoretical foundations of the discipline, which make these techniques more appealing for decision analysis. To this should be added modern IT progress, which has produced a range of flexible and powerful statistical software frameworks. Among them, one of the most notable is probably the BUGS language project and its standalone application for MS Windows, WinBUGS. The scope of this paper is to introduce the subject and to show some interesting applications of WinBUGS in developing complex economic models based on Markov chains. The advantages of this approach lie in the elegance of the code produced and in its capability to easily develop probabilistic simulations. Moreover, an example of the integration of Bayesian inference models in a Markov model is shown. This last feature lets the analyst conduct statistical analyses on the available sources of evidence and exploit them directly as inputs to the economic model.
Chung, Clement; Emili, Andrew; Frey, Brendan J
2013-04-01
Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends a recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainties in the quantity and identity of PTMs in the input data. The use of this new modeling approach enables iPTMClust to provide a confidence score per modification site that allows fine-tuning and interpreting resulting PTM predictions. The primary goal behind iPTMClust is to improve the quality of the PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study. Our algorithm is implemented in
Bayesian statistics and information fusion for GPS-denied navigation
Copp, Brian Lee
It is well known that satellite navigation systems are vulnerable to disruption due to jamming, spoofing, or obstruction of the signal. The desire for robust navigation of aircraft in GPS-denied environments has motivated the development of feature-aided navigation systems, in which measurements of environmental features are used to complement the dead reckoning solution produced by an inertial navigation system. Examples of environmental features which can be exploited for navigation include star positions, terrain elevation, terrestrial wireless signals, and features extracted from photographic data. Feature-aided navigation represents a particularly challenging estimation problem because the measurements are often strongly nonlinear, and the quality of the navigation solution is limited by the knowledge of nuisance parameters which may be difficult to model accurately. As a result, integration approaches based on the Kalman filter and its variants may fail to give adequate performance. This project develops a framework for the integration of feature-aided navigation techniques using Bayesian statistics. In this approach, the probability density function for aircraft horizontal position (latitude and longitude) is approximated by a two-dimensional point mass function defined on a rectangular grid. Nuisance parameters are estimated using a hypothesis based approach (Multiple Model Adaptive Estimation) which continuously maintains an accurate probability density even in the presence of strong nonlinearities. The effectiveness of the proposed approach is illustrated by the simulated use of terrain referenced navigation and wireless time-of-arrival positioning to estimate a reference aircraft trajectory. Monte Carlo simulations have shown that accurate position estimates can be obtained in terrain referenced navigation even with a strongly nonlinear altitude bias. The integration of terrain referenced and wireless time-of-arrival measurements is described along with
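The grid-based idea in this abstract, where the position density is approximated by a point mass function on a rectangular grid, can be sketched as a single Bayesian measurement update. This is an illustrative sketch under stated assumptions, not the method of the thesis: the function name, the tiny terrain map, the Gaussian measurement noise model and all numbers are hypothetical, and the time-propagation step and the Multiple Model Adaptive Estimation of nuisance parameters are left out.

```python
import math


def point_mass_update(prior, terrain, measured_elevation, sigma):
    """One Bayes measurement update of a point mass function on a 2-D grid.

    prior   -- grid of cell probabilities (list of rows), summing to 1
    terrain -- grid of terrain elevations for the same cells (hypothetical map)
    sigma   -- standard deviation of the assumed Gaussian measurement noise
    """
    # Multiply each prior cell mass by the likelihood of the measurement there.
    unnormalized = [
        [p * math.exp(-0.5 * ((measured_elevation - h) / sigma) ** 2)
         for p, h in zip(prior_row, terrain_row)]
        for prior_row, terrain_row in zip(prior, terrain)
    ]
    total = sum(sum(row) for row in unnormalized)
    if total == 0.0:
        raise ValueError("measurement has zero likelihood everywhere on the grid")
    # Renormalize so the cell masses again sum to 1.
    return [[w / total for w in row] for row in unnormalized]


# Hypothetical 2 x 2 elevation map (metres) with a uniform prior over cells.
terrain = [[100.0, 200.0], [300.0, 400.0]]
prior = [[0.25, 0.25], [0.25, 0.25]]
posterior = point_mass_update(prior, terrain, measured_elevation=300.0, sigma=10.0)
for row in posterior:
    print([round(w, 4) for w in row])
```

Because the update only multiplies and renormalizes cell masses, it places no linearity requirement on the measurement model, which is the property the abstract contrasts with Kalman-filter-based integration.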
Bayesian inference on the sphere beyond statistical isotropy
Das, Santanu; Souradeep, Tarun
2015-01-01
We present a general method for Bayesian inference of the underlying covariance structure of random fields on a sphere. We employ the Bipolar Spherical Harmonic (BipoSH) representation of general covariance structure on the sphere. We illustrate the efficacy of the method as a principled approach to assess violation of statistical isotropy (SI) in the sky maps of Cosmic Microwave Background (CMB) fluctuations. SI violation in observed CMB maps arises from known physical effects such as Doppler boost and weak lensing; from as yet unknown theoretical possibilities like cosmic topology and subtle violations of the cosmological principle; and from expected observational artefacts of scanning the sky with a non-circular beam, masking, foreground residuals, anisotropic noise, etc. We explicitly demonstrate the recovery of the input SI violation signals with their full statistics in simulated CMB maps. Our formalism easily adapts to exploring parametric physical models with non-SI covariance, as we illustrate for the in...
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test
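The definition quoted in this abstract, the sum of the probabilities of all response vectors that are at most as likely as the observed vector x, can be sketched by brute-force enumeration. This is a simplified illustration, not the authors' procedure: it assumes independent dichotomous items with hypothetical success probabilities, and it omits the conditioning mentioned at the truncated end of the abstract. The function names and values are made up for the example.

```python
from itertools import product
from math import prod


def pattern_probability(pattern, p_correct):
    """Probability of a 0/1 response vector under independent items."""
    return prod(p if x == 1 else 1.0 - p for x, p in zip(pattern, p_correct))


def probability_of_exceedance(observed, p_correct, tol=1e-12):
    """Sum of probabilities of all response vectors at most as likely as `observed`."""
    p_obs = pattern_probability(observed, p_correct)
    total = 0.0
    # Enumerate all 2**n patterns; feasible only for short to moderate tests.
    for pattern in product((0, 1), repeat=len(p_correct)):
        p = pattern_probability(pattern, p_correct)
        if p <= p_obs + tol:  # tolerance guards against floating-point ties
            total += p
    return total


# Hypothetical two-item test: an easy item (0.9) and a medium item (0.5).
p_correct = [0.9, 0.5]
print(probability_of_exceedance((0, 1), p_correct))  # unlikely pattern, small PE
print(probability_of_exceedance((1, 1), p_correct))  # most likely pattern, PE of 1
```

A small PE flags a response pattern as improbable under the assumed model, so it plays a role similar to a p-value for person fit; the exhaustive enumeration grows as 2^n in the number of items.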
Fully Bayesian tests of neutrality using genealogical summary statistics
Directory of Open Access Journals (Sweden)
Drummond Alexei J
2008-10-01
Background: Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Results: Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Conclusion: Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored because of the limited availability of theory and methods.
STATISTICAL ANALYSIS OF THE TM- MODEL VIA BAYESIAN APPROACH
Directory of Open Access Journals (Sweden)
Muhammad Aslam
2012-11-01
The method of paired comparisons calls for the comparison of treatments presented in pairs to judges who prefer the better one based on their sensory evaluations. Thurstone (1927) and Mosteller (1951) employ the method of maximum likelihood to estimate the parameters of the Thurstone-Mosteller model for paired comparisons. A Bayesian analysis of this model using the non-informative reference (Jeffreys) prior is presented in this study. The posterior estimates (means and joint modes) of the parameters and the posterior probabilities comparing the two parameters are obtained for the analysis. The predictive probabilities that one treatment (Ti) is preferred to any other treatment (Tj) in a future single comparison are also computed. In addition, graphs of the marginal posterior distributions of the individual parameters are drawn. The appropriateness of the model is also tested using the Chi-Square test statistic.
Bayesian Analysis of Multiple Populations I: Statistical and Computational Methods
Stenning, D C; Robinson, E; van Dyk, D A; von Hippel, T; Sarajedini, A; Stein, N
2016-01-01
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (vanDyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties---age, metallicity, helium abundance, distance, absorption, and initial mass---are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and al...
Applications of non-parametric statistics and analysis of variance on sample variances
Myers, R. H.
1981-01-01
Nonparametric methods that are available for NASA-type applications are discussed. An attempt is made here to survey what can be used, to recommend when each method would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and the limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous variance assumption that is made in usual analysis of variance problems. These difficulties are discussed and guidelines are given for using the methods.
t-tests, non-parametric tests, and large studies—a paradox of statistical practice?
Directory of Open Access Journals (Sweden)
Fagerland Morten W
2012-06-01
Background During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p Conclusions Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
Konijn, Elly A.; van de Schoot, Rens; Winter, Sonja D.; Ferguson, Christopher J.
2015-01-01
The present paper argues that an important cause of publication bias resides in traditional frequentist statistics forcing binary decisions. An alternative approach through Bayesian statistics provides various degrees of support for any hypothesis allowing balanced decisions and proper null hypothes
The application of non-parametric statistical techniques to an ALARA programme.
Moon, J H; Cho, Y H; Kang, C S
2001-01-01
For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify the processes that repeatedly incur high ORD during maintenance and repair operations. To identify these processes, point values such as the mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and frequencies of radiation jobs. As an alternative, a non-parametric analysis method is proposed, which effectively identifies the processes of repetitive high ORD. As a case study, the method is applied to ORD data of maintenance and repair processes at Kori Units 3 and 4 in Korea, pressurised water reactors with 950 MWe capacity that have been operating since 1986 and 1987 respectively, and the method is demonstrated to be an efficient way of analysing the data.
Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
Chakrabarty, Dalia
2013-01-01
In lieu of direct detection of dark matter, estimation of the distribution of the gravitational mass in distant galaxies is of crucial importance in astrophysics. Typically, such estimation is performed using small samples of noisy, partially missing measurements: only some of the three components of the velocity and location vectors of individual particles that live in the galaxy are measurable. Such limitations of the available data in turn demand that simplifying model assumptions be made. Thus, assuming that the phase space of a galaxy manifests simple symmetries, such as isotropy, allows for the learning of the density of the gravitational mass in galaxies. This is equivalent to assuming that the phase space pdf from which the velocity and location vectors of galactic particles are sampled is an isotropic function of these vectors. We present a new non-parametric test of hypothesis that tests for relative support in two or more measured data sets of disparate sizes, for the undertaken m...
Directory of Open Access Journals (Sweden)
Paul H. Grawe
2016-01-01
Sidney Siegel and N. John Castellan, Jr. Nonparametric Statistics for the Behavioral Sciences, Second Edition (New York NY: McGraw Hill, 1988). 399 pp. ISBN: 9780070573574. Almost 60 years ago, Sidney Siegel wrote a stellar book helping anyone in academe to use nonparametric statistics, but ironically, 60 years after that achievement, American higher education confesses itself to be in the worst Quantitative Teaching Crisis of all time. The key clue to solving that crisis may be in Siegel and Castellan’s title, Nonparametric Statistics for the Behavioral Sciences, which quietly and perhaps unconsciously excludes the Humanities. Yet it is in humanistic realities that students read, write, and think. This book review considers what could be done if the Humanities were made aware of the enormous power of nonparametric statistics for advancing both their disciplines and their students’ ability to think quantitatively. A potentially revolutionary, humanistic, nonparametric finding is considered in detail along with a brief account of tens of humanistic discoveries deriving from Siegel and Castellan’s impetus.
Directory of Open Access Journals (Sweden)
Jun Zhang
2013-12-01
Divergence functions are the non-symmetric “distance” on the manifold, Μθ, of parametric probability density functions over a measure space, (Χ, μ). Classical information geometry prescribes, on Μθ: (i) a Riemannian metric given by the Fisher information; (ii) a pair of dual connections (giving rise to the family of α-connections) that preserve the metric under parallel transport by their joint actions; and (iii) a family of divergence functions (α-divergence) defined on Μθ × Μθ, which induce the metric and the dual connections. Here, we construct an extension of this differential geometric structure from Μθ (that of parametric probability density functions) to the manifold, Μ, of non-parametric functions on Χ, removing the positivity and normalization constraints. The generalized Fisher information and α-connections on Μ are induced by an α-parameterized family of divergence functions, reflecting the fundamental convex inequality associated with any smooth and strictly convex function. The infinite-dimensional manifold, Μ, has zero curvature for all these α-connections; hence, the generally non-zero curvature of Μθ can be interpreted as arising from an embedding of Μθ into Μ. Furthermore, when a parametric model (after a monotonic scaling) forms an affine submanifold, its natural and expectation parameters form biorthogonal coordinates, and such a submanifold is dually flat for α = ± 1, generalizing the results of Amari’s α-embedding. The present analysis illuminates two different types of duality in information geometry, one concerning the referential status of a point (measurable function) expressed in the divergence function (“referential duality”) and the other concerning its representation under an arbitrary monotone scaling (“representational duality”).
Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems
Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.
2007-01-01
Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al
Institute of Scientific and Technical Information of China (English)
陈亮; 程汉文; 吴乐南
2009-01-01
A nonparametric Bayesian method is presented to classify M-ary phase shift keying (MPSK) signals from their constellation diagrams. MPSK signals with unknown signal-to-noise ratios (SNRs) are modeled as a Gaussian mixture with unknown means and covariances in the constellation plane, and a clustering method is proposed to estimate the probability density of the MPSK signals. The method is based on nonparametric Bayesian inference: it introduces the Dirichlet process as the prior on the mixture proportions and applies a normal-inverse-Wishart (NIW) distribution as the prior on the unknown means and covariances. Then, given the received signals, the mixture proportions, means, and covariances are repeatedly adjusted by a Markov chain Monte Carlo (MCMC) algorithm based on Gibbs sampling. After a number of iterations, the density estimate of the modulated signal is obtained. Simulation results show that the correct recognition rate for 2/4/8PSK exceeds 95% when the SNR is greater than 5 dB and more than 1 600 symbols are used.
Ice Shelf Modeling: A Cross-Polar Bayesian Statistical Approach
Kirchner, N.; Furrer, R.; Jakobsson, M.; Zwally, H. J.
2010-12-01
Ice streams interlink glacial terrestrial and marine environments: embedded in a grounded inland ice such as the Antarctic Ice Sheet or the paleo ice sheets covering extensive parts of the Eurasian and Amerasian Arctic respectively, ice streams are major drainage agents facilitating the discharge of substantial portions of continental ice into the ocean. At their seaward side, ice streams can either extend onto the ocean as floating ice tongues (such as the Drygalski Ice Tongue/East Antarctica), or feed large ice shelves (as is the case for e.g. the Siple Coast and the Ross Ice Shelf/West Antarctica). The flow behavior of ice streams has been recognized to be intimately linked with configurational changes in their attached ice shelves; in particular, ice shelf disintegration is associated with rapid ice stream retreat and increased mass discharge from the continental ice mass, contributing eventually to sea level rise. Investigations of ice stream retreat mechanisms are however incomplete if based on terrestrial records only: rather, the dynamics of ice shelves (and, eventually, the impact of the ocean on the latter) must be accounted for. However, since floating ice shelves leave hardly any traces behind when melting, uncertainty regarding the spatio-temporal distribution and evolution of ice shelves in times prior to instrumented and recorded observation is high, thus calling for a statistical modeling approach. Complementing ongoing large-scale numerical modeling efforts (Pollard & DeConto, 2009), we model the configuration of ice shelves by using a Bayesian Hierarchical Modeling (BHM) approach. We adopt a cross-polar perspective accounting for the fact that currently, ice shelves exist mainly along the coastline of Antarctica (and are virtually non-existent in the Arctic), while Arctic Ocean ice shelves repeatedly impacted the Arctic Ocean basin during former glacial periods. Modeled Arctic Ocean ice shelf configurations are compared with geological spatial
Non-Parametric Inference in Astrophysics
Wasserman, L H; Nichol, R C; Genovese, C; Jang, W; Connolly, A J; Moore, A W; Schneider, J; Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; group, the PICA
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
Directory of Open Access Journals (Sweden)
Mohamed Khalaf-Allah
2008-01-01
The mobile terminal positioning problem is categorized into three different types according to the availability of (1) initial accurate location information and (2) motion measurement data. Location estimation refers to the mobile positioning problem when neither the initial location nor motion measurement data are available. If both are available, the positioning problem is referred to as position tracking. When only motion measurements are available, the problem is known as global localization. These positioning problems were solved within the Bayesian filtering framework. Filter derivation and implementation algorithms are provided with emphasis on the mapping approach. The radio maps of the experimental area have been created by a 3D deterministic radio propagation tool with a grid resolution of 5 m. Real-world experimentation was conducted in a GSM network deployed in a semiurban environment in order to investigate the performance of the different positioning algorithms.
Incorporating Nonparametric Statistics into Delphi Studies in Library and Information Science
Ju, Boryung; Jin, Tao
2013-01-01
Introduction: The Delphi technique is widely used in library and information science research. However, many researchers in the field fail to employ standard statistical tests when using this technique. This makes the technique vulnerable to criticisms of its reliability and validity. The general goal of this article is to explore how…
Rohée, E.; Coulon, R.; Carrel, F.; Dautremer, T.; Barat, E.; Montagu, T.; Normand, S.; Jammes, C.
2016-11-01
Radionuclide identification and quantification are a serious concern for many applications, such as in situ monitoring at nuclear facilities, laboratory analysis, special nuclear materials detection, environmental monitoring, and waste measurements. High resolution gamma-ray spectrometry based on high purity germanium diode detectors is the best solution available for isotopic identification. Over the last decades, methods have been developed to improve gamma spectra analysis. However, some difficulties remain in the analysis when full energy peaks are folded together with a high ratio between their amplitudes, and when the Compton background is much larger than the signal of a single peak. In this context, this study compares a conventional analysis based on the "iterative peak fitting deconvolution" method with a "nonparametric Bayesian deconvolution" approach developed by the CEA LIST and implemented in the SINBAD code. The iterative peak fit deconvolution is used in this study as a reference method largely validated by industrial standards to unfold complex spectra from HPGe detectors. Complex cases of spectra are studied from IAEA benchmark protocol tests and with measured spectra. The SINBAD code shows promising deconvolution capabilities compared to the conventional method without any expert parameter fine tuning.
Non-parametric group-level statistics for source-resolved ERP analysis.
Lee, Clement; Miyakoshi, Makoto; Delorme, Arnaud; Cauwenberghs, Gert; Makeig, Scott
2015-01-01
We have developed a new statistical framework for group-level event-related potential (ERP) analysis in EEGLAB. The framework calculates the variance of scalp channel signals accounted for by the activity of homogeneous clusters of sources found by independent component analysis (ICA). When ICA data decomposition is performed on each subject's data separately, functionally equivalent ICs can be grouped into EEGLAB clusters. Here, we report a new addition (statPvaf) to the EEGLAB plug-in std_envtopo to enable inferential statistics on main effects and interactions in event related potentials (ERPs) of independent component (IC) processes at the group level. We demonstrate the use of the updated plug-in on simulated and actual EEG data.
A Java program for non-parametric statistic comparison of community structure
Directory of Open Access Journals (Sweden)
WenJun Zhang
2011-09-01
A Java algorithm to statistically compare the structural difference between two communities is presented in this study. Euclidean distance, Manhattan distance, Pearson correlation, point correlation, quadratic correlation and the Jaccard coefficient were included in the algorithm. The algorithm was used to compare rice arthropod communities in the Pearl River Delta, China, and the results showed that the family compositions of arthropods for Guangzhou, Zhongshan, Zhuhai, and Dongguan are not significantly different.
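Four of the measures named above have standard textbook definitions and can be sketched briefly. The following is an illustration in Python rather than the authors' Java program; the function names and the abundance vectors are hypothetical, the Jaccard coefficient is computed on presence/absence, and the significance test itself is not described in the abstract, so it is not reproduced here. Point correlation and quadratic correlation are omitted because their exact definitions are not given.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def jaccard(x, y):
    """Jaccard coefficient on presence/absence: shared taxa / taxa in either."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return shared / either if either else 1.0


# Hypothetical counts per arthropod family at two sites.
site_a = [12, 0, 5, 3]
site_b = [10, 2, 0, 3]
distances = {
    "euclidean": euclidean(site_a, site_b),
    "manhattan": manhattan(site_a, site_b),
    "pearson": pearson(site_a, site_b),
    "jaccard": jaccard(site_a, site_b),
}
```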
Embracing Uncertainty: The Interface of Bayesian Statistics and Cognitive Psychology
Directory of Open Access Journals (Sweden)
Judith L. Anderson
1998-06-01
Ecologists working in conservation and resource management are discovering the importance of using Bayesian analytic methods to deal explicitly with uncertainty in data analyses and decision making. However, Bayesian procedures require, as inputs and outputs, an idea that is problematic for the human brain: the probability of a hypothesis ("single-event probability"). I describe several cognitive concepts closely related to single-event probabilities, and discuss how their interchangeability in the human mind results in "cognitive illusions," apparent deficits in reasoning about uncertainty. Each cognitive illusion implies specific possible pitfalls for the use of single-event probabilities in ecology and resource management. I then discuss recent research in cognitive psychology showing that simple tactics of communication, suggested by an evolutionary perspective on human cognition, help people to process uncertain information more effectively as they read and talk about probabilities. In addition, I suggest that carefully considered standards for methodology and conventions for presentation may also make Bayesian analyses easier to understand.
DEFF Research Database (Denmark)
A methodology is presented that combines modelling based on first principles and data based modelling into a modelling cycle that facilitates fast decision-making based on statistical methods. A strong feature of this methodology is that given a first principles model along with process data, the......, the corresponding modelling cycle model of the given system for a given purpose. A computer-aided tool, which integrates the elements of the modelling cycle, is also presented, and an example is given of modelling a fed-batch bioreactor....
An improved Bayesian matting method based on image statistic characteristics
Sun, Wei; Luo, Siwei; Wu, Lina
2015-03-01
Image matting is an important task in image and video editing and has been studied for more than 30 years. In this paper we propose an improved interactive matting method. Starting from a coarse user-guided trimap, we first perform a color estimation based on texture and color information and use the result to refine the original trimap. Then, with the new trimap, we apply a soft matting process, which is an improved Bayesian matting with smoothness constraints. Experimental results on natural images show that this method is useful, especially for images that have similar texture features in the background or images for which it is hard to give a precise trimap.
Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R
2014-12-10
Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.
My view on Bayesian statistics
Institute of Scientific and Technical Information of China (English)
殷羽
2014-01-01
Bayesian statistics and classical statistics are the two major schools of modern mathematical statistics, and the debate between the two schools has played a positive role in promoting the development of modern statistical theory. By comparing Bayesian statistics with classical statistics, this paper deepens the understanding of Bayesian statistics. It also introduces applications of Bayesian statistics in two areas: economic research and actuarial insurance research.
Directory of Open Access Journals (Sweden)
José L. Valencia
2015-11-01
Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distributions of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of multifractal parameters and, consequently, on the spatial distribution of rainfall patterns for different time scales, in this study universal multifractal (UM) analysis (C1, α, and γs UM parameters) was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time-series from 1931 to 2009, homogeneously spatially-distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α, mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in the occurrence of rainfall process variability over the last decades.
Statistical Mechanical Development of a Sparse Bayesian Classifier
Uda, Shinsuke; Kabashima, Yoshiyuki
2005-08-01
The demand for extracting rules from high dimensional real world data is increasing in various fields. However, the possible redundancy of such data sometimes makes it difficult to obtain a good generalization ability for novel samples. To resolve this problem, we provide a scheme that reduces the effective dimensions of data by pruning redundant components for bicategorical classification based on the Bayesian framework. First, the potential of the proposed method is confirmed in ideal situations using the replica method. Unfortunately, performing the scheme exactly is computationally difficult. So, we next develop a tractable approximation algorithm, which turns out to offer nearly optimal performance in ideal cases when the system size is large. Finally, the efficacy of the developed classifier is experimentally examined for a real world problem of colon cancer classification, which shows that the developed method can be practically useful.
Bayesian statistics for the calibration of the LISA Pathfinder experiment
Armano, M.; Audley, H.; Auger, G.; Binetruy, P.; Born, M.; Bortoluzzi, D.; Brandt, N.; Bursi, A.; Caleno, M.; Cavalleri, A.; Cesarini, A.; Cruise, M.; Danzmann, K.; Diepholz, I.; Dolesi, R.; Dunbar, N.; Ferraioli, L.; Ferroni, V.; Fitzsimons, E.; Freschi, M.; García Marirrodriga, C.; Gerndt, R.; Gesa, L.; Gibert, F.; Giardini, D.; Giusteri, R.; Grimani, C.; Harrison, I.; Heinzel, G.; Hewitson, M.; Hollington, D.; Hueller, M.; Huesler, J.; Inchauspé, H.; Jennrich, O.; Jetzer, P.; Johlander, B.; Karnesis, N.; Kaune, B.; Korsakova, N.; Killow, C.; Lloro, I.; Maarschalkerweerd, R.; Madden, S.; Mance, D.; Martin, V.; Martin-Porqueras, F.; Mateos, I.; McNamara, P.; Mendes, J.; Mitchell, E.; Moroni, A.; Nofrarias, M.; Paczkowski, S.; Perreur-Lloyd, M.; Pivato, P.; Plagnol, E.; Prat, P.; Ragnit, U.; Ramos-Castro, J.; Reiche, J.; Romera Perez, J. A.; Robertson, D.; Rozemeijer, H.; Russano, G.; Sarra, P.; Schleicher, A.; Slutsky, J.; Sopuerta, C. F.; Sumner, T.; Texier, D.; Thorpe, J.; Trenkel, C.; Tu, H. B.; Vitale, S.; Wanner, G.; Ward, H.; Waschke, S.; Wass, P.; Wealthy, D.; Wen, S.; Weber, W.; Wittchen, A.; Zanoni, C.; Ziegler, T.; Zweifel, P.
2015-05-01
The main goal of LISA Pathfinder (LPF) mission is to estimate the acceleration noise models of the overall LISA Technology Package (LTP) experiment on-board. This will be of crucial importance for the future space-based Gravitational-Wave (GW) detectors, like eLISA. Here, we present the Bayesian analysis framework to process the planned system identification experiments designed for that purpose. In particular, we focus on the analysis strategies to predict the accuracy of the parameters that describe the system in all degrees of freedom. The data sets were generated during the latest operational simulations organised by the data analysis team and this work is part of the LTPDA Matlab toolbox.
Preferential sampling and Bayesian geostatistics: Statistical modeling and examples.
Cecconi, Lorenzo; Grisotto, Laura; Catelan, Dolores; Lagazio, Corrado; Berrocal, Veronica; Biggeri, Annibale
2016-08-01
Preferential sampling refers to any situation in which the spatial process and the sampling locations are not stochastically independent. In this paper, we present two examples of geostatistical analysis in which the usual assumption of stochastic independence between the point process and the measurement process is violated. To account for preferential sampling, we specify a flexible and general Bayesian geostatistical model that includes a shared spatial random component. We apply the proposed model to two different case studies that allow us to highlight three different modeling and inferential aspects of geostatistical modeling under preferential sampling: (1) continuous or finite spatial sampling frame; (2) underlying causal model and relevant covariates; and (3) inferential goals related to mean prediction surface or prediction uncertainty.
Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants
Jin, Ick Hoon
2014-03-01
Statistical inference for the models with intractable normalizing constants has attracted much attention. During the past two decades, various approximation- or simulation-based methods have been proposed for the problem, such as the Monte Carlo maximum likelihood method and the auxiliary variable Markov chain Monte Carlo methods. The Bayesian stochastic approximation Monte Carlo algorithm specifically addresses this problem: It works by sampling from a sequence of approximate distributions with their average converging to the target posterior distribution, where the approximate distributions can be achieved using the stochastic approximation Monte Carlo algorithm. A strong law of large numbers is established for the Bayesian stochastic approximation Monte Carlo estimator under mild conditions. Compared to the Monte Carlo maximum likelihood method, the Bayesian stochastic approximation Monte Carlo algorithm is more robust to the initial guess of model parameters. Compared to the auxiliary variable MCMC methods, the Bayesian stochastic approximation Monte Carlo algorithm avoids the requirement for perfect samples, and thus can be applied to many models for which perfect sampling is not available or very expensive. The Bayesian stochastic approximation Monte Carlo algorithm also provides a general framework for approximate Bayesian analysis. © 2012 Elsevier B.V. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Kwag, Shinyoung [North Carolina State University, Raleigh, NC 27695 (United States); Korea Atomic Energy Research Institute, Daejeon 305-353 (Korea, Republic of); Gupta, Abhinav, E-mail: agupta1@ncsu.edu [North Carolina State University, Raleigh, NC 27695 (United States)
2017-04-15
Highlights: • This study presents the development of a Bayesian framework for probabilistic risk assessment (PRA) of structural systems under multiple hazards. • The concepts of Bayesian network and Bayesian inference are combined by mapping the traditionally used fault trees into a Bayesian network. • The proposed mapping allows for consideration of dependencies as well as correlations between events. • Incorporation of Bayesian inference permits a novel way for exploration of a scenario that is likely to result in a system level “vulnerability.” - Abstract: Conventional probabilistic risk assessment (PRA) methodologies (USNRC, 1983; IAEA, 1992; EPRI, 1994; Ellingwood, 2001) conduct risk assessment for different external hazards by considering each hazard separately and independently of the others. The risk metric for a specific hazard is evaluated by a convolution of the fragility and the hazard curves. The fragility curve for a basic event is obtained by using empirical, experimental, and/or numerical simulation data for a particular hazard. Treating each hazard independently can be inappropriate in some cases, as certain hazards are statistically correlated or dependent. Examples of such correlated events include but are not limited to flooding induced fire, seismically induced internal or external flooding, or even seismically induced fire. In the current practice, system level risk and consequence sequences are typically calculated using logic trees to express the causative relationship between events. In this paper, we present the results from a study on multi-hazard risk assessment that is conducted using a Bayesian network (BN) with Bayesian inference. The framework can consider statistical dependencies among risks from multiple hazards, allows updating by considering the newly available data/information at any level, and provides a novel way to explore alternative failure scenarios that may exist due to vulnerabilities.
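The single-hazard risk metric mentioned in the abstract, a convolution of the fragility and hazard curves, can be sketched numerically. The shapes used here are assumptions and are not taken from the paper: a lognormal fragility curve, a power-law hazard curve, and hypothetical parameter values. The function names are likewise illustrative. The integral of P(failure | a) times the hazard density is approximated on a log-spaced grid of intensity values a.

```python
import math


def lognormal_fragility(a, median, beta):
    """Assumed fragility: P(failure | intensity a) as a lognormal CDF."""
    z = math.log(a / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_law_hazard(a, k0, k):
    """Assumed hazard curve: annual frequency of exceeding intensity a."""
    return k0 * a ** (-k)


def annual_failure_frequency(fragility, hazard, a_min, a_max, steps=2000):
    """Convolve fragility with hazard: sum of P(fail | a) * (-dH) over bins."""
    ratio = (a_max / a_min) ** (1.0 / steps)
    total = 0.0
    lo = a_min
    for _ in range(steps):
        hi = lo * ratio
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bin
        total += fragility(mid) * (hazard(lo) - hazard(hi))
        lo = hi
    return total


def hazard(a):
    # Hypothetical hazard parameters, for illustration only.
    return power_law_hazard(a, k0=1e-4, k=2.0)


strong = annual_failure_frequency(
    lambda a: lognormal_fragility(a, median=1.2, beta=0.4), hazard, 0.05, 5.0)
weak = annual_failure_frequency(
    lambda a: lognormal_fragility(a, median=0.6, beta=0.4), hazard, 0.05, 5.0)
```

A component with a lower median capacity (`weak`) yields a higher annual failure frequency than a stronger one under the same hazard. This per-hazard calculation treats the hazard in isolation, which is the limitation the Bayesian-network framework in the abstract is meant to address.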
Statistical assignment of DNA sequences using Bayesian phylogenetics
DEFF Research Database (Denmark)
Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.;
2008-01-01
-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA....
Bayesian statistical analysis of censored data in geotechnical engineering
DEFF Research Database (Denmark)
Ditlevsen, Ove Dalager; Tarp-Johansen, Niels Jacob; Denver, Hans
2000-01-01
The geotechnical engineer is often faced with the problem of how to assess the statistical properties of a soil parameter on the basis of a sample measured in-situ or in the laboratory with the defect that some values have been replaced by interval bounds because the corresponding soil parameter values...
Applied Bayesian statistical studies in biology and medicine
D’Amore, G; Scalfari, F
2004-01-01
It was written on another occasion that "It is apparent that the scientific culture, if one means production of scientific papers, is growing exponentially, and chaotically, in almost every field of investigation". The biomedical sciences sensu lato and mathematical statistics are no exceptions. One might say then, and with good reason, that another collection of biostatistical papers would only add to the overflow and cause even more confusion. Nevertheless, this book may be greeted with some interest if we state that most of the papers in it are the result of a collaboration between biologists and statisticians, and partly the product of the Summer School "Statistical Inference in Human Biology", which reaches its 10th edition in 2003 (information about the School can be obtained at the Web site http://www2.stat.unibo.it/eventi/Sito%20scuola/index.htm). This is rather important. Indeed, it is common experience - and not only in Italy - that encounters between statisticians and researchers are spora...
DEFF Research Database (Denmark)
Pedersen, Thorkild Find
2003-01-01
Rotating and reciprocating mechanical machines emit acoustic noise and vibrations when they operate. Typically, the noise and vibrations are concentrated in narrow frequency bands related to the running speed of the machine. The frequency of the running speed is referred to as the fundamental...... of an adaptive comb filter is derived for tracking non-stationary signals. The estimation problem is then rephrased in terms of the Bayesian statistical framework. In the Bayesian framework both parameters and observations are considered stochastic processes. The result of the estimation is an expression...
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Choy, Samantha Low; O'Leary, Rebecca; Mengersen, Kerrie
2009-01-01
Bayesian statistical modeling has several benefits within an ecological context. In particular, when observed data are limited in sample size or representativeness, then the Bayesian framework provides a mechanism to combine observed data with other "prior" information. Prior information may be obtained from earlier studies, or in their absence, from expert knowledge. This use of the Bayesian framework reflects the scientific "learning cycle," where prior or initial estimates are updated when new data become available. In this paper we outline a framework for statistical design of expert elicitation processes for quantifying such expert knowledge, in a form suitable for input as prior information into Bayesian models. We identify six key elements: determining the purpose and motivation for using prior information; specifying the relevant expert knowledge available; formulating the statistical model; designing effective and efficient numerical encoding; managing uncertainty; and designing a practical elicitation protocol. We demonstrate this framework applies to a variety of situations, with two examples from the ecological literature and three from our experience. Analysis of these examples reveals several recurring important issues affecting practical design of elicitation in ecological problems.
Johnson, Eric D; Tubau, Elisabet
2016-09-27
Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
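The claim that these problems are computationally simple can be made concrete with a worked example. The numbers below are hypothetical and are not taken from the study, and the function names are illustrative. The same posterior is computed once from natural frequencies (counts of people) and once from percentages via Bayes' rule, using exact fractions so that the two routes can be compared directly.

```python
from fractions import Fraction


def posterior_from_frequencies(true_positives, false_positives):
    """Natural-frequency format: of all who test positive, how many are affected?"""
    return Fraction(true_positives, true_positives + false_positives)


def posterior_from_probabilities(prior, sensitivity, false_positive_rate):
    """Percentage format: Bayes' rule applied to normalized probabilities."""
    numerator = prior * sensitivity
    return numerator / (numerator + (1 - prior) * false_positive_rate)


# Hypothetical scenario: 1000 people, 10 have the condition, 8 of them test
# positive; of the 990 without the condition, 95 also test positive.
by_counts = posterior_from_frequencies(true_positives=8, false_positives=95)
by_rates = posterior_from_probabilities(
    prior=Fraction(10, 1000),
    sensitivity=Fraction(8, 10),
    false_positive_rate=Fraction(95, 990),
)
```

Both routes give 8/103, roughly 7.8%. The frequency version needs one addition and one division, whereas the percentage version requires combining three rates, which is one way to see why the frequency format is easier for reasoners.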
Bayesian inference – a way to combine statistical data and semantic analysis meaningfully
Directory of Open Access Journals (Sweden)
Eila Lindfors
2011-11-01
This article focuses on presenting the possibilities of Bayesian modelling (Finite Mixture Modelling) in the semantic analysis of statistically modelled data. The probability of a hypothesis in relation to the data available is an important question in inductive reasoning. Bayesian modelling allows the researcher to use many models at a time and provides tools to evaluate the goodness of different models. The researcher should always be aware that there is no such thing as the exact probability of an exact event. This is the reason for using probabilistic models. Each model presents a different perspective on the phenomenon in focus, and the researcher has to choose the most probable model with a view to previous research and the knowledge available. The idea of Bayesian modelling is illustrated here by presenting two different sets of data, one from craft science research (n=167) and the other (n=63) from educational research (Lindfors, 2007, 2002). The principles of how to build models and how to combine different profiles are described in the light of the research mentioned. Bayesian modelling is an analysis based on calculating probabilities in relation to a specific set of quantitative data. It is a tool for handling data and interpreting it semantically. The reliability of the analysis arises from an argumentation of which model can be selected from the model space as the basis for an interpretation, and on which arguments. Keywords: method, sloyd, Bayesian modelling, student teachers. URN:NBN:no-29959
Bayesian Bigot? Statistical Discrimination, Stereotypes, and Employer Decision Making.
Pager, Devah; Karafin, Diana
2009-01-01
Much of the debate over the underlying causes of discrimination centers on the rationality of employer decision making. Economic models of statistical discrimination emphasize the cognitive utility of group estimates as a means of dealing with the problems of uncertainty. Sociological and social-psychological models, by contrast, question the accuracy of group-level attributions. Although mean differences may exist between groups on productivity-related characteristics, these differences are often inflated in their application, leading to much larger differences in individual evaluations than would be warranted by actual group-level trait distributions. In this study, the authors examine the nature of employer attitudes about black and white workers and the extent to which these views are calibrated against their direct experiences with workers from each group. They use data from fifty-five in-depth interviews with hiring managers to explore employers' group-level attributions and their direct observations to develop a model of attitude formation and employer learning.
Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea
2016-01-01
In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rates (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from the application locFDR on experiment data detected more significant discoveries than our previously proposed methods whereas the standard FDR method failed to detect any significant discoveries.
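The local FDR idea described above, FDR as the posterior probability that an observed statistic is null, can be sketched as fdr(z) = pi0 * f0(z) / f(z). The snippet below is only an illustration under stated assumptions: it uses a theoretical N(0, 1) null, a conservative pi0 = 1, and a plain Gaussian kernel density estimate for the mixture density f, whereas Efron's procedure fits f by Poisson regression on histogram counts and can estimate an empirical null. The function names, the bandwidth, and the simulated z-scores are all hypothetical.

```python
import math
import random


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def kde(z, sample, bandwidth):
    """Gaussian kernel density estimate of the mixture density f at z."""
    return sum(normal_pdf(z, s, bandwidth) for s in sample) / len(sample)


def local_fdr(z, sample, pi0=1.0, bandwidth=0.3):
    """Estimated posterior probability that a statistic z is null.

    Assumes a theoretical N(0, 1) null density f0 and a fixed null
    proportion pi0 (both are simplifying assumptions of this sketch).
    """
    f = kde(z, sample, bandwidth)
    return min(1.0, pi0 * normal_pdf(z) / f)


# Simulated z-scores: 900 nulls around 0 and 100 true effects around 4.
rng = random.Random(0)
z_scores = [rng.gauss(0, 1) for _ in range(900)] + [rng.gauss(4, 1) for _ in range(100)]

fdr_center = local_fdr(0.0, z_scores)  # a typical null statistic
fdr_tail = local_fdr(4.5, z_scores)    # a statistic far in the tail
```

A statistic near zero gets a local FDR close to 1, while one far in the tail gets a value close to 0; a discovery is usually declared when the local FDR falls below a chosen cutoff.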
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
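One common way to apply K-M methods to left-censored data is to "flip" the values about a constant larger than the maximum, which turns nondetects into right-censored observations. The sketch below illustrates that idea in plain Python; it is not the S-language software described in the abstract, and the function names and the small data set are hypothetical.

```python
def km_survival(times, events):
    """Kaplan-Meier product-limit estimate for right-censored data.

    times  : observed times
    events : True if the time is an observed event, False if right-censored
    Returns a list of (t, S(t)) at each distinct event time.
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        tied = [e for (tt, e) in order if tt == t]
        deaths = sum(tied)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve


def left_censored_ecdf(values, detected):
    """Estimate P(X < x) at each detected value of left-censored data.

    values   : measured value, or the detection limit for a nondetect
    detected : True for a quantified value, False for "< detection limit"
    Flipping x -> flip - x turns left-censoring into right-censoring.
    """
    flip = max(values) + 1.0
    flipped = [flip - v for v in values]
    curve = km_survival(flipped, detected)
    return sorted((flip - t, s) for t, s in curve)


# Hypothetical concentrations with two detection limits (1.0 and 2.0).
conc = [1.0, 2.0, 2.0, 3.0, 5.0]
is_detected = [False, True, False, True, True]
ecdf = left_censored_ecdf(conc, is_detected)
```

Because the estimate is defined only at detected values, it says nothing below the smallest or above the largest observation, which matches the no-extrapolation property noted in the abstract.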
Reasoning with data an introduction to traditional and Bayesian statistics using R
Stanton, Jeffrey M
2017-01-01
Engaging and accessible, this book teaches readers how to use inferential statistical thinking to check their assumptions, assess evidence about their beliefs, and avoid overinterpreting results that may look more promising than they really are. It provides step-by-step guidance for using both classical (frequentist) and Bayesian approaches to inference. Statistical techniques covered side by side from both frequentist and Bayesian approaches include hypothesis testing, replication, analysis of variance, calculation of effect sizes, regression, time series analysis, and more. Students also get a complete introduction to the open-source R programming language and its key packages. Throughout the text, simple commands in R demonstrate essential data analysis skills using real-data examples. The companion website provides annotated R code for the book's examples, in-class exercises, supplemental reading lists, and links to online videos, interactive materials, and other resources.
Institute of Scientific and Technical Information of China (English)
Jongbin Im; Jungsun Park
2013-01-01
This paper focuses on a method to solve structural optimization problems using particle swarm optimization (PSO), surrogate models and Bayesian statistics. PSO is a random/stochastic search algorithm designed to find the global optimum. However, PSO needs many evaluations compared to gradient-based optimization. This means PSO increases the analysis costs of structural optimization. One of the methods to reduce computing costs in stochastic optimization is to use approximation techniques. In this work, surrogate models are used, including the response surface method (RSM) and Kriging. When surrogate models are used, there are some errors between exact values and approximated values. These errors decrease the reliability of the optimum values and discard the realistic approximation of using surrogate models. In this paper, Bayesian statistics is used to obtain more reliable results. To verify and confirm the efficiency of the proposed method using surrogate models and Bayesian statistics for stochastic structural optimization, two numerical examples are optimized, and the optimization of a hub sleeve is demonstrated as a practical problem.
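For readers unfamiliar with PSO, a minimal sketch of the basic global-best variant follows. It is a generic textbook form and not the authors' implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the sphere test function are all assumptions chosen for illustration, and no surrogate model or Bayesian correction is included.

```python
import random


def pso_minimize(f, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over box bounds with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]       # personal best positions
    best_val = [f(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])              # one objective evaluation per particle
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Each iteration costs one objective evaluation per particle (here 20 x 100 plus the initial 20), which is the evaluation burden that motivates replacing an expensive structural analysis with a surrogate.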
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.
A new model test in high energy physics in frequentist and Bayesian statistical formalisms
Kamenshchikov, Andrey
2016-01-01
Testing a new physical model against observed experimental data is a typical problem in modern high energy physics (HEP) experiments. The problem may be solved with two alternative statistical formalisms, namely frequentist and Bayesian, both of which are widely used in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are used to test a new model. The results are juxtaposed, which demonstrates their consistency in this work. The effect of the treatment of a systematic uncertainty in the statistical analysis is also considered.
A new model test in high energy physics in frequentist and Bayesian statistical formalisms
Kamenshchikov, A.
2017-01-01
Testing a new physical model against observed experimental data is a typical problem in modern high energy physics (HEP) experiments. The problem may be solved with two alternative statistical formalisms, namely frequentist and Bayesian, both of which are widely used in contemporary HEP searches. A characteristic experimental situation is modeled from general considerations, and both approaches are used to test a new model. The results are juxtaposed, which demonstrates their consistency in this work. The effect of the treatment of a systematic uncertainty in the statistical analysis is also considered.
Biagini, Francesca
2016-01-01
This book provides an introduction to elementary probability and to Bayesian statistics using de Finetti's subjectivist approach. One of the features of this approach is that it does not require the introduction of a sample space (a non-intrinsic concept that makes the treatment of elementary probability unnecessarily complicated) but instead introduces as fundamental the concept of random numbers directly related to their interpretation in applications. Events become a particular case of random numbers and probability a particular case of expectation when it is applied to events. The subjective evaluation of expectation and of conditional expectation is based on an economic choice of an acceptable bet or penalty. The properties of expectation and conditional expectation are derived by applying a coherence criterion that the evaluation has to follow. The book is suitable for all introductory courses in probability and statistics for students in Mathematics, Informatics, Engineering, and Physics.
Bayesian Statistics at Work: the Troublesome Extraction of the CKM Phase α
Energy Technology Data Exchange (ETDEWEB)
Charles, J. [CPT, Luminy Case 907, F-13288 Marseille Cedex 9 (France); Hoecker, A. [CERN, CH-1211 Geneva 23 (Switzerland); Lacker, H. [TU Dresden, IKTP, D-01062 Dresden (Germany); Le Diberder, F.R. [LAL, CNRS/IN2P3, Universite Paris-Sud 11, Bat. 200, BP 34, F-91898 Orsay Cedex (France); T' Jampens, S. [LAPP, CNRS/IN2P3, Universite de Savoie, 9 Chemin de Bellevue, BP 110, F-74941 Annecy-le-Vieux Cedex (France)
2007-04-15
In Bayesian statistics, one's prior beliefs about underlying model parameters are revised with the information content of observed data from which, using Bayes' rule, a posterior belief is obtained. A non-trivial example taken from the isospin analysis of B → PP (P = π or ρ) decays in heavy-flavor physics is chosen to illustrate the effect of the naive 'objective' choice of flat priors in a multi-dimensional parameter space in presence of mirror solutions. It is demonstrated that the posterior distribution for the parameter of interest, the phase α, strongly depends on the choice of the parameterization in which the priors are uniform, and on the validity range in which the (un-normalizable) priors are truncated. We prove that the most probable values found by the Bayesian treatment do not coincide with the explicit analytical solutions, in contrast to the frequentist approach. It is also shown in the appendix that the α → 0 limit cannot be consistently treated in the Bayesian paradigm, because the latter violates the physical symmetries of the problem. (authors)
Directory of Open Access Journals (Sweden)
C. Mukherjee
2011-01-01
Inverse modeling applications in atmospheric chemistry are increasingly addressing the challenging statistical issues of data synthesis by adopting refined statistical analysis methods. This paper advances this line of research by addressing several central questions in inverse modeling, focusing specifically on Bayesian statistical computation. Motivated by problems of refining bottom-up estimates of source/sink fluxes of trace gas and aerosols based on increasingly high-resolution satellite retrievals of atmospheric chemical concentrations, we address head-on the need for integrating formal spatial statistical methods of residual error structure in global scale inversion models. We do this using analytically and computationally tractable spatial statistical models, known as conditional autoregressive spatial models, as components of a global inversion framework. We develop Markov chain Monte Carlo methods to explore and fit these spatial structures in an overall statistical framework that simultaneously estimates source fluxes. Additional aspects of the study extend the statistical framework to utilize priors in a more physically realistic manner, and to formally address and deal with missing data in satellite retrievals. We demonstrate the analysis in the context of inferring carbon monoxide (CO) sources constrained by satellite retrievals of column CO from the Measurement of Pollution in the Troposphere (MOPITT) instrument on the TERRA satellite, paying special attention to evaluating performance of the inverse approach using various statistical diagnostic metrics. This is developed using synthetic data generated to resemble MOPITT data to define a proof-of-concept and model assessment, and then in analysis of real MOPITT data.
Rubin, David; Barbary, Kyle; Boone, Kyle; Chappell, Greta; Currie, Miles; Deustua, Susana; Fagrelius, Parker; Fruchter, Andrew; Hayden, Brian; Lidman, Chris; Nordin, Jakob; Perlmutter, Saul; Saunders, Clare; Sofiatti, Caroline
2015-01-01
While recent supernova cosmology research has benefited from improved measurements, current analysis approaches are not statistically optimal and will prove insufficient for future surveys. This paper discusses the limitations of current supernova cosmological analyses in treating outliers, selection effects, shape- and color-standardization relations, intrinsic dispersion, and heterogeneous observations. We present a new Bayesian framework, called UNITY (Unified Nonlinear Inference for Type-Ia cosmologY), that incorporates significant improvements in our ability to confront these effects. We apply the framework to real supernova observations and demonstrate smaller statistical and systematic uncertainties. We verify earlier results that SNe Ia require nonlinear shape and color standardizations, but we now include these nonlinear relations in a statistically well-justified way. This analysis was blinded, in that the method was first validated on simulated data, and no analysis changes were made after transiti...
2nd Bayesian Young Statisticians Meeting
Bitto, Angela; Kastner, Gregor; Posekany, Alexandra
2015-01-01
The Second Bayesian Young Statisticians Meeting (BAYSM 2014) and the research presented here facilitate connections among researchers using Bayesian Statistics by providing a forum for the development and exchange of ideas. WU Vienna University of Business and Economics hosted BAYSM 2014 from September 18th to 19th. The guidance of renowned plenary lecturers and senior discussants is a critical part of the meeting and this volume, which follows publication of contributions from BAYSM 2013. The meeting's scientific program reflected the variety of fields in which Bayesian methods are currently employed or could be introduced in the future. Three brilliant keynote lectures by Chris Holmes (University of Oxford), Christian Robert (Université Paris-Dauphine), and Mike West (Duke University), were complemented by 24 plenary talks covering the major topics Dynamic Models, Applications, Bayesian Nonparametrics, Biostatistics, Bayesian Methods in Economics, and Models and Methods, as well as a lively poster session ...
Thomas: building Bayesian statistical expert systems to aid in clinical decision making.
Lehmann, H P; Shortliffe, E H
1991-08-01
Knowledge-based systems for classical statistical analysis must separate the task of analyzing data from that of using the results of the analysis. In contrast, a Bayesian framework for building biostatistical expert systems allows for the integration of the data-analytic and decision-making tasks. The architecture of such a framework entails enabling the system (1) to make its recommendations on decision-analytic grounds; (2) to construct statistical models dynamically; (3) to update a statistical model based on the user's prior beliefs and on data from, and the methodological concerns evinced by, the study. This architecture permits the knowledge engineer to represent a variety of types of statistical and domain knowledge. Construction of such systems requires that the knowledge engineer reinterpret traditional statistical concerns, such as by replacing the notion of statistical significance with that of a pragmatic clinical threshold. The clinical user of such a system can interact with the system at a semantic level appropriate to her fund of methodological knowledge, rather than at the level of statistical details. We demonstrate these issues with a prototype system called THOMAS which helps a physician decision maker interpret the results of a published randomized clinical trial.
Predictive data-derived Bayesian statistic-transport model and simulator of sunken oil mass
Echavarria Gregory, Maria Angelica
Sunken oil is difficult to locate because remote sensing techniques cannot as yet provide views of sunken oil over large areas. Moreover, the oil may re-suspend and sink with changes in salinity, sediment load, and temperature, making deterministic fate models difficult to deploy and calibrate when even the presence of sunken oil is difficult to assess. For these reasons, together with the expense of field data collection, there is a need for a statistical technique integrating limited data collection with stochastic transport modeling. Predictive Bayesian modeling techniques have been developed and demonstrated for exploiting limited information for decision support in many other applications. These techniques are brought to a multi-modal Lagrangian modeling framework, representing a near-real time approach to locating and tracking sunken oil driven by intrinsic physical properties of field data collected following a spill after oil has begun collecting on a relatively flat bay bottom. Methods include (1) development of the conceptual predictive Bayesian model and multi-modal Gaussian computational approach based on theory and literature review; (2) development of an object-oriented programming and combinatorial structure capable of managing data, integration and computation over an uncertain and highly dimensional parameter space; (3) creating a new bi-dimensional approach of the method of images to account for curved shoreline boundaries; (4) confirmation of model capability for locating sunken oil patches using available (partial) real field data and capability for temporal projections near curved boundaries using simulated field data; and (5) development of a stand-alone open-source computer application with graphical user interface capable of calibrating instantaneous oil spill scenarios, obtaining sets of maps of relative probability profiles at different prediction times and user-selected geographic areas and resolution, and capable of performing post
Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing
2016-01-08
A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to extract weak fault features under background noise. It uses statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and removes the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, the SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method.
Directory of Open Access Journals (Sweden)
Han Zhang
2014-01-01
A novel fast SAR image change detection method is presented in this paper. Based on a Bayesian approach, the prior information that speckles follow the Nakagami distribution is incorporated into the difference image (DI) generation process. The new DI performs much better than the familiar log ratio (LR) DI as well as the cumulant based Kullback-Leibler divergence (CKLD) DI. The statistical region merging (SRM) approach is introduced to the change detection context for the first time. A new clustering procedure with the region variance as the statistical inference variable is presented, tailored to SAR image change detection, with only two classes in the final map, the unchanged and changed classes. The most prominent advantages of the proposed modified SRM (MSRM) method are the ability to cope with noise corruption and the quick implementation. Experimental results show that the proposed method is superior in both the change detection accuracy and the operation efficiency.
Nonparametric Predictive Regression
Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
Directory of Open Access Journals (Sweden)
Wills Rachael A
2009-05-01
Background: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
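The two calculations contrasted in this abstract can be sketched in a few lines. The Dunn-Sidak adjustment for m comparisons is p_adj = 1 - (1 - p)^m. For the Bayesian side, the abstract says only that the methods were "based on the Gamma distribution"; the sketch below assumes the standard conjugate model in which observed cases are Poisson with mean (relative risk x expected cases) and the relative risk has a Gamma prior. That model, the function names, and all numbers are illustrative assumptions, not values from the paper.

```python
def sidak_adjust(p, m):
    """Dunn-Sidak adjusted p-value for m (possibly silent) comparisons."""
    return 1.0 - (1.0 - p) ** m


def gamma_poisson_posterior(observed, expected, prior_shape, prior_rate):
    """Conjugate update for a relative risk with a Gamma(shape, rate) prior.

    Assumes observed ~ Poisson(relative_risk * expected); the posterior is
    Gamma(prior_shape + observed, prior_rate + expected).
    """
    return prior_shape + observed, prior_rate + expected


# Hypothetical cluster: 10 cases observed where 4 were expected.
p_raw = 0.01
p_if_100_comparisons = sidak_adjust(p_raw, 100)

post_shape, post_rate = gamma_poisson_posterior(10, 4.0, prior_shape=1.0, prior_rate=1.0)
posterior_mean_rr = post_shape / post_rate
```

With these numbers, a raw p-value of 0.01 rises to about 0.63 once 100 silent comparisons are assumed, which shows how strongly the frequentist conclusion depends on the guessed number of comparisons; the Bayesian result instead depends on the chosen prior.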
Onisko, Agnieszka; Druzdzel, Marek J.; Austin, R. Marshall
2016-01-01
Background: Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. Aim: The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. Materials and Methods: This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan–Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. Results: The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Conclusion: Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches. PMID:28163973
How to construct the optimal Bayesian measurement in quantum statistical decision theory
Tanaka, Fuyuhiko
Recently, much more attention has been paid to studies that apply fundamental properties of quantum theory to information processing and technology. In particular, modern statistical methods have been recognized in quantum state tomography (QST), where we have to estimate a density matrix (positive semidefinite matrix of trace one) representing a quantum system from finite data collected in a certain experiment. When the dimension of the density matrix gets large (from a few hundred to millions), this becomes a nontrivial problem. While a specific measurement is often given and fixed in QST, we are also able to choose a measurement itself according to the purpose of QST by using quantum statistical decision theory. Here we propose a practical method to find the best projective measurement in the Bayesian sense. We assume that a prior distribution (e.g., the uniform distribution) and a convex loss function (e.g., the squared error) are given. In many quantum experiments, these assumptions are not so restrictive. We show that the best projective measurement and the best statistical inference based on the measurement outcome exist and that they are obtained explicitly by using the Monte Carlo optimization. The Grant-in-Aid for Scientific Research (B) (No. 26280005).
Pérez, Hector E; Kettner, Keith
2013-10-01
Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
Heinrich, Lothar; Schmidt, Volker
2012-01-01
We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic χ²-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean-square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable β-mixing condition on the underlying stationary marked point process which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. Our method needs a central limit theorem for β-mixing random fields which is proved by extending Bernstein's blocking technique to non-cubic index sets and seems to be of interest in its own right. By large-scale model-based simulations the performance of our t...
To be certain about the uncertainty: Bayesian statistics for ¹³C metabolic flux analysis.
Theorell, Axel; Leweke, Samuel; Wiechert, Wolfgang; Nöh, Katharina
2017-07-11
13C Metabolic Flux Analysis (13C MFA) remains the most powerful approach to determine intracellular metabolic reaction rates. Decisions on strain engineering and experimentation rely heavily on the certainty with which these fluxes are estimated. For uncertainty quantification, the vast majority of 13C MFA studies rely on confidence intervals from the paradigm of Frequentist statistics. However, it is well known that the confidence intervals for a given experimental outcome are not uniquely defined. As a result, confidence intervals produced by different methods can be different, but nevertheless equally valid. This is of high relevance to 13C MFA, since practitioners regularly use three different approximate approaches for calculating confidence intervals. By means of a computational study with a realistic model of the central carbon metabolism of E. coli, we provide strong evidence that confidence intervals used in the field depend strongly on the technique with which they were calculated and, thus, their use leads to misinterpretation of the flux uncertainty. In order to provide a better alternative to confidence intervals in 13C MFA, we demonstrate that credible intervals from the paradigm of Bayesian statistics give more reliable flux uncertainty quantifications which can be readily computed with high accuracy using Markov chain Monte Carlo. In addition, the widely applied chi-square test, as a means of testing whether the model reproduces the data, is examined more closely. © 2017 Wiley Periodicals, Inc.
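As an illustration of how a credible interval is read off Markov chain Monte Carlo output, the following minimal sketch runs a random-walk Metropolis sampler on a toy one-parameter posterior and takes an equal-tailed 95% interval from the draws. It is not the flux model or the sampler used in the study; `metropolis`, `credible_interval`, and the standard-normal `log_post` are hypothetical stand-ins.

```python
import math
import random


def metropolis(log_post, x0, n_steps, step, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        samples.append(x)
    return samples


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = s[int(tail * (len(s) - 1))]
    hi = s[int((1.0 - tail) * (len(s) - 1))]
    return lo, hi


def log_post(v):
    # Toy assumption: unnormalized log-posterior of a standard normal
    return -0.5 * v * v


rng = random.Random(0)
draws = metropolis(log_post, 0.0, 20000, 2.4, rng)[2000:]  # drop burn-in
lo, hi = credible_interval(draws)
```

Unlike a confidence interval, the interval (lo, hi) is a direct summary of the posterior, so it does not depend on which approximation scheme was used to construct it, only on how well the chain has sampled the posterior.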
Directory of Open Access Journals (Sweden)
J. Norberg
2015-09-01
We validate two-dimensional ionospheric tomography reconstructions against EISCAT incoherent scatter radar measurements. Our tomography method is based on Bayesian statistical inversion with prior distribution given by its mean and covariance. We employ ionosonde measurements for the choice of the prior mean and covariance parameters, and use the Gaussian Markov random fields as a sparse matrix approximation for the numerical computations. This results in a computationally efficient and statistically clear inversion algorithm for tomography. We demonstrate how this method works with simultaneous beacon satellite and ionosonde measurements obtained in northern Scandinavia. The performance is compared with results obtained with a zero mean prior and with the prior mean taken from the International Reference Ionosphere 2007 model. In validating the results, we use EISCAT UHF incoherent scatter radar measurements as the ground truth for the ionization profile shape. We find that ionosonde measurements improve the reconstruction by adding accurate information about the absolute value and the height distribution of electron density, and outperform the alternative prior information sources. With an ionosonde at continuous disposal, the presented method enhances stand-alone near real-time ionospheric tomography for the given conditions significantly.
Statistical Methods for Astronomy
Feigelson, Eric D
2012-01-01
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain time series analysis for light curves; and spatial statistics to interpret the spati...
Bickel, David R
2011-01-01
In statistical practice, whether a Bayesian or frequentist approach is used in inference depends not only on the availability of prior information but also on the attitude taken toward partial prior information, with frequentists tending to be more cautious than Bayesians. The proposed framework defines that attitude in terms of a specified amount of caution, thereby enabling data analysis at the level of caution desired and on the basis of any prior information. The caution parameter represents the attitude toward partial prior information in much the same way as a loss function represents the attitude toward risk. When there is very little prior information and nonzero caution, the resulting inferences correspond to those of the candidate confidence intervals and p-values that are most similar to the credible intervals and hypothesis probabilities of the specified Bayesian posterior. On the other hand, in the presence of a known physical distribution of the parameter, inferences are based only on the corres...
Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.
2016-07-01
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables. This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
Institute of Scientific and Technical Information of China (English)
MING Zhimao; TAO Junyong; ZHANG Yunan; YI Xiaoshan; CHEN Xun
2009-01-01
New armament systems undergo multi-stage reliability-growth testing, involving diverse populations, to improve reliability before mass production begins. Because testing during the development of complex systems is expensive and sample sizes are small, specific methods are studied for processing the statistical information of Bayesian reliability growth for diverse populations. First, according to the characteristics of reliability growth during product development, the Bayesian method is used to integrate the multi-stage test information and the order relations of the distribution parameters. Then a Gamma-Beta prior distribution is proposed based on the non-homogeneous Poisson process (NHPP) corresponding to the reliability growth process. The posterior distribution of the reliability parameters is obtained for the different product stages, and the reliability parameters are evaluated from the posterior distribution. Finally, the proposed Bayesian approach for multi-stage reliability growth testing is applied to a small-sample test process in the astronautics field. The results of a numerical example show that the presented model can make combined use of the diverse information and paves the way for applying the Bayesian model to multi-stage reliability growth test evaluation with small sample sizes. The method is useful for evaluating multi-stage system reliability and for planning reliability growth rationally.
Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.
2014-11-01
Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding additional line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differential-evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
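The differential-evolution Markov-chain Monte Carlo idea named in this abstract can be sketched in a few lines. This is a generic textbook-style version of the 2006 algorithm, not the MC3 code or its interface: the function `demc`, its arguments, and the Gaussian toy target are assumptions made for illustration, and no MPI parallelism is shown.

```python
import numpy as np


def demc(log_post, init, n_iter, rng, eps=1e-4):
    """Differential-evolution MCMC over a population of chains.

    init -- array of shape (n_chains, dim); needs at least 3 chains
    Returns an array of shape (n_iter, n_chains, dim) with all states.
    """
    chains = np.array(init, dtype=float)
    n_chains, dim = chains.shape
    gamma = 2.38 / np.sqrt(2 * dim)  # commonly recommended jump scale
    logp = np.array([log_post(x) for x in chains])
    history = np.empty((n_iter, n_chains, dim))
    for t in range(n_iter):
        for i in range(n_chains):
            # Pick two other chains; their difference sets the jump direction
            others = [j for j in range(n_chains) if j != i]
            r1, r2 = rng.choice(others, size=2, replace=False)
            prop = (chains[i] + gamma * (chains[r1] - chains[r2])
                    + rng.normal(0.0, eps, dim))
            lp = log_post(prop)
            # Standard Metropolis acceptance (the proposal is symmetric)
            if np.log(rng.uniform()) < lp - logp[i]:
                chains[i], logp[i] = prop, lp
        history[t] = chains
    return history


rng = np.random.default_rng(0)
# Toy assumption: 2-D standard normal target, six chains
hist = demc(lambda x: -0.5 * np.sum(x ** 2), rng.normal(size=(6, 2)), 2000, rng)
flat = hist[500:].reshape(-1, 2)  # discard burn-in, pool chains
```

Because the jump is built from differences between current chain states, the proposal scale and orientation adapt to the target automatically, which is the usual explanation for faster convergence than a hand-tuned Metropolis-Hastings proposal.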
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Application of Statistical Software R in the Teaching of Non-Parametric Statistics
Institute of Scientific and Technical Information of China (English)
王志刚; 冯利英; 刘勇
2012-01-01
Introduces the application of the statistical software R in the teaching of non-parametric statistics, which is an important branch of statistics. In particular, describes in detail the use of R in exploratory data analysis, inferential statistics, and stochastic simulation. The flexible, open-source characteristics of R make data processing more efficient. The software can implement all the methods covered in the teaching process, and makes it convenient for learners to optimize and improve methods based on previous work. R is well suited to the teaching of non-parametric statistics.
Kittisuwan, Pichid
2015-03-01
The application of image processing in industry has shown remarkable success over the last decade, for example, in security and telecommunication systems. The denoising of natural images corrupted by Gaussian noise is a classical problem in image processing, and image denoising is therefore an indispensable step during image processing. This paper is concerned with dual-tree complex wavelet-based image denoising using Bayesian techniques. One of the cruxes of Bayesian image denoising algorithms is estimating the statistical parameters of the image. Here, we employ maximum a posteriori (MAP) estimation to calculate the local observed variance, with a generalized Gamma density prior for the local observed variance and a Laplacian or Gaussian distribution for the noisy wavelet coefficients. Our selection of prior distribution is motivated by the efficient and flexible properties of the generalized Gamma density. The experimental results show that the proposed method yields good denoising results.
An overview of component qualification using Bayesian statistics and energy methods.
Energy Technology Data Exchange (ETDEWEB)
Dohner, Jeffrey Lynn
2011-09-01
The overview below is designed to give the reader a limited understanding of Bayesian and Maximum Likelihood (MLE) estimation; a basic understanding of some of the mathematical tools to evaluate the quality of an estimation; an introduction to energy methods; and a limited discussion of damage potential. The discussion then goes on to present, in limited form, how energy methods and Bayesian estimation are used together to qualify components. Example problems with solutions have been supplied as a learning aid. Bold letters are used to represent random variables. Unbolded letters represent deterministic values. A concluding section presents a discussion of attributes and concerns.
Festa, Roberto
1992-01-01
According to the Bayesian view, scientific hypotheses must be appraised in terms of their posterior probabilities relative to the available experimental data. Such posterior probabilities are derived from the prior probabilities of the hypotheses by applying Bayes' theorem. One of the most important
Priors, Posterior Odds and Lagrange Multiplier Statistics in Bayesian Analyses of Cointegration
F.R. Kleibergen (Frank); R. Paap (Richard)
1996-01-01
Using the standard linear model as a base, a unified theory of Bayesian Analyses of Cointegration Models is constructed. This is achieved by defining (natural conjugate) priors in the linear model and using the implied priors for the cointegration model. Using these priors, posterior res
A Bayesian Nonparametric Meta-Analysis Model
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
2015-10-24
Cases, KB Morris, E Law, R Jefferys, & E Fabyanic, 67th AAFS Meeting, Orlando, FL, February 2015 Poster: Using likelihood ratios for source attribution...of Glock™ model 21 fired cartridge cases, C Hefner, & KB Morris, 67th AAFS Meeting, Orlando, FL, February 2015. (c) Presentations Number of...and known cartridge cases) to assess the performance of the Bayesian networks created during the study. In all cases the sets were submitted in a
Bayesian Statistical Inference in Ion-Channel Models with Exact Missed Event Correction.
Epstein, Michael; Calderhead, Ben; Girolami, Mark A; Sivilotti, Lucia G
2016-07-26
The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. Our code for performing inference in these ion channel
Hagos, Seifu; Hailemariam, Damen; WoldeHanna, Tasew; Lindtjørn, Bernt
2017-01-01
Background: Understanding the spatial distribution of stunting and the underlying factors operating at meso-scale is of paramount importance for designing and implementing interventions. Yet little is known about the spatial distribution of stunting, and some discrepancies are documented on the relative importance of reported risk factors. Therefore, the present study aims to explore the spatial distribution of stunting at meso- (district) scale, and evaluates the effect of spatial dependency on the identification of risk factors and their relative contribution to the occurrence of stunting and severe stunting in a rural area of Ethiopia. Methods: A community-based cross-sectional study was conducted to measure the occurrence of stunting and severe stunting among children aged 0–59 months. Additionally, we collected relevant information on anthropometric measures, dietary habits, and parent- and child-related demographic and socio-economic status. Latitude and longitude of surveyed households were also recorded. Local Anselin Moran's I was calculated to investigate the spatial variation of stunting prevalence and identify potential local pockets (hotspots) of high prevalence. Finally, we employed a Bayesian geo-statistical model, which accounted for the spatial dependency structure in the data, to identify potential risk factors for stunting in the study area. Results: Overall, the prevalence of stunting and severe stunting in the district was 43.7% [95%CI: 40.9, 46.4] and 21.3% [95%CI: 19.5, 23.3] respectively. We identified statistically significant clusters of high prevalence of stunting (hotspots) in the eastern part of the district and clusters of low prevalence (cold spots) in the western part. We found that including the spatial structure of the data in the Bayesian model improved the fit of the stunting model. The Bayesian geo-statistical model indicated that the risk of stunting increased as the child’s age increased (OR 4.74; 95% Bayesian credible
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
Predicting uncertainty in future marine ice sheet volume using Bayesian statistical methods
Davis, A. D.
2015-12-01
The marine ice instability can trigger rapid retreat of marine ice streams. Recent observations suggest that marine ice systems in West Antarctica have begun retreating. However, unknown ice dynamics, computationally intensive mathematical models, and uncertain parameters in these models make predicting retreat rate and ice volume difficult. In this work, we fuse current observational data with ice stream/shelf models to develop probabilistic predictions of future grounded ice sheet volume. Given observational data (e.g., thickness, surface elevation, and velocity) and a forward model that relates uncertain parameters (e.g., basal friction and basal topography) to these observations, we use a Bayesian framework to define a posterior distribution over the parameters. A stochastic predictive model then propagates uncertainties in these parameters to uncertainty in a particular quantity of interest (QoI), here the volume of grounded ice at a specified future time. While the Bayesian approach can in principle characterize the posterior predictive distribution of the QoI, the computational cost of both the forward and predictive models makes this effort prohibitively expensive. To tackle this challenge, we introduce a new Markov chain Monte Carlo method that constructs convergent approximations of the QoI target density in an online fashion, yielding accurate characterizations of future ice sheet volume at significantly reduced computational cost. Our second goal is to attribute uncertainty in these Bayesian predictions to uncertainties in particular parameters. Doing so can help target data collection, for the purpose of constraining the parameters that contribute most strongly to uncertainty in the future volume of grounded ice. For instance, smaller uncertainties in parameters to which the QoI is highly sensitive may account for more variability in the prediction than larger uncertainties in parameters to which the QoI is less sensitive. We use global sensitivity
Directory of Open Access Journals (Sweden)
Sarah Depaoli
2015-03-01
Background: After traumatic events, such as disaster, war trauma, and injuries including burns (which is the focus here), the risk of developing posttraumatic stress disorder (PTSD) is approximately 10% (Breslau & Davis, 1992). Latent Growth Mixture Modeling can be used to classify individuals into distinct groups exhibiting different patterns of PTSD (Galatzer-Levy, 2015). Currently, empirical evidence points to four distinct trajectories of PTSD patterns in those who have experienced burn trauma. These trajectories are labeled as: resilient, recovery, chronic, and delayed onset trajectories (e.g., Bonanno, 2004; Bonanno, Brewin, Kaniasty, & Greca, 2010; Maercker, Gäbler, O'Neil, Schützwohl, & Müller, 2013; Pietrzak et al., 2013). The delayed onset trajectory affects only a small group of individuals, that is, about 4–5% (O'Donnell, Elliott, Lau, & Creamer, 2007). In addition to its low frequency, the later onset of this trajectory may contribute to the fact that these individuals can be easily overlooked by professionals. In this special symposium on Estimating PTSD trajectories (Van de Schoot, 2015a), we illustrate how to properly identify this small group of individuals through the Bayesian estimation framework using previous knowledge through priors (see, e.g., Depaoli & Boyajian, 2014; Van de Schoot, Broere, Perryck, Zondervan-Zwijnenburg, & Van Loey, 2015). Method: We used latent growth mixture modeling (LGMM) (Van de Schoot, 2015b) to estimate PTSD trajectories across 4 years that followed a traumatic burn. We demonstrate and compare results from traditional (maximum likelihood) and Bayesian estimation using priors (see, Depaoli, 2012, 2013). Further, we discuss where priors come from and how to define them in the estimation process. Results: We demonstrate that only the Bayesian approach results in the desired theory-driven solution of PTSD trajectories. Since the priors are chosen subjectively, we also present a sensitivity analysis of the
DEFF Research Database (Denmark)
Møller, Jesper; Jacobsen, Robert Dahl
We introduce a promising alternative to the usual hidden Markov tree model for Gaussian wavelet coefficients, where their variances are specified by the hidden states and take values in a finite set. In our new model, the hidden states have a similar dependence structure but they are jointly...... Gaussian, and the wavelet coefficients have log-variances equal to the hidden states. We argue why this provides a flexible model where frequentist and Bayesian inference procedures become tractable for estimation of parameters and hidden states. Our methodology is illustrated for denoising and edge...
Affine Invariant, Model-Based Object Recognition Using Robust Metrics and Bayesian Statistics
Zografos, Vasileios; 10.1007/11559573_51
2010-01-01
We revisit the problem of model-based object recognition for intensity images and attempt to address some of the shortcomings of existing Bayesian methods, such as unsuitable priors and the treatment of residuals with a non-robust error norm. We do so by using a reformulation of the Huber metric and carefully chosen prior distributions. Our proposed method is invariant to 2-dimensional affine transformations and, because it is relatively easy to train and use, it is suited for general object matching problems.
Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics.
Directory of Open Access Journals (Sweden)
Korsuk Sirinukunwattana
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses the normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC over 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
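The role of the normal-gamma conjugate prior can be made concrete with the standard closed-form update for a univariate Gaussian with unknown mean and precision. The sketch below shows only this textbook update and the resulting log marginal likelihood, the quantity that Bayesian model selection compares when deciding whether data belong in one cluster; it is not the GBHC implementation, and the function names and the prior parameterization (mu0, kappa0, alpha0, beta0) are assumptions.

```python
import math


def ng_update(prior, xs):
    """Posterior normal-gamma parameters after observing xs."""
    mu0, kappa0, alpha0, beta0 = prior
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # within-sample sum of squares
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (mu_n, kappa_n, alpha_n, beta_n)


def log_marginal(prior, xs):
    """Log marginal likelihood of xs with mean and precision integrated out."""
    _, kappa0, alpha0, beta0 = prior
    _, kappa_n, alpha_n, beta_n = ng_update(prior, xs)
    n = len(xs)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


prior = (0.0, 1.0, 1.0, 1.0)  # hypothetical prior: mu0, kappa0, alpha0, beta0
post = ng_update(prior, [1.0, 2.0, 3.0])
```

Conjugacy is what keeps a hierarchical clustering of this kind tractable: the evidence for any candidate cluster is available in closed form, so merge decisions need no sampling.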
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
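A simplified reading of the leave-out-one-cycle idea can be sketched as follows. The abstract does not give the exact criterion, so the squared-error score, the function names, and the tie-breaking rule toward the smallest period are all assumptions made for illustration.

```python
def cv_score(series, period):
    """Mean squared prediction error when each cycle of length `period`
    is predicted from the phase-wise means of all other cycles."""
    n = len(series)
    n_cycles = (n + period - 1) // period
    total = 0.0
    for c in range(n_cycles):
        start, stop = c * period, min((c + 1) * period, n)
        for t in range(start, stop):
            # Same-phase observations outside the left-out cycle.
            train = [series[s] for s in range(t % period, n, period)
                     if not start <= s < stop]
            pred = sum(train) / len(train)
            total += (series[t] - pred) ** 2
    return total / n

def estimate_period(series, max_period):
    """Return the candidate period with the smallest CV score
    (ties broken toward the smaller period) and all scores.
    Assumes max_period <= len(series) // 2."""
    scores = {p: cv_score(series, p) for p in range(1, max_period + 1)}
    best = min(scores, key=lambda p: (scores[p], p))
    return best, scores

# Hypothetical noise-free sequence with period 3.
series = [1, 4, 2] * 8
best, scores = estimate_period(series, max_period=8)
print(best)
```

In this noise-free toy case the multiple 6 fits as well as 3, so the tie-break decides. With noisy data, a multiple of the true period averages over fewer cycles per phase, which is one way to see the implicit penalty the abstract describes.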
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
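To make the distinction between models that are linear and non-linear on markers concrete, here is a minimal kernel (RKHS-style) regression sketch with a Gaussian kernel. It is a plain regularized least-squares fit rather than the Bayesian RKHS model used in the study, and the marker matrix, bandwidth, and penalty `lam` are hypothetical.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel between two marker vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2 * bandwidth ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_rkhs(X, y, bandwidth, lam):
    """Coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], bandwidth) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict_rkhs(X_train, alpha, x_new, bandwidth):
    """Prediction as a kernel-weighted sum over training lines."""
    return sum(a * gaussian_kernel(x_new, xi, bandwidth)
               for a, xi in zip(alpha, X_train))

# Hypothetical two-marker genotypes with an interaction (XOR) trait,
# which no model that is additive in the markers can reproduce.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.0, 1.0, 1.0, 0.0]
alpha = fit_rkhs(X, y, bandwidth=1.0, lam=1e-6)
print([round(predict_rkhs(X, alpha, xi, 1.0), 3) for xi in X])
```

The kernel fit recovers the interaction pattern on the training lines, whereas a regression that is linear on marker effects would predict the same value for all four genotypes.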
Sandoval-Castellanos, Edson; Palkopoulou, Eleftheria; Dalén, Love
2014-01-01
Inference of population demographic history has vastly improved in recent years due to a number of technological and theoretical advances including the use of ancient DNA. Approximate Bayesian computation (ABC) stands among the most promising methods due to its simple theoretical fundament and exceptional flexibility. However, limited availability of user-friendly programs that perform ABC analysis renders it difficult to implement, and hence programming skills are frequently required. In addition, there is limited availability of programs able to deal with heterochronous data. Here we present the software BaySICS: Bayesian Statistical Inference of Coalescent Simulations. BaySICS provides an integrated and user-friendly platform that performs ABC analyses by means of coalescent simulations from DNA sequence data. It estimates historical demographic population parameters and performs hypothesis testing by means of Bayes factors obtained from model comparisons. Although providing specific features that improve inference from datasets with heterochronous data, BaySICS also has several capabilities making it a suitable tool for analysing contemporary genetic datasets. Those capabilities include joint analysis of independent tables, a graphical interface and the implementation of Markov-chain Monte Carlo without likelihoods.
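The core of ABC is a likelihood-free accept/reject loop, sketched below. This is not BaySICS code: the coalescent simulator is replaced by a toy Gaussian simulator, and the prior, summary statistic, tolerance, and function names are all assumptions for illustration.

```python
import random

def abc_rejection(observed_summary, simulate, prior_draw,
                  n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary statistic falls within
    `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted

def prior_draw(rng):
    # Hypothetical flat prior on the parameter of interest.
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng, n=20):
    # Toy stand-in for a coalescent simulation: the summary statistic
    # is the mean of n Gaussian observations centred on theta.
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

rng = random.Random(0)
posterior = abc_rejection(1.0, simulate, prior_draw,
                          n_draws=20000, tolerance=0.1, rng=rng)
print(len(posterior), sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior without ever evaluating a likelihood. Running the same loop under two competing models and comparing acceptance rates is one simple route to the Bayes factors mentioned in the abstract.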
Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems
He, Yuning; Davies, Misty Dawn
2014-01-01
The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.
Quantal Response: Nonparametric Modeling
2017-01-01
Nonparametric linear ... stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional ...
Directory of Open Access Journals (Sweden)
M. A. Zotov
2016-01-01
An improved algorithm for synthesizing the secondary structure of algebraic Bayesian networks, represented by a minimal join graph, is proposed in the paper. The algorithm differs from the previously proposed one in that it relies on the incremental principle, uses specially selected edges and, finally, eliminates redundant edges by a greedy algorithm. The correctness of the incremental algorithm is mathematically proved. The computational complexity of the new (incremental) algorithm implementation is compared with that of two well-known ones (greedy and direct) by means of statistical estimates of complexity, based on sample values of the runtime ratio of the software implementations of the two compared algorithms. Theoretical complexity estimates of the greedy and direct algorithms were obtained earlier but are not suitable for comparative analysis, as they are based on hidden characteristics of the secondary structure, which can be calculated only once it is built. To minimize the influence of random factors when calculating the ratio, the average program runtime obtained over N launches on the same set of workloads is used. The sample values of the ratio are formed for M sets of equal power K. From the sample values the median is calculated, as well as other statistics that characterize the spread: the bounds of the 97% confidence interval along with the first and third quartiles. Sets of loads are stochastically generated according to the specified parameters using the algorithm described in the paper. The stochastic algorithm generating a set of loads with a given power, as well as the collection of the statistical data and the calculation of statistical estimates of the ratio of the direct and greedy algorithm runtimes to the incremental algorithm runtime, is described in the paper. A series of experiments is carried out in which N is changed in the range 1, 2 ... 9, 10, 26, 42 ... 170. They have shown that the incremental algorithm speed exceeds the
Colistete, R C; Goncalves, S V B
2004-01-01
The type Ia supernovae (SNe Ia) observational data are used to estimate the parameters of a cosmological model with cold dark matter and the generalized Chaplygin gas model (GCGM). The GCGM depends essentially on five parameters: the Hubble constant, the parameter $\bar{A}$ related to the velocity of the sound, the equation of state parameter $\alpha$, the curvature of the Universe and the fraction density of the generalized Chaplygin gas (or the cold dark matter). The parameter $\alpha$ is allowed to take negative values and to be greater than 1. The Bayesian parameter estimation yields $\alpha = - 0.86^{+6.01}_{-0.15}$, $H_0 = 62.0^{+1.32}_{-1.42} km/Mpc.s$, $\Omega _{k0}=-1.26_{-1.42}^{+1.32}$, $\Omega_{m0} = 0.00^{+0.86}_{-0.00}$, $\Omega_{c0} = 1.39^{+1.21}_{-1.25}$, $\bar A =1.00^{+0.00}_{-0.39}$, $t_0 = 15.3^{+4.2}_{-3.2}$ and $q_0 = -0.80^{+0.86}_{-0.62}$, where $t_0$ is the age of the Universe and $q_0$ is the value of the deceleration parameter today. Our results indicate that a Universe completely ...
Radiative Transfer meets Bayesian statistics: where does your Galaxy's [CII] come from?
Accurso, Gioacchino; Bisbas, Thomas G; Viti, Serena
2016-01-01
The [CII] 158 $\mu$m emission line can arise in all phases of the ISM, therefore being able to disentangle the different contributions is an important yet unresolved problem when undertaking galaxy-wide, integrated [CII] observations. We present a new multi-phase 3D radiative transfer interface that couples Starburst99, a stellar spectrophotometric code, with the photoionisation and astrochemistry codes Mocassin and 3D-PDR. We model entire star forming regions, including the ionised, atomic and molecular phases of the ISM, and apply a Bayesian inference methodology to parametrise how the fraction of the [CII] emission originating from molecular regions, $f_{[CII],mol}$, varies as a function of typical integrated properties of galaxies in the local Universe. The main parameters responsible for the variations of $f_{[CII],mol}$ are specific star formation rate (sSFR), gas phase metallicity, HII region electron number density ($n_e$), and dust mass fraction. For example, $f_{[CII],mol}$ can increase from 60% to 8...
Bayesian statistics applied to the location of the source of explosions at Stromboli Volcano, Italy
Saccorotti, G.; Chouet, B.; Martini, M.; Scarpa, R.
1998-01-01
We present a method for determining the location and spatial extent of the source of explosions at Stromboli Volcano, Italy, based on a Bayesian inversion of the slowness vector derived from frequency-slowness analyses of array data. The method searches for source locations that minimize the error between the expected and observed slowness vectors. For a given set of model parameters, the conditional probability density function of slowness vectors is approximated by a Gaussian distribution of expected errors. The method is tested with synthetics using a five-layer velocity model derived for the north flank of Stromboli and a smoothed velocity model derived from a power-law approximation of the layered structure. Application to data from Stromboli allows for a detailed examination of uncertainties in source location due to experimental errors and incomplete knowledge of the Earth model. Although the solutions are not constrained in the radial direction, excellent resolution is achieved in both transverse and depth directions. Under the assumption that the horizontal extent of the source does not exceed the crater dimension, the 90% confidence region in the estimate of the explosive source location corresponds to a small volume extending from a depth of about 100 m to a maximum depth of about 300 m beneath the active vents, with a maximum likelihood source region located in the 120- to 180-m-depth interval.
Variational Bayesian labeled multi-Bernoulli filter with unknown sensor noise statistics
Directory of Open Access Journals (Sweden)
Qiu Hao
2016-10-01
It is difficult to build an accurate model for measurement noise covariance in complex backgrounds. For scenarios with unknown sensor noise variances, an adaptive multi-target tracking algorithm based on the labeled random finite set and variational Bayesian (VB) approximation is proposed. The variational approximation technique is introduced into the labeled multi-Bernoulli (LMB) filter to jointly estimate the states of targets and the sensor noise variances. Simulation results show that the proposed method can give unbiased estimation of cardinality and has better performance than the VB probability hypothesis density (VB-PHD) filter and the VB cardinality balanced multi-target multi-Bernoulli (VB-CBMeMBer) filter in harsh situations. The simulations also confirm the robustness of the proposed method against time-varying noise variances. The computational complexity of the proposed method is higher than that of the VB-PHD and VB-CBMeMBer filters in extreme cases, while the mean execution times of the three methods are close when targets are well separated.
Statistical flaw characterization through Bayesian shape inversion from scattered wave observations
McMahan, Jerry A.; Criner, Amanda K.
2016-02-01
A method is discussed to characterize the shape of a flaw from noisy far-field measurements of a scattered wave. The scattering model employed is a two-dimensional Helmholtz equation which quantifies scattering due to interrogating signals from various physical phenomena such as acoustics or electromagnetics. The well-known inherent ill-posedness of the inverse scattering problem is addressed via Bayesian regularization. The method is loosely related to the approach described in [1] which uses the framework of [2] to prove the well-posedness of the infinite-dimensional problem and derive estimates of the error for a particular discretization approach. The method computes the posterior probability density for the flaw shape from the scattered field observations, taking into account prior assumptions which are used to describe any a priori knowledge of the flaw. We describe the computational approach to the forward problem as well as the Markov chain Monte Carlo (MCMC) based approach to approximating the posterior. We present simulation results for some hypothetical flaw shapes with varying levels of observation error and arrangement of observation points. The results show how the posterior probability density can be used to visualize the shape of the flaw taking into account the quantitative confidence in the quality of the estimation and how various arrangements of the measurements and interrogating signals affect the estimation
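The MCMC step described above can be sketched with a random-walk Metropolis sampler. The Helmholtz forward model is replaced here by a trivial identity model for a single scalar shape parameter (a hypothetical flaw radius), so that the sampled posterior can be checked against the known conjugate answer; the prior, noise level, and data values are assumptions.

```python
import math
import random

def metropolis(log_post, x0, n_steps, step, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain.append(x)
    return chain

# Hypothetical noisy observations of the radius, with known noise sd.
data = [1.4, 1.6, 1.5]
noise_sd = 0.2
prior_mean, prior_sd = 1.0, 0.5

def log_post(r):
    # Gaussian prior plus Gaussian likelihood, up to an additive constant.
    log_prior = -0.5 * ((r - prior_mean) / prior_sd) ** 2
    log_like = sum(-0.5 * ((d - r) / noise_sd) ** 2 for d in data)
    return log_prior + log_like

rng = random.Random(1)
chain = metropolis(log_post, x0=1.0, n_steps=20000, step=0.3, rng=rng)
samples = chain[2000:]  # discard burn-in
print(sum(samples) / len(samples))
```

For this toy model the exact posterior mean is 116.5 / 79, about 1.475, so the chain average gives a direct sanity check. With a real scattering model only `log_post` changes; the sampler itself stays the same.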
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process. We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
Neural network uncertainty assessment using Bayesian statistics: a remote sensing application
Aires, F.; Prigent, C.; Rossow, W. B.
2004-01-01
Neural network (NN) techniques have proved successful for many regression problems, in particular for remote sensing; however, uncertainty estimates are rarely provided. In this article, a Bayesian technique to evaluate uncertainties of the NN parameters (i.e., synaptic weights) is first presented. In contrast to more traditional approaches based on point estimation of the NN weights, we assess uncertainties on such estimates to monitor the robustness of the NN model. These theoretical developments are illustrated by applying them to the problem of retrieving surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of satellite microwave and infrared observations over land. The weight uncertainty estimates are then used to compute analytically the uncertainties in the network outputs (i.e., error bars and correlation structure of these errors). Such quantities are very important for evaluating any application of an NN model. The uncertainties on the NN Jacobians are then considered in the third part of this article. Used for regression fitting, NN models can be used effectively to represent highly nonlinear, multivariate functions. In this situation, most emphasis is put on estimating the output errors, but almost no attention has been given to errors associated with the internal structure of the regression model. The complex structure of dependency inside the NN is the essence of the model, and assessing its quality, coherency, and physical character makes all the difference between a blackbox model with small output errors and a reliable, robust, and physically coherent model. Such dependency structures are described to the first order by the NN Jacobians: they indicate the sensitivity of one output with respect to the inputs of the model for given input data. We use a Monte Carlo integration procedure to estimate the robustness of the NN Jacobians. A regularization strategy based on principal component
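The idea of propagating weight uncertainty to the NN Jacobians by Monte Carlo can be sketched for a one-hidden-layer tanh network. The network size, the weight values, and the assumption of independent Gaussian weight uncertainty with a common standard deviation are all illustrative; the article works with an estimated posterior for the weights rather than this simplification.

```python
import math
import random

def forward(x, W, b, v):
    """Scalar output of a one-hidden-layer tanh network."""
    return sum(vj * math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + bj)
               for Wj, bj, vj in zip(W, b, v))

def jacobian(x, W, b, v):
    """Analytic derivative of the output with respect to each input."""
    grads = [0.0] * len(x)
    for Wj, bj, vj in zip(W, b, v):
        z = sum(w * xi for w, xi in zip(Wj, x)) + bj
        slope = vj * (1.0 - math.tanh(z) ** 2)
        for i, w in enumerate(Wj):
            grads[i] += slope * w
    return grads

def jacobian_spread(x, W, b, v, weight_sd, n_samples, rng):
    """Monte Carlo mean and standard deviation of the Jacobian under
    independent Gaussian perturbations of every weight."""
    samples = []
    for _ in range(n_samples):
        Wp = [[w + rng.gauss(0.0, weight_sd) for w in Wj] for Wj in W]
        bp = [bj + rng.gauss(0.0, weight_sd) for bj in b]
        vp = [vj + rng.gauss(0.0, weight_sd) for vj in v]
        samples.append(jacobian(x, Wp, bp, vp))
    means, sds = [], []
    for i in range(len(x)):
        col = [s[i] for s in samples]
        m = sum(col) / len(col)
        means.append(m)
        sds.append(math.sqrt(sum((c - m) ** 2 for c in col)
                             / (len(col) - 1)))
    return means, sds

# Hypothetical trained weights: 2 inputs, 3 hidden units, 1 output.
W = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
b = [0.0, 0.1, -0.2]
v = [1.0, -0.5, 0.3]
x = [0.4, -0.6]
print(jacobian(x, W, b, v))
print(jacobian_spread(x, W, b, v, weight_sd=0.05, n_samples=500,
                      rng=random.Random(0)))
```

A Jacobian entry whose Monte Carlo spread is large relative to its mean signals an input sensitivity that the fitted network does not pin down reliably, even if the output errors are small.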
Radiative transfer meets Bayesian statistics: where does a galaxy's [C II] emission come from?
Accurso, G.; Saintonge, A.; Bisbas, T. G.; Viti, S.
2017-01-01
The [C II] 158 μm emission line can arise in all phases of the interstellar medium (ISM), therefore being able to disentangle the different contributions is an important yet unresolved problem when undertaking galaxy-wide, integrated [C II] observations. We present a new multiphase 3D radiative transfer interface that couples STARBURST99, a stellar spectrophotometric code, with the photoionization and astrochemistry codes MOCASSIN and 3D-PDR. We model entire star-forming regions, including the ionized, atomic, and molecular phases of the ISM, and apply a Bayesian inference methodology to parametrize how the fraction of the [C II] emission originating from molecular regions, $f_{[C II],mol}$, varies as a function of typical integrated properties of galaxies in the local Universe. The main parameters responsible for the variations of $f_{[C II],mol}$ are specific star formation rate (SSFR), gas phase metallicity, H II region electron number density ($n_e$), and dust mass fraction. For example, $f_{[C II],mol}$ can increase from 60 to 80 per cent when either $n_e$ increases from $10^{1.5}$ to $10^{2.5}$ cm$^{-3}$, or SSFR decreases from $10^{-9.6}$ to $10^{-10.6}$ yr$^{-1}$. Our model predicts for the Milky Way that $f_{[C II],mol}$ = 75.8 ± 5.9 per cent, in agreement with the measured value of 75 per cent. When applying the new prescription to a complete sample of galaxies from the Herschel Reference Survey, we find that anywhere from 60 to 80 per cent of the total integrated [C II] emission arises from molecular regions.
Jain, Lakhmi
2012-01-01
Data mining is one of the most rapidly growing research areas in computer science and statistics. In Volume 2 of this three-volume series, we have brought together contributions from some of the most prestigious researchers in theoretical data mining. Each of the chapters is self-contained. Statisticians and applied scientists/engineers will find this volume valuable. Additionally, it provides a sourcebook for graduate students interested in the current direction of research in data mining.
Statistical analysis of a Bayesian classifier based on the expression of miRNAs
Ricci, Leonardo; Del Vescovo, Valerio; Cantaloni, Chiara; Grasso, Margherita; Barbareschi, Mattia; Denti, Michela Alessandra
2015-01-01
Background: During the last decade, many scientific works have concerned the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. The development of reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters that characterize a classifier. In this paper, these topics ...
Energy Technology Data Exchange (ETDEWEB)
Pekney, Natalie J.; Cheng, Hanqi; Small, Mitchell J.
2015-11-05
Abstract: The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality.
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
Miaou, Shaw-Pin; Song, Joon Jin
2005-07-01
limitation of using the naïve approach in ranking is illustrated. Second, following the model based approach, the choice of decision parameters and consideration of treatability are discussed. Third, several statistical ranking criteria that have been used in biomedical, health, and other scientific studies are presented from a Bayesian perspective. Their applications in roadway safety are then demonstrated using two data sets: one for individual urban intersections and one for rural two-lane roads at the county level. As part of the demonstration, it is shown how multivariate spatial GLMM can be used to model traffic crashes of several injury severity types simultaneously and how the model can be used within a Bayesian framework to rank sites by crash cost per vehicle-mile traveled (instead of by crash frequency rate). Finally, the significant impact of spatial effects on the overall model goodness-of-fit and site ranking performances are discussed for the two data sets examined. The paper is concluded with a discussion on possible directions in which the study can be extended.
Ayubi, Erfan; Mansournia, Mohammad Ali; Motlagh, Ali Ghanbari; Mosavi-Jarrahi, Alireza; Hosseini, Ali; Yazdani, Kamran
2017-01-01
The aim of this study was to explore the spatial pattern of female breast cancer (BC) incidence at the neighborhood level in Tehran, Iran. The present study included all registered incident cases of female BC from March 2008 to March 2011. The raw standardized incidence ratio (SIR) of BC for each neighborhood was estimated by comparing observed cases relative to expected cases. The estimated raw SIRs were smoothed by a Besag, York, and Mollie spatial model and the spatial empirical Bayesian method. The purely spatial scan statistic was used to identify spatial clusters. There were 4,175 incident BC cases in the study area from 2008 to 2011, of which 3,080 were successfully geocoded to the neighborhood level. Higher than expected rates of BC were found in neighborhoods located in northern and central Tehran, whereas lower rates appeared in southern areas. The most likely cluster of higher than expected BC incidence involved neighborhoods in districts 3 and 6, with an observed-to-expected ratio of 3.92 (p<0.001), whereas the most likely cluster of lower than expected rates involved neighborhoods in districts 17, 18, and 19, with an observed-to-expected ratio of 0.05 (p<0.001). Neighborhood-level inequality in the incidence of BC exists in Tehran. These findings can serve as a basis for resource allocation and preventive strategies in at-risk areas.
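The raw SIR is simply observed over expected cases, with the expected count obtained by indirect standardization. The sketch below adds a global gamma-Poisson shrinkage as a much simpler stand-in for the Besag, York, and Mollie and spatial empirical Bayes smoothers used in the study; the age strata, reference rates, and prior parameters are hypothetical.

```python
def expected_cases(population_by_age, reference_rates):
    """Indirect standardization: expected count from stratum populations
    and reference incidence rates per person."""
    return sum(p * r for p, r in zip(population_by_age, reference_rates))

def raw_sir(observed, expected):
    """Raw standardized incidence ratio."""
    return observed / expected

def shrunk_sir(observed, expected, prior_shape, prior_rate):
    """Posterior mean of the relative risk under a gamma prior
    (prior mean = prior_shape / prior_rate) and a Poisson count."""
    return (observed + prior_shape) / (expected + prior_rate)

# Hypothetical neighborhood with two age strata.
expected = expected_cases([1000, 2000], [0.001, 0.002])
print(raw_sir(10, expected))
print(shrunk_sir(10, expected, prior_shape=5.0, prior_rate=5.0))
```

Shrinkage pulls the ratio of a small neighborhood toward the overall mean, which is why smoothed maps are less dominated by areas with few expected cases. A spatial model additionally borrows strength from adjacent neighborhoods, which this global sketch does not do.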
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Bernardo, Jose M
2000-01-01
This highly acclaimed text, now available in paperback, provides a thorough account of key concepts and theoretical results, with particular emphasis on viewing statistical inference as a special case of decision theory. Information-theoretic concepts play a central role in the development of the theory, which provides, in particular, a detailed discussion of the problem of specification of so-called "prior ignorance". The work is written from the authors' committed Bayesian perspective, but an overview of non-Bayesian theories is also provided, and each chapter contains a wide-ranging critica
Directory of Open Access Journals (Sweden)
Alcides Cabrera Campos
2012-09-01
Data from agricultural pest populations frequently fail to meet the theoretical requirements for classical ANOVA. Box-Cox transformations and nonparametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L.) plantations during the period of pest incidence. The χ² test was used for the goodness-of-fit of the negative binomial distribution and as a test of independence to investigate the relationship between plant strata and insect stages. Seven data transformations were also applied to meet the requirements of classical ANOVA, but they failed to eliminate the relationship between mean and variance. Given this negative result, comparisons between insect population densities were made using the nonparametric Kruskal-Wallis ANOVA test. Results from this analysis allowed selecting the insect larval stage and plant middle stratum as keys to design pest sampling plans.
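The Kruskal-Wallis test replaces the observations by their ranks, so it does not need the mean and variance to be unrelated. A minimal sketch of the H statistic, with average ranks for ties and the standard tie correction, is given below; the function name and the toy counts are assumptions, and the p-value (from a chi-square distribution with k - 1 degrees of freedom) is left out.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.
    Assumes the pooled values are not all identical."""
    pooled = sorted((value, g) for g, grp in enumerate(groups)
                    for value in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction

# Hypothetical insect counts per leaf in three plant strata.
lower, middle, upper = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(kruskal_wallis_h([lower, middle, upper]))
```

Because only ranks enter the statistic, any monotone transformation of the counts (including the Box-Cox family) leaves H unchanged, which is why the test sidesteps the failed transformations described above.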
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural network, Gaussian process, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data for the US stock market from the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance.
Beramendi-Orosco, Laura E.; Gonzalez-Hernandez, Galia; Urrutia-Fucugauchi, Jaime; Manzanilla, Linda R.; Soler-Arechalde, Ana M.; Goguitchaishvili, Avto; Jarboe, Nick
2009-03-01
A high-resolution 14C chronology for the Teopancazco archaeological site in the Teotihuacan urban center of Mesoamerica was generated by Bayesian analysis of 33 radiocarbon dates and detailed archaeological information related to occupation stratigraphy, pottery and archaeomagnetic dates. The calibrated intervals obtained using the Bayesian model are up to ca. 70% shorter than those obtained with individual calibrations. For some samples, this is a consequence of plateaus in the part of the calibration curve covered by the sample dates (2500 to 1450 14C yr BP). Effects of outliers are explored by comparing the results from a Bayesian model that incorporates radiocarbon data for two outlier samples with the same model excluding them. The effect of outliers was more significant than expected. Inclusion of radiocarbon dates from two altered contexts, 500 14C yr earlier than those for the first occupational phase, results in ages calculated by the model earlier than the archaeological records. The Bayesian chronology excluding these outliers separates the first two Teopancazco occupational phases and suggests that ending of the Xolalpan phase was around cal AD 550, 100 yr earlier than previously estimated and in accordance with previously reported archaeomagnetic dates from lime plasters for the same site.
Bayesian non parametric modelling of Higgs pair production
Scarpa, Bruno; Dorigo, Tommaso
2017-03-01
Statistical classification models are commonly used to separate a signal from a background. In this talk we face the problem of isolating the signal of Higgs pair production using the decay channel in which each boson decays into a pair of b-quarks. Typically in this context non parametric methods are used, such as Random Forests or different types of boosting tools. We remain in the same non-parametric framework, but we propose to face the problem following a Bayesian approach. A Dirichlet process is used as prior for the random effects in a logit model which is fitted by leveraging the Polya-Gamma data augmentation. Refinements of the model include the insertion in the simple model of P-splines to relate explanatory variables with the response and the use of Bayesian trees (BART) to describe the atoms in the Dirichlet process.
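A Dirichlet process prior can be simulated through its stick-breaking representation, sketched below with a finite truncation. The concentration parameter, the truncation level, and the standard normal base measure for the random-effect atoms are assumptions for illustration; the Polya-Gamma augmentation and the logit model itself are not shown.

```python
import random

def stick_breaking_weights(alpha, n_atoms, rng):
    """First n_atoms stick-breaking weights of a Dirichlet process with
    concentration alpha, plus the leftover (untruncated) mass."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining

def truncated_dp_draw(alpha, n_atoms, rng):
    """One truncated draw: atom locations from a standard normal base
    measure (an assumption) paired with stick-breaking weights."""
    weights, remaining = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]
    return atoms, weights, remaining

rng = random.Random(1)
atoms, weights, remaining = truncated_dp_draw(alpha=2.0, n_atoms=50, rng=rng)
print(sum(weights) + remaining)  # total mass of the stick
print(max(weights))              # share of the largest cluster
```

Each random effect is then assigned one of the atoms with probability given by its weight, so observations sharing an atom form a cluster, and the number of occupied clusters is learned from the data rather than fixed in advance.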
Nonparametric statistical testing of coherence differences
Maris, E.; Schoffelen, J.M.; Fries, P.
2007-01-01
Many important questions in neuroscience are about interactions between neurons or neuronal groups. These interactions are often quantified by coherence, which is a frequency-indexed measure that quantifies the extent to which two signals exhibit a consistent phase relation. In this paper, we consid
Non-parametric star formation histories for 5 dwarf spheroidal galaxies of the local group
Hernández, X; Valls-Gabaud, D; Gilmore, Gerard; Valls-Gabaud, David
2000-01-01
We use recent HST colour-magnitude diagrams of the resolved stellar populations of a sample of local dSph galaxies (Carina, LeoI, LeoII, Ursa Minor and Draco) to infer the star formation histories of these systems, $SFR(t)$. Applying a new variational calculus maximum likelihood method which includes a full Bayesian analysis and allows a non-parametric estimate of the function one is solving for, we infer the star formation histories of the systems studied. This method has the advantage of yielding an objective answer, as one need not assume a priori the form of the function one is trying to recover. The results are checked independently using Saha's $W$ statistic. The total luminosities of the systems are used to normalize the results into physical units and derive SN type II rates. We derive the luminosity weighted mean star formation history of this sample of galaxies.
Bayesian modelling of the emission spectrum of the JET Li-BES system
Kwak, Sehyun; Brix, M; Ghim, Y -c; Contributors, JET
2015-01-01
A Bayesian model of the emission spectrum of the JET lithium beam has been developed to infer the intensity of the Li I (2p-2s) line radiation and associated uncertainties. The detected spectrum for each channel of the lithium beam emission spectroscopy (Li-BES) system is here modelled by a single Li line modified by an instrumental function, Bremsstrahlung background, instrumental offset, and interference filter curve. Both the instrumental function and the interference filter curve are modelled with non-parametric Gaussian processes. All free parameters of the model, the intensities of the Li line, Bremsstrahlung background, and instrumental offset, are inferred using Bayesian probability theory with a Gaussian likelihood for photon statistics and electronic background noise. The prior distributions of the free parameters are chosen as Gaussians. Given these assumptions, the intensity of the Li line and corresponding uncertainties are analytically available using a Bayesian linear inversion technique. The p...
DEFF Research Database (Denmark)
Linnet, Kristian
2005-01-01
Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-10-01
The article is devoted to nonparametric point and interval estimation of the characteristics of a probability distribution (the expectation, median, variance, standard deviation, and coefficient of variation) from sample results. Sample values are regarded as realizations of independent and identically distributed random variables with an arbitrary distribution function having the required number of moments. Nonparametric analysis procedures are compared with parametric procedures based on the assumption that the sample values are normally distributed. Point estimators are constructed in the obvious way, using sample analogs of the theoretical characteristics. Interval estimators are based on the asymptotic normality of sample moments and of functions of them. Nonparametric asymptotic confidence intervals are obtained through a special technique for deriving asymptotic relations in applied statistics. In the first step, this technique uses the multidimensional central limit theorem, applied to sums of vectors whose coordinates are powers of the original random variables. The second step transforms the limiting multivariate normal vector into the vector of interest to the researcher; here linearization is used and infinitesimal quantities are discarded. The third step is a rigorous justification of the results at the standard level of asymptotic mathematical-statistical reasoning, which usually requires the necessary and sufficient conditions for the inheritance of convergence. The article contains 10 numerical examples. The initial data are the operating times of 50 cutting tools up to the limit state. Methods developed under the assumption of a normal distribution can lead to noticeably distorted conclusions when the normality hypothesis fails. The practical recommendation is that nonparametric confidence limits should be used for the analysis of real data
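As a rough illustration of interval estimators built on the asymptotic normality of sample moments, the sketch below computes nonparametric confidence intervals for the expectation and the variance. It uses the standard large-sample results (variance of the sample mean approximately m2/n, variance of the sample variance approximately (m4 - m2^2)/n, with m2 and m4 the second and fourth central sample moments). The function name and the data are hypothetical; this is not the article's own code or its cutting-tool data set.

```python
import math
from statistics import NormalDist


def asymptotic_intervals(sample, level=0.95):
    """Nonparametric asymptotic confidence intervals for mean and variance."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in sample) / n  # fourth central moment
    half_mean = z * math.sqrt(m2 / n)
    half_var = z * math.sqrt(max(m4 - m2 ** 2, 0.0) / n)
    return {
        "mean": (mean - half_mean, mean + half_mean),
        "variance": (m2 - half_var, m2 + half_var),
    }


# Hypothetical operating times, for illustration only.
times = [9.0, 17.5, 21.0, 26.5, 30.5, 36.0, 37.0, 39.0, 40.5, 47.5, 50.0, 62.0]
for name, (low, high) in asymptotic_intervals(times).items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Unlike the normal-theory interval for the variance, this one does not assume a particular value of the kurtosis; it estimates the fourth moment from the data, which is why the two can disagree noticeably for non-normal samples.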
Current trends in Bayesian methodology with applications
Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia
2015-01-01
Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics. Each chapter is self-contained and focuses on
Directory of Open Access Journals (Sweden)
Jiaming Liu
2016-01-01
Many downscaling techniques have been developed in the past few years for projection of station-scale hydrological variables from large-scale atmospheric variables to assess the hydrological impacts of climate change. To improve the simulation accuracy of downscaling methods, the Bayesian Model Averaging (BMA) method combined with three statistical downscaling methods, namely support vector machine (SVM), BCC/RCG-Weather Generators (BCC/RCG-WG), and Statistics Downscaling Model (SDSM), is proposed in this study, based on the statistical relationship between the larger scale climate predictors and observed precipitation in the upper Hanjiang River Basin (HRB). The statistical analysis of three performance criteria (the Nash-Sutcliffe coefficient of efficiency, the coefficient of correlation, and the relative error) shows that the ensemble downscaling method based on BMA performs better for rainfall than each single statistical downscaling method. Moreover, the runoff modelled by the SWAT rainfall-runoff model using the daily rainfall downscaled by the four methods is also compared, and the ensemble downscaling method has better simulation accuracy. The ensemble downscaling technology based on BMA can provide a scientific basis for the study of runoff response to climate change.
DEFF Research Database (Denmark)
Mørup, Morten; Schmidt, Mikkel N
2012-01-01
Many networks of scientific interest naturally decompose into clusters or communities with comparatively fewer external than internal links; however, current Bayesian models of network communities do not exert this intuitive notion of communities. We formulate a nonparametric Bayesian model for community detection consistent with an intuitive definition of communities and present a Markov chain Monte Carlo procedure for inferring the community structure. A Matlab toolbox with the proposed inference procedure is available for download. On synthetic and real networks, our model detects communities consistent with ground truth, and on real networks, it outperforms existing approaches in predicting missing links. This suggests that community structure is an important structural property of networks that should be explicitly modeled.
Yang, Yuqing; Chen, Ning; Chen, Ting
2017-01-25
The inference of associations between environmental factors and microbes and among microbes is critical to interpreting metagenomic data, but compositional bias, indirect associations resulting from common factors, and variance within metagenomic sequencing data limit the discovery of associations. To account for these problems, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints, to estimate absolute microbial abundance and simultaneously infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors. We empirically show the effectiveness of the mLDM model using synthetic data, data from the TARA Oceans project, and a colorectal cancer dataset. Finally, we apply mLDM to 16S sequencing data from the western English Channel and report several associations. Our model can be used on both natural environmental and human metagenomic datasets, promoting the understanding of associations in the microbial community.
Energy Technology Data Exchange (ETDEWEB)
Blanc, Guillermo A. [Observatories of the Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, CA 91101 (United States); Kewley, Lisa; Vogt, Frédéric P. A.; Dopita, Michael A. [Research School of Astronomy and Astrophysics, Australian National University, Cotter Road, Weston, ACT 2611 (Australia)
2015-01-10
We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SEL diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Dursun Aydın
2016-11-01
This paper examines the performance of selection criteria for right-censored nonparametric regression using smoothing splines. In order to transform the response variable into a variable that accounts for right-censoring, we used the Kaplan-Meier weights proposed by [1] and [2]. The major problem in the smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, this parameter is chosen based on censored data by means of criteria such as the improved Akaike information criterion (AICc), the Bayesian (or Schwarz) information criterion (BIC), and generalized cross-validation (GCV). For this purpose, a Monte Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
Wade, Leslie; Ochsner, Evan; Lackey, Benjamin D; Farr, Benjamin F; Littenberg, Tyson B; Raymond, Vivien
2014-01-01
Advanced ground-based gravitational-wave detectors are capable of measuring tidal influences in binary neutron-star systems. In this work, we report on the statistical uncertainties in measuring tidal deformability with a full Bayesian parameter estimation implementation. We show how simultaneous measurements of chirp mass and tidal deformability can be used to constrain the neutron-star equation of state. We also study the effects of waveform modeling bias and individual instances of detector noise on these measurements. We notably find that systematic error between post-Newtonian waveform families can significantly bias the estimation of tidal parameters, thus motivating the continued development of waveform models that are more reliable at high frequencies.
Directory of Open Access Journals (Sweden)
Archer Kellie J
2008-02-01
Background: With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. Results: The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion: We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been
Kong, Xiangrong; Mas, Valeria; Archer, Kellie J
2008-02-26
With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the
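The pathway-enrichment step described in this abstract (Fisher's exact test on genes called differentially expressed) can be sketched as a one-sided hypergeometric tail probability. The function below is a minimal illustration under stated assumptions: the function name is hypothetical, and apart from the 309 selected genes the counts in the usage lines (array size, pathway size, overlap) are invented for the example and are not taken from the study.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, selected_genes, overlap):
    """One-sided Fisher exact p-value for over-representation.

    Probability of seeing at least `overlap` pathway genes among
    `selected_genes` genes drawn without replacement from `total_genes`,
    of which `pathway_genes` belong to the pathway.
    """
    upper = min(pathway_genes, selected_genes)
    denominator = comb(total_genes, selected_genes)
    tail = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 20000 genes on the array, 150 in the pathway,
# 309 genes called differentially expressed, 12 of them in the pathway.
p = enrichment_p_value(20000, 150, 309, 12)
print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested, as with the KEGG collection, the resulting p-values would normally be adjusted for multiple testing; the abstract does not say which adjustment, if any, was used.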
Bayesian non- and semi-parametric methods and applications
Rossi, Peter
2014-01-01
This book reviews and develops Bayesian non-parametric and semi-parametric methods for applications in microeconometrics and quantitative marketing. Most econometric models used in microeconomics and marketing applications involve arbitrary distributional assumptions. As more data becomes available, a natural desire to provide methods that relax these assumptions arises. Peter Rossi advocates a Bayesian approach in which specific distributional assumptions are replaced with more flexible distributions based on mixtures of normals. The Bayesian approach can use either a large but fixed number
Olugboji, T. M.; Lekic, V.; McDonough, W.
2017-07-01
We present a new approach for evaluating existing crustal models using ambient noise data sets and its associated uncertainties. We use a transdimensional hierarchical Bayesian inversion approach to invert ambient noise surface wave phase dispersion maps for Love and Rayleigh waves using measurements obtained from Ekström (2014). Spatiospectral analysis shows that our results are comparable to a linear least squares inverse approach (except at higher harmonic degrees), but the procedure has additional advantages: (1) it yields an autoadaptive parameterization that follows Earth structure without making restricting assumptions on model resolution (regularization or damping) and data errors; (2) it can recover non-Gaussian phase velocity probability distributions while quantifying the sources of uncertainties in the data measurements and modeling procedure; and (3) it enables statistical assessments of different crustal models (e.g., CRUST1.0, LITHO1.0, and NACr14) using variable resolution residual and standard deviation maps estimated from the ensemble. These assessments show that in the stable old crust of the Archean, the misfits are statistically negligible, requiring no significant update to crustal models from the ambient noise data set. In other regions of the U.S., significant updates to regionalization and crustal structure are expected especially in the shallow sedimentary basins and the tectonically active regions, where the differences between model predictions and data are statistically significant.
Semi- and Nonparametric ARCH Processes
Directory of Open Access Journals (Sweden)
Oliver B. Linton
2011-01-01
ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationarity conditions, ergodicity conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Directory of Open Access Journals (Sweden)
Adam J. Branscum
2013-01-01
The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions) rather than scalar parameters (e.g., means or proportions). For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straightforward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Nonparametric Bayesian Context Learning for Buried Threat Detection
2012-01-01
route clearance patrols, the vast majority of GPR data collected in the field will be free of buried threats. In current processing strategies, the large...mining competition spearheaded by Netflix, which sought to improve its movie recommendation algorithm [146]. As more customer data becomes available...
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.
Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S
2016-05-01
A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Random Variate Generation for Bayesian Nonparametric Reliability Analysis
2005-05-01
functions
U = rv.mrand(); // uniform(0,1) variate
N = rv.stdnorm(); // normal(0,1) variate
Chi = rv.ChiSquare(8); // ChiSquare n = 8 d.f.
Gam...and mu ≥ 0 for lognormal. To obtain a Chi Squared random variate with n degrees of freedom, call function ChiSquare(n) with integer valued n ≥ 1. For...
ChiSquare(): Returns ChiSquare R.V. using stdnorm(). */ double RanV::ChiSquare
Nonparametric estimation of ultrasound pulses
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)
Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders
DEFF Research Database (Denmark)
Nielsen, Morten Ørregaard
2009-01-01
In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...
Pestman, Wiebe R
2009-01-01
This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.
Tattar, Prabhanjan N; Manjunath, B G
2016-01-01
Integrates the theory and applications of statistics using R. A Course in Statistics with R has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, inclusive of classical, nonparametric, and Bayesian schools, is developed with definitions, motivations, mathematical expression and R programs in a way which will help the reader to understand the mathematical development as well as R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated in a way which makes effective use of visualization techniques and...
Directory of Open Access Journals (Sweden)
Maria João Nunes
2005-03-01
In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the suction of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentrations from both air flow sampling techniques. Several outliers and an increase of residuals with concentration are evident, which calls for non-parametric methods, recommended for handling data that may not be normally distributed. Conversion factors are therefore obtained for each of the various species under study using Kendall regression.
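Kendall regression is commonly identified with the Theil-Sen estimator, in which the slope is the median of all pairwise slopes; assuming that is the variant meant here, a minimal sketch follows. The function name and the paired concentrations are hypothetical and only illustrate why the estimate resists outliers.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1  # pairs with equal x carry no slope information
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Hypothetical paired concentrations from two samplers; the last pair is an outlier.
low_flow = [1.0, 2.0, 3.0, 4.0, 5.0]
high_flow = [3.0, 5.0, 7.0, 9.0, 100.0]
slope, intercept = theil_sen(low_flow, high_flow)
print(slope, intercept)
```

In this reading, the fitted slope plays the role of the conversion factor between the two sampling techniques; an ordinary least-squares fit to the same five points would be pulled strongly toward the outlier.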
Nonparametric Econometrics: The np Package
Directory of Open Access Journals (Sweden)
Tristen Hayfield
2008-07-01
We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
Directory of Open Access Journals (Sweden)
I. C. A. OYEKA
2012-02-01
This paper presents a non-parametric statistical method for analyzing two-sample data that makes provision for possible ties in the data. A test statistic is developed and shown to be free of the effect of any ties. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test, and to be more powerful than the latter when there are ties.
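For orientation, the sketch below computes the classical Mann-Whitney U statistic, the competitor named in the abstract, with the usual convention that a tied pair counts one half. It is not the authors' proposed statistic, which the abstract does not define; the function name and the sample values are assumptions for illustration.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x against sample y.

    Counts pairs (x_i, y_j) with x_i > y_j; each tied pair contributes 1/2.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5  # conventional handling of ties
    return u


# Hypothetical two-sample data containing ties.
sample_a = [1, 2, 2, 3]
sample_b = [2, 3, 4, 5]

u_a = mann_whitney_u(sample_a, sample_b)
u_b = mann_whitney_u(sample_b, sample_a)
print(u_a, u_b)
```

The two statistics always sum to the number of pairs, `len(sample_a) * len(sample_b)`, which is a quick sanity check on any implementation.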
Nonparametric test for detecting change in distribution with panel data
Pommeret, Denys; Ghattas, Badih
2011-01-01
This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived and its finite sample property is examined by bootstrap procedures through Monte Carlo simulations.
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
Scargle, Jeffrey D; Jackson, Brad; Chiang, James
2012-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it - an improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multi-variate time series data, analysis of vari...
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Granade, Christopher; Cory, D G
2015-01-01
In recent years, Bayesian methods have been proposed as a solution to a wide range of issues in quantum state and process tomography. State-of-the-art Bayesian tomography solutions suffer from three problems: numerical intractability, a lack of informative prior distributions, and an inability to track time-dependent processes. Here, we solve all three problems. First, we use modern statistical methods, as pioneered by Huszár and Houlsby and by Ferrie, to make Bayesian tomography numerically tractable. Our approach allows for practical computation of Bayesian point and region estimators for quantum states and channels. Second, we propose the first informative priors on quantum states and channels. Finally, we develop a method that allows online tracking of time-dependent states and estimates the drift and diffusion processes affecting a state. We provide source code and animated visual examples for our methods.
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
Nonparametric estimation of employee stock options
Institute of Scientific and Technical Information of China (English)
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We propose a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods with market data. It incorporates the kernel estimator and employs a three-step method to modify the Black-Scholes formula. The model overcomes the limits of the Black-Scholes formula in handling option prices with varied volatility. It accounts for the effects on the risk-neutral pricing condition of ESO-specific characteristics such as non-tradability, the longer term to expiration, the early exercise feature, the restriction on short selling, and the employee's risk aversion, and it can be applied to ESO valuation whether the explanatory variable is deterministic or random.
Inference in hybrid Bayesian networks
DEFF Research Database (Denmark)
Langseth, Helge; Nielsen, Thomas Dyhre; Rumí, Rafael
2009-01-01
Since the 1980s, Bayesian Networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability-techniques (like fault trees...... decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability....
Schutte, Willem D.; Swanepoel, Jan W. H.
2016-09-01
An automated tool to derive the off-pulse interval of a light curve originating from a pulsar is needed. First, we derive a powerful and accurate non-parametric sequential estimation technique to estimate the off-pulse interval of a pulsar light curve in an objective manner. This is in contrast to the subjective `eye-ball' (visual) technique, and complementary to the Bayesian Block method which is currently used in the literature. The second aim involves the development of a statistical package, necessary for the implementation of our new estimation technique. We develop a statistical procedure to estimate the off-pulse interval in the presence of noise. It is based on a sequential application of p-values obtained from goodness-of-fit tests for uniformity. The Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling and Rayleigh test statistics are applied. The details of the newly developed statistical package SOPIE (Sequential Off-Pulse Interval Estimation) are discussed. The developed estimation procedure is applied to simulated and real pulsar data. Finally, the SOPIE estimated off-pulse intervals of two pulsars are compared to the estimates obtained with the Bayesian Block method and yield very satisfactory results. We provide the code to implement the SOPIE package, which is publicly available at http://CRAN.R-project.org/package=SOPIE (Schutte).
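As a rough illustration of the goodness-of-fit step described above, the following sketch computes the Kolmogorov-Smirnov statistic of a sample against the uniform distribution on [0, 1]. It assumes pulse phases already scaled to the unit interval, computes only the statistic (no p-value, no sequential search), and is not the SOPIE package itself; the function name is hypothetical.

```python
def ks_uniform_statistic(phases):
    """Kolmogorov-Smirnov distance between the empirical CDF of `phases`
    and the Uniform(0, 1) CDF, F(x) = x."""
    xs = sorted(phases)
    n = len(xs)
    # Largest amount by which the empirical CDF exceeds F just after each point.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # Largest amount by which F exceeds the empirical CDF just before each point.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# Hypothetical phases: evenly spread (off-pulse-like) versus clustered (pulse-like).
spread = [0.125, 0.375, 0.625, 0.875]
clustered = [0.40, 0.45, 0.50, 0.55]

print(ks_uniform_statistic(spread), ks_uniform_statistic(clustered))
```

A small statistic is consistent with uniformly distributed arrival phases, which is what one expects inside an off-pulse interval; a sequential procedure would repeat such a test over candidate intervals.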
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Quantum-Like Bayesian Networks for Modeling Decision Making.
Moreira, Catarina; Wichert, Andreas
2016-01-01
In this work, we explore an alternative quantum structure to perform quantum probabilistic inferences to accommodate the paradoxical findings of the Sure Thing Principle. We propose a Quantum-Like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, we also propose a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive in contrast to the current state of the art models, which cannot be generalized for more complex decision scenarios and that only provide an explanatory nature for the observed paradoxes. In the end, the model that we propose consists in a nonparametric method for estimating inference effects from a statistical point of view. It is a statistical model that is simpler than the previous quantum dynamic and quantum-like models proposed in the literature. We tested the proposed network with several empirical data from the literature, mainly from the Prisoner's Dilemma game and the Two Stage Gambling game. The results obtained show that the proposed quantum Bayesian Network is a general method that can accommodate violations of the laws of classical probability theory and make accurate predictions regarding human decision-making in these scenarios.
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid; 10.3150/10-BEJ260
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
Bayesian Inference in Polling Technique: 1992 Presidential Polls.
Satake, Eiki
1994-01-01
Explores the potential utility of Bayesian statistical methods in determining the predictability of multiple polls. Compares Bayesian techniques to the classical statistical method employed by pollsters. Considers these questions in the context of the 1992 presidential elections. (HB)
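One simple way to see how a Bayesian analysis can combine multiple polls is the conjugate Beta-binomial update, sketched below. The abstract does not state which model the author used, so the uniform prior, the function names, and the poll counts are all hypothetical.

```python
def beta_posterior(prior_a, prior_b, successes, trials):
    """Update a Beta(prior_a, prior_b) prior on a candidate's support
    with a poll in which `successes` of `trials` respondents favour them."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Start from a uniform prior, Beta(1, 1), and fold in two hypothetical polls.
a, b = beta_posterior(1.0, 1.0, successes=520, trials=1000)
a, b = beta_posterior(a, b, successes=450, trials=900)

print(a, b, beta_mean(a, b))
```

Each poll's posterior becomes the prior for the next, so the order in which polls are folded in does not change the final result under this model.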
Energy Technology Data Exchange (ETDEWEB)
Boulanger, Jean-Philippe [LODYC, UMR CNRS/IRD/UPMC, Tour 45-55/Etage 4/Case 100, UPMC, Paris Cedex 05 (France); University of Buenos Aires, Departamento de Ciencias de la Atmosfera y los Oceanos, Facultad de Ciencias Exactas y Naturales, Buenos Aires (Argentina); Martinez, Fernando; Segura, Enrique C. [University of Buenos Aires, Departamento de Computacion, Facultad de Ciencias Exactas y Naturales, Buenos Aires (Argentina)
2007-02-15
Evaluating the response of climate to greenhouse gas forcing is a major objective of the climate community, and the use of large ensembles of simulations is considered a significant step toward that goal. The present paper thus discusses a new methodology based on neural networks to mix ensembles of climate model simulations. Our analysis consists of one simulation of seven Atmosphere-Ocean Global Climate Models, which participated in the IPCC Project and provided at least one simulation for the twentieth century (20c3m) and one simulation for each of three SRES scenarios: A2, A1B and B1. Our statistical method based on neural networks and Bayesian statistics computes a transfer function between models and observations. Such a transfer function was then used to project future conditions and to derive what we would call the optimal ensemble combination for twenty-first century climate change projections. Our approach is therefore based on one statement and one hypothesis. The statement is that an optimal ensemble projection should be built by giving larger weights to models that have more skill in representing present climate conditions. The hypothesis is that our method based on neural networks is actually weighting the models that way. While the statement is actually an open question, whose answer may vary according to the region or climate signal under study, our results demonstrate that the neural network approach indeed allows weighting models according to their skills. As such, our method is an improvement of existing Bayesian methods developed to mix ensembles of simulations. However, the general low skill of climate models in simulating precipitation mean climatology implies that the final projection maps (whatever the method used to compute them) may significantly change in the future as models improve. Therefore, the projection results for late twenty-first century conditions are presented as possible projections based on the...
da Silva, Arlindo M.; Norris, Peter M.
2013-01-01
Part I presented a Monte Carlo Bayesian method for constraining a complex statistical model of GCM sub-gridcolumn moisture variability using high-resolution MODIS cloud data, thereby permitting large-scale model parameter estimation and cloud data assimilation. This part performs some basic testing of this new approach, verifying that it does indeed significantly reduce mean and standard deviation biases with respect to the assimilated MODIS cloud optical depth, brightness temperature and cloud top pressure, and that it also improves the simulated rotational-Raman scattering cloud optical centroid pressure (OCP) against independent (non-assimilated) retrievals from the OMI instrument. Of particular interest, the Monte Carlo method does show skill in the especially difficult case where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach allows finite jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. This paper also examines a number of algorithmic and physical sensitivities of the new method and provides guidance for its cost-effective implementation. One obvious difficulty for the method, and other cloud data assimilation methods as well, is the lack of information content in the cloud observables on cloud vertical structure, beyond cloud top pressure and optical thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard (1998) provides some help in this respect, by better honoring inversion structures in the background state.
Bayesian Inference on Gravitational Waves
Directory of Open Access Journals (Sweden)
Asad Ali
2015-12-01
The Bayesian approach is becoming increasingly popular among the astrophysics data analysis communities. However, the Pakistan statistics communities are unaware of this fertile interaction between the two disciplines. Bayesian methods have been used to address astronomical problems since the very birth of Bayes probability in the eighteenth century. Today, Bayesian methods for the detection and parameter estimation of gravitational waves have solid theoretical grounds, with strong promise for realistic applications. This article aims to introduce the Pakistan statistics communities to the applications of Bayesian Monte Carlo methods in the analysis of gravitational wave data, with an overview of Bayesian signal detection and estimation methods and a demonstration through a couple of simplified examples.
Institute of Scientific and Technical Information of China (English)
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed both crossover and noncrossover GEI, and that genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated that the nonparametric statistics grouped into four distinct classes that corresponded to different agronomic and biological concepts of stability. Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only the rank-sum and TOP methods would be useful for simultaneous selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
The Bayesian Inventory Problem
1984-05-01
Bayesian Approach to Demand Estimation and Inventory Provisioning," Naval Research Logistics Quarterly, Vol. 20, 1973, pp. 607-624. 4 DeGroot, Morris H...page is blank APPENDIX A SUFFICIENT STATISTICS A convenient reference for most of this material is DeGroot [4]. Suppose that we are sampling from a
Shterev, Ivo; Dunson, David
2012-01-01
This paper presents an application of statistical machine learning to the field of watermarking. We propose a new attack model on additive spread-spectrum watermarking systems. The proposed attack is based on Bayesian statistics. We consider the scenario in which a watermark signal is repeatedly embedded in specific, possibly chosen based on a secret message bitstream, segments (signals) of the host data. The host signal can represent a patch of pixels from an image or a video frame. We propo...
Nonparametric Regression with Common Shocks
Directory of Open Access Journals (Sweden)
Eduardo A. Souza-Rodrigues
2016-09-01
This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
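For readers unfamiliar with the Nadaraya-Watson estimator, a minimal sketch follows. It is the textbook kernel-weighted average with a Gaussian kernel and a fixed bandwidth; it ignores common shocks entirely, and the function name, data, and bandwidth are illustrative assumptions rather than anything taken from the paper.

```python
import math


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel.

    The estimate is a weighted average of y_train, where observations with
    x close to x0 (relative to `bandwidth`) receive larger weights.
    """
    weights = [
        math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train
    ]
    return sum(w * yi for w, yi in zip(weights, y_train)) / sum(weights)


# Hypothetical cross-sectional data lying on the line y = x + 1.
x_obs = [0.0, 1.0, 2.0]
y_obs = [1.0, 2.0, 3.0]

estimate = nadaraya_watson(x_obs, y_obs, x0=1.0, bandwidth=0.5)
print(estimate)
```

The bandwidth controls the bias-variance trade-off: a smaller value tracks the data more closely, while a larger one averages over more neighbours.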
Lesaffre, Emmanuel
2012-01-01
The growth of biostatistics has been phenomenal in recent years and has been marked by considerable technical innovation in both methodology and computational practicality. One area that has experienced significant growth is Bayesian methods. The growing use of Bayesian methodology has taken place partly due to an increasing number of practitioners valuing the Bayesian paradigm as matching that of scientific discovery. In addition, computational advances have allowed for more complex models to be fitted routinely to realistic data sets. Through examples, exercises and a combination of introd
Nonparametric TOA estimators for low-resolution IR-UWB digital receiver
Institute of Scientific and Technical Information of China (English)
Yanlong Zhang; Weidong Chen
2015-01-01
Nonparametric time-of-arrival (TOA) estimators for impulse radio ultra-wideband (IR-UWB) signals are proposed. Nonparametric detection is obviously useful in situations where detailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on conditional statistical tests with only a symmetry distribution assumption on the noise probability density function. The nonparametric estimators are attractive choices for low-resolution IR-UWB digital receivers which can be implemented by fast comparators or high sampling rate low resolution analog-to-digital converters (ADCs), in place of high sampling rate high resolution ADCs which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.
An asymptotically optimal nonparametric adaptive controller
Institute of Scientific and Technical Information of China (English)
郭雷; 谢亮亮
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
Bayesian demography 250 years after Bayes.
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms.
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) performs well in terms of asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But little work addresses whether, and how, one can detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, the parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
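The two criteria contrasted above differ only in how they penalize model size. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; the log-likelihood, parameter count, and sample size are made-up numbers, and the parametricness index itself is not reproduced here.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2.0 * k - 2.0 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * log_likelihood


# Hypothetical fitted model: maximized log-likelihood -100 with 3 parameters, n = 100.
print(aic(-100.0, 3), bic(-100.0, 3, 100))
```

Because ln n exceeds 2 once n is larger than about 7.4, BIC penalizes each extra parameter more heavily than AIC in all but the smallest samples, which is why it tends to select smaller models.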
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.
Bayesian flaw characterization from eddy current measurements with grain noise
McMahan, Jerry A.; Aldrin, John C.; Shell, Eric; Oneida, Erin
2017-02-01
The Bayesian approach to inference from measurement data has the potential to provide highly reliable characterizations of flaw geometry by quantifying the confidence in the estimate results. The accuracy of these confidence estimates depends on the accuracy of the model for the measurement error. Eddy current measurements of electrically anisotropic metals, such as titanium, exhibit a phenomenon called grain noise in which the measurement error is spatially correlated even with no flaw present. We show that the most commonly used statistical model for the measurement error, which fails to account for this correlation, results in overconfidence in the flaw geometry estimates from eddy current data, thereby reducing the effectiveness of the Bayesian approach. We then describe a method of modeling the grain noise as a Gaussian process (GP) using spectral mixture kernels, a type of non-parametric model for the covariance kernel of a GP. This provides a broadly applicable, data-driven way of modeling correlation in measurement error. Our results show that incorporation of this noise model results in a more reliable estimate of the flaw and better agreement with the available validation data.
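To give a feel for what a spectral mixture kernel looks like, here is a sketch of the commonly cited one-dimensional form, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q). The abstract does not give the parameterization the authors used, so this form, the function name, and the component values are assumptions for illustration.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """One-dimensional spectral mixture covariance at lag `tau`.

    Each component q is a Gaussian in the spectral domain with weight w_q,
    mean frequency mu_q, and variance v_q.
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        * math.cos(2.0 * math.pi * tau * mu)
        for w, mu, v in zip(weights, means, variances)
    )


# Hypothetical two-component mixture.
w = [1.0, 0.5]
mu = [0.0, 2.0]
v = [0.1, 0.05]

print(spectral_mixture_kernel(0.0, w, mu, v), spectral_mixture_kernel(0.3, w, mu, v))
```

At zero lag the kernel equals the sum of the weights, which is the modelled noise variance; the oscillating cosine terms are what let the kernel capture spatially correlated, quasi-periodic structure.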
Rate-optimal Bayesian intensity smoothing for inhomogeneous Poisson processes
E. Belitser; P. Serra; H. van Zanten
2015-01-01
We apply nonparametric Bayesian methods to study the problem of estimating the intensity function of an inhomogeneous Poisson process. To motivate our results we start by analyzing count data coming from a call center which we model as a Poisson process. This analysis is carried out using a certain
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
Advances in Bayesian Modeling in Educational Research
Levy, Roy
2016-01-01
In this article, I provide a conceptually oriented overview of Bayesian approaches to statistical inference and contrast them with frequentist approaches that currently dominate conventional practice in educational research. The features and advantages of Bayesian approaches are illustrated with examples spanning several statistical modeling…
Nonparametric modeling of dynamic functional connectivity in fmri data
DEFF Research Database (Denmark)
Nielsen, Søren Føns Vind; Madsen, Kristoffer H.; Røge, Rasmus
2015-01-01
in Bayesian statistical modeling we use the predictive likelihood to investigate if the model can discriminate between a motor task and rest both within and across subjects. We further investigate what drives dynamic states using the model on the entire data collated across subjects and task/rest. We find...
Nonparametric Detection of Geometric Structures Over Networks
Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent
2017-10-01
Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
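The maximum mean discrepancy (MMD) at the heart of these tests can be estimated directly from two samples. Below is a minimal sketch of the biased (V-statistic) estimate of squared MMD with a Gaussian kernel on scalar data; the kernel choice, bandwidth, function names, and samples are assumptions, and no detection threshold or network structure is modelled.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""

    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical samples: typical nodes near 0, anomalous nodes near 3.
typical = [0.0, 0.1, -0.1]
anomalous = [3.0, 3.1, 2.9]

print(mmd_squared(typical, typical), mmd_squared(typical, anomalous))
```

A value near zero suggests the two sets of nodes see the same distribution, while a large value is evidence that the candidate structure is anomalous; a full test would compare the statistic against a threshold.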
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Bayesian Statistics in Adjustment of Premium
Institute of Scientific and Technical Information of China (English)
陈正; 汪飞飞
2012-01-01
Adjusting premiums in a timely way according to market and business conditions is very important for an insurance company. This paper describes the Bayesian premium adjustment method and uses a worked example to analyze the feasibility of premium valuation under this method. Both the method and the conclusions can be applied to small-sample premium valuation in non-life insurance practice.
Bayesian Inference: with ecological applications
Link, William A.; Barker, Richard J.
2010-01-01
This text provides a mathematically rigorous yet accessible and engaging introduction to Bayesian inference with relevant examples that will be of interest to biologists working in the fields of ecology, wildlife management and environmental studies as well as students in advanced undergraduate statistics. This text opens the door to Bayesian inference, taking advantage of modern computational efficiencies and easily accessible software to evaluate complex hierarchical models.
Bayesian analysis of CCDM Models
Jesus, J. F.; Valentim, R.; Andrade-Oliveira, F.
2016-01-01
Creation of Cold Dark Matter (CCDM), in the context of Einstein Field Equations, leads to negative creation pressure, which can be used to explain the accelerated expansion of the Universe. In this work we tested six different spatially flat models for matter creation using statistical tools, in light of SN Ia data: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Bayesian Evidence (BE). These approaches allow models to be compared considering goodness of fit and numbe...
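The two information criteria named in this abstract have standard textbook definitions, which the sketch below encodes. The log-likelihood values, parameter counts and sample size are hypothetical placeholders and are not taken from the paper.

```python
import math


def aic(max_log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L_max)."""
    return 2.0 * n_params - 2.0 * max_log_likelihood


def bic(max_log_likelihood, n_params, n_data):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L_max)."""
    return n_params * math.log(n_data) - 2.0 * max_log_likelihood


# Hypothetical fits: (maximised log-likelihood, number of free parameters).
fits = {"model_A": (-281.0, 2), "model_B": (-280.4, 3)}
n_data = 580  # hypothetical number of data points

for name, (log_l, k) in fits.items():
    print(name, aic(log_l, k), bic(log_l, k, n_data))
```

Lower values are preferred under both criteria; BIC penalises extra parameters more heavily than AIC once the sample has more than about eight points, since ln(n) then exceeds 2.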
Parametric and Non-Parametric System Modelling
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg
1999-01-01
considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
A non-parametric peak finder algorithm and its application in searches for new physics
Chekanov, S
2011-01-01
We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.
Bayesian methods for measures of agreement
Broemeling, Lyle D
2009-01-01
Using WinBUGS to implement Bayesian inferences of estimation and testing hypotheses, Bayesian Methods for Measures of Agreement presents useful methods for the design and analysis of agreement studies. It focuses on agreement among the various players in the diagnostic process. The author employs a Bayesian approach to provide statistical inferences based on various models of intra- and interrater agreement. He presents many examples that illustrate the Bayesian mode of reasoning and explains elements of a Bayesian application, including prior information, experimental information, the likelihood function, posterior distribution, and predictive distribution. The appendices provide the necessary theoretical foundation to understand Bayesian methods as well as introduce the fundamentals of programming and executing the WinBUGS software. Taking a Bayesian approach to inference, this hands-on book explores numerous measures of agreement, including the Kappa coefficient, the G coefficient, and intraclass correlation...
Vargas Cardona, Hernán Darío; Orozco, Álvaro Ángel; Álvarez, Mauricio A
2013-01-01
Automatic identification of biosignals is one of the most studied fields in biomedical engineering. In this paper, we present an approach for the unsupervised recognition of biomedical signals: Microelectrode Recordings (MER) and Electrocardiography signals (ECG). The unsupervised learning is based on classical and Bayesian estimation theory. We employ Gaussian mixture models with two estimation methods. The first is derived from frequentist estimation theory and is known as the Expectation-Maximization (EM) algorithm. The second is obtained from Bayesian probabilistic estimation and is called variational inference. In this framework, both methods are used for parameter estimation of Gaussian mixtures. The mixture models are used for unsupervised pattern classification through the responsibility matrix. The algorithms are applied to two real databases acquired in Parkinson's disease surgeries and electrocardiograms. The results show an accuracy over 85% in MER and 90% in ECG for identification of two classes. These results are statistically equal to or even better than those of parametric (Naive Bayes) and nonparametric (K-nearest neighbor) classifiers.
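How EM and the responsibility matrix lead to an unsupervised two-class labelling can be sketched on synthetic data. This is only an illustration under stated assumptions: one-dimensional features, two components, a crude initialisation, and simulated data in place of the MER and ECG recordings. The variational variant is not shown.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns weights, means, variances and the responsibility matrix
    computed in the final E-step.
    """
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialisation at the extremes
    centre = sum(data) / len(data)
    pooled_var = sum((x - centre) ** 2 for x in data) / len(data)
    variances = [pooled_var, pooled_var]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: responsibility-weighted updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)  # guard against collapse
    return weights, means, variances, resp


rng = random.Random(1)
truth = [0] * 150 + [1] * 150  # hypothetical class labels, used only for scoring
data = [rng.gauss(-3.0, 1.0) for _ in range(150)] + [rng.gauss(3.0, 1.0) for _ in range(150)]

weights, means, variances, resp = em_gmm_1d(data)
labels = [0 if r[0] >= r[1] else 1 for r in resp]  # classify by largest responsibility
accuracy = sum(int(a == b) for a, b in zip(labels, truth)) / len(truth)
print(means, accuracy)
```

Component 0 is initialised at the smallest observation, so it ends up as the lower cluster here; in general, mixture components are only identified up to relabelling.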
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δ θ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δ θ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
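The finite-difference step described in this abstract can be sketched for the validation case it mentions, the normal distribution. To keep the example deterministic, the exact normal density is used where the paper would use a nonparametric density estimate; that substitution, the grid, and the choice of Δθ are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sigma=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information_fd(density, theta, delta, lo, hi, n_grid=4001):
    """Approximate I(theta) = integral of (dp/dtheta)^2 / p over x.

    The derivative with respect to theta is replaced by a central
    difference with half-width delta, and the integral by a Riemann sum
    on a regular grid over [lo, hi].
    """
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * step
        p = density(x, theta)
        if p <= 0.0:
            continue  # skip points where the density underflows
        dp = (density(x, theta + delta) - density(x, theta - delta)) / (2.0 * delta)
        total += dp * dp / p * step
    return total


# Fisher information of the mean of N(theta, sigma^2) is 1 / sigma^2.
estimate_unit = fisher_information_fd(normal_pdf, theta=0.0, delta=0.1, lo=-10.0, hi=10.0)
estimate_wide = fisher_information_fd(
    lambda x, t: normal_pdf(x, t, sigma=2.0), theta=0.0, delta=0.1, lo=-20.0, hi=20.0
)
print(estimate_unit, estimate_wide)
```

With an estimated density in place of the exact one, a small Δθ amplifies estimation noise while a large Δθ biases the derivative, which is the trade-off the abstract refers to when it discusses choosing Δθ.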
Villalba, Jesús
2015-01-01
In this document we are going to derive the equations needed to implement a Variational Bayes estimation of the parameters of the simplified probabilistic linear discriminant analysis (SPLDA) model. This can be used to adapt SPLDA from one database to another with few development data or to implement the fully Bayesian recipe. Our approach is similar to Bishop's VB PPCA.
Directory of Open Access Journals (Sweden)
Adrion Christine
2012-09-01
Background: A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods: We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results: The instruments under study
Congdon, Peter
2014-01-01
This book provides an accessible approach to Bayesian computing and data analysis, with an emphasis on the interpretation of real data sets. Following in the tradition of the successful first edition, this book aims to make a wide range of statistical modeling applications accessible using tested code that can be readily adapted to the reader's own applications. The second edition has been thoroughly reworked and updated to take account of advances in the field. A new set of worked examples is included. The novel aspect of the first edition was the coverage of statistical modeling using WinBU
Bayesian Smoothing with Gaussian Processes Using Fourier Basis Functions in the spectralGP Package
Directory of Open Access Journals (Sweden)
Christopher J. Paciorek
2007-04-01
The spectral representation of stationary Gaussian processes via the Fourier basis provides a computationally efficient specification of spatial surfaces and nonparametric regression functions for use in various statistical models. I describe the representation in detail and introduce the spectralGP package in R for computations. Because of the large number of basis coefficients, some form of shrinkage is necessary; I focus on a natural Bayesian approach via a particular parameterized prior structure that approximates stationary Gaussian processes on a regular grid. I review several models from the literature for data that do not lie on a grid, suggest a simple model modification, and provide example code demonstrating MCMC sampling using the spectralGP package. I describe reasons that mixing can be slow in certain situations and provide some suggestions for MCMC techniques to improve mixing, also with example code, and some general recommendations grounded in experience.
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulting from correlated binary data. However, the most recommended interval, Tango's score confidence interval, tends to be wide, and the computing burden of exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method, which treats proportions as special areas under receiver operating characteristic curves, provided a new way to construct the confidence interval for a proportion difference on paired data, but its complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristics. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than that of Tango. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Compared with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet it is very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance.
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Bayesian theory and applications
Dellaportas, Petros; Polson, Nicholas G; Stephens, David A
2013-01-01
The development of hierarchical models and Markov chain Monte Carlo (MCMC) techniques forms one of the most profound advances in Bayesian analysis since the 1970s and provides the basis for advances in virtually all areas of applied and theoretical Bayesian statistics. This volume guides the reader along a statistical journey that begins with the basic structure of Bayesian theory, and then provides details on most of the past and present advances in this field. The book has a unique format. There is an explanatory chapter devoted to each conceptual advance followed by journal-style chapters that provide applications or further advances on the concept. Thus, the volume is both a textbook and a compendium of papers covering a vast range of topics. It is appropriate for a well-informed novice interested in understanding the basic approach, methods and recent applications. Because of its advanced chapters and recent work, it is also appropriate for a more mature reader interested in recent applications and devel...
Energy Technology Data Exchange (ETDEWEB)
Procaccia, H.; Cordier, R.; Muller, S.
1994-07-01
Statistical decision theory could be an alternative for the optimization of preventive maintenance periodicity. This theory concerns the situation in which a decision maker has to choose among a set of reasonable decisions, and where the loss associated with a given decision depends on a probabilistic risk, called the state of nature. In the case of maintenance optimization, the decisions to be analyzed are the different periodicities proposed by the experts given the observed feedback experience, the states of nature are the associated failure probabilities, and the losses are the expectations of the induced cost of maintenance and of the consequences of the failures. As failure probabilities concern rare events, at the ultimate state of RCM analysis (failure of sub-component), and as the expected foreseeable behaviour of equipment has to be evaluated by experts, a Bayesian approach is successfully used to compute states of nature. In Bayesian decision theory, a prior distribution for failure probabilities is modeled from expert knowledge and is combined with the limited stochastic information provided by feedback experience, giving a posterior distribution of failure probabilities. The optimized decision is the one that minimizes the expected loss over the posterior distribution. This methodology has been applied to inspection and maintenance optimization of cylinders of diesel generator engines of 900 MW nuclear plants. In these plants, auxiliary electric power is supplied by 2 redundant diesel generators which are tested every 2 weeks for about 1 hour. Until now, during yearly refueling of each plant, one endoscopic inspection of diesel cylinders is performed, and every 5 operating years, all cylinders are replaced. RCM has shown that cylinder failures could be critical. So Bayesian decision theory has been applied, taking into account expert opinions and the possibility of aging when maintenance periodicity is extended. (authors). 8 refs., 5 figs., 1 tab.
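The decision rule in this abstract (expert prior, update with sparse feedback, pick the periodicity with the smallest posterior expected loss) can be sketched with a conjugate Beta-binomial model. Every number below is hypothetical, as are the linear loss function and the assumption that each candidate periodicity has its own annual failure probability; none of it comes from the study.

```python
def posterior_mean(prior_a, prior_b, failures, trials):
    """Posterior mean of a failure probability under a Beta(a, b) prior
    and binomial feedback data (Beta-binomial conjugacy)."""
    return (prior_a + failures) / (prior_a + prior_b + trials)


def expected_annual_loss(period_years, p_fail, replacement_cost, failure_cost):
    """Hypothetical linear loss: replacement cost spread over the period,
    plus failure cost weighted by the annual failure probability.
    Because the loss is linear in p, its posterior expectation only
    needs the posterior mean of p."""
    return replacement_cost / period_years + failure_cost * p_fail


# Candidate periodicities (years) with hypothetical expert priors
# (prior_a, prior_b) and feedback (failures, cylinder-years observed).
candidates = {
    5: {"prior": (1.0, 99.0), "feedback": (0, 50)},   # current practice, some data
    7: {"prior": (2.0, 98.0), "feedback": (0, 0)},    # expert opinion only
    10: {"prior": (5.0, 95.0), "feedback": (0, 0)},   # expert opinion only
}

losses = {}
for period, info in candidates.items():
    p_fail = posterior_mean(*info["prior"], *info["feedback"])
    losses[period] = expected_annual_loss(
        period, p_fail, replacement_cost=100.0, failure_cost=300.0
    )

best_period = min(losses, key=losses.get)
print(losses, best_period)
```

With a nonlinear loss the posterior mean would no longer suffice and the expectation would have to be taken over the full Beta posterior, for example by numerical integration or sampling.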
Hedlund, Jonas
2014-01-01
This paper introduces private sender information into a sender-receiver game of Bayesian persuasion with monotonic sender preferences. I derive properties of increasing differences related to the precision of signals and use these to fully characterize the set of equilibria robust to the intuitive criterion. In particular, all such equilibria are either separating, i.e., the sender's choice of signal reveals his private information to the receiver, or fully disclosing, i.e., the outcome of th...
Kirstein, Roland
2005-01-01
This paper presents a modification of the inspection game: The "Bayesian Monitoring" model rests on the assumption that judges are interested in enforcing compliant behavior and making correct decisions. They may base their judgements on an informative but imperfect signal which can be generated costlessly. In the original inspection game, monitoring is costly and generates a perfectly informative signal. While the inspection game has only one mixed strategy equilibrium, three Perfect Bayesia...
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis.
Further Research into a Non-Parametric Statistical Screening System.
1979-12-14
Let X1 = 0 if birth weight is low, 1 if birth weight is high; X2 = 0 if gestation length is short, 1 if gestation length is long. Normal babies have high birth weight and long... gestation length or low birth weight and short gestation length. Abnormal babies have either of the other two combinations ((0, 1) or (1, 0)). The LDF
A Statistical, Nonparametric Methodology for Document Degradation Model Validation
1999-01-01
A Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows
Molina, J. M.; Ramirez, J. A.; Raff, D. A.
2011-12-01
A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both
Computational Advances for and from Bayesian Analysis
Andrieu, C.; Doucet, A.; Robert, C. P.
2004-01-01
The emergence in the past years of Bayesian analysis in many methodological and applied fields as the solution to the modeling of complex problems cannot be dissociated from major changes in its computational implementation. We show in this review how the advances in Bayesian analysis and statistical computation are intermingled.
Directory of Open Access Journals (Sweden)
Roberto da Costa Quinino
1997-12-01
When testing for attributes, it is important to assess the inspectors' efficiency as they judge the product quality by classifying it as conforming or non-conforming. This work presents a Bayesian method for evaluating sensory inspectors, including discussions about situations in which classifications are made for which the real state of the product is not known.
3rd Bayesian Young Statisticians Meeting
Lanzarone, Ettore; Villalobos, Isadora; Mattei, Alessandra
2017-01-01
This book is a selection of peer-reviewed contributions presented at the third Bayesian Young Statisticians Meeting, BAYSM 2016, Florence, Italy, June 19-21. The meeting provided a unique opportunity for young researchers, M.S. students, Ph.D. students, and postdocs dealing with Bayesian statistics to connect with the Bayesian community at large, to exchange ideas, and to network with others working in the same field. The contributions develop and apply Bayesian methods in a variety of fields, ranging from the traditional (e.g., biostatistics and reliability) to the most innovative ones (e.g., big data and networks).
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.
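The uncorrelated baseline that this abstract generalises, the standard Indian Buffet Process, can be sampled with the usual "customers and dishes" construction. The sketch below shows only that baseline, not the correlated models of the paper; the concentration parameter, the number of objects and the small Poisson sampler are illustrative choices.

```python
import math
import random


def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_objects, alpha, rng):
    """Draw a binary feature matrix from the Indian Buffet Process.

    Object i takes each existing feature k with probability m_k / i,
    where m_k counts earlier objects having that feature, and then
    adds Poisson(alpha / i) brand-new features.
    """
    feature_counts = []  # m_k for every feature created so far
    rows = []
    for i in range(1, n_objects + 1):
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, taken in enumerate(row):
            feature_counts[k] += taken
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        feature_counts.extend([1] * n_new)
        rows.append(row)
    n_features = len(feature_counts)
    # Pad earlier rows with zeros so the matrix is rectangular.
    return [row + [0] * (n_features - len(row)) for row in rows]


rng = random.Random(3)
matrix = sample_ibp(n_objects=10, alpha=2.0, rng=rng)
for row in matrix:
    print(row)
```

The number of columns is not fixed in advance; it grows with the number of objects, which is the sense in which the model "does not bound the number of active features".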
A Censored Nonparametric Software Reliability Model
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
This paper analyses the effect of censoring on the estimation of failure rate, and presents a framework of a censored nonparametric software reliability model. The model is based on nonparametric testing of whether the failure rate is monotonically decreasing and on weighted kernel failure rate estimation under the constraint that the failure rate is monotonically decreasing. Not only does the model have the advantages of few assumptions and weak constraints, but the number of residual defects in the software system can also be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Approximate Bayesian computation.
Directory of Open Access Journals (Sweden)
Mikael Sunnåker
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
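How ABC bypasses the likelihood can be shown with the simplest variant, rejection sampling: draw a parameter from the prior, simulate data, and keep the draw if a summary of the simulated data is close to the observed summary. The toy model (a normal mean with known unit variance), the uniform prior, the sample-mean summary and the tolerance are all assumptions chosen for illustration.

```python
import random


def abc_rejection(observed, simulate, sample_prior, distance, tolerance, n_draws, rng):
    """Basic ABC rejection sampler: keep prior draws whose simulated data
    fall within `tolerance` of the observed data under `distance`."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


def sample_mean(values):
    return sum(values) / len(values)


rng = random.Random(42)
# Hypothetical observed data: 50 draws from a normal with mean 1.5.
observed = [rng.gauss(1.5, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed=observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    # Compare data sets through a summary statistic (the sample mean).
    distance=lambda sim, obs: abs(sample_mean(sim) - sample_mean(obs)),
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)

print(len(posterior_draws), sample_mean(posterior_draws), sample_mean(observed))
```

The tolerance and the choice of summary statistic are exactly the "assumptions and approximations" the abstract warns about: a looser tolerance accepts more draws but approximates the posterior less closely.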
A Gentle Introduction to Bayesian Analysis : Applications to Developmental Research
Van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A G
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, t
Prior approval: the growth of Bayesian methods in psychology.
Andrews, Mark; Baguley, Thom
2013-02-01
Within the last few years, Bayesian methods of data analysis in psychology have proliferated. In this paper, we briefly review the history of the Bayesian approach to statistics, and consider the implications that Bayesian methods have for the theory and practice of data analysis in psychology.
Bessiere, Pierre; Ahuactzin, Juan Manuel; Mekhnacha, Kamel
2013-01-01
Probability as an Alternative to Boolean Logic: While logic is the mathematical foundation of rational reasoning and the fundamental principle of computing, it is restricted to problems where information is both complete and certain. However, many real-world problems, from financial investments to email filtering, are incomplete or uncertain in nature. Probability theory and Bayesian computing together provide an alternative framework to deal with incomplete and uncertain data. Decision-Making Tools and Methods for Incomplete and Uncertain Data: Emphasizing probability as an alternative to Boolean
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Decompounding random sums: A nonparametric approach
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
Structure learning for Bayesian networks as models of biological networks.
Larjo, Antti; Shmulevich, Ilya; Lähdesmäki, Harri
2013-01-01
Bayesian networks are probabilistic graphical models suitable for modeling several kinds of biological systems. In many cases, the structure of a Bayesian network represents causal molecular mechanisms or statistical associations of the underlying system. Bayesian networks have been applied, for example, for inferring the structure of many biological networks from experimental data. We present some recent progress in learning the structure of static and dynamic Bayesian networks from data.
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...
Woldegebriel, Michael; Zomer, Paul; Mol, Hans G J; Vivó-Truyols, Gabriel
2016-08-02
In this work, we introduce an automated, efficient, and elegant model to combine all pieces of evidence (e.g., expected retention times, peak shapes, isotope distributions, fragment-to-parent ratio) obtained from liquid chromatography-tandem mass spectrometry (LC-MS/MS/MS) data for screening purposes. Combining all these pieces of evidence requires a careful assessment of the uncertainties in the analytical system as well as all possible outcomes. To date, the majority of the existing algorithms are highly dependent on user input parameters. Additionally, the screening process is tackled as a deterministic problem. In this work we present a Bayesian framework to deal with the combination of all these pieces of evidence. Contrary to conventional algorithms, the information is treated in a probabilistic way, and a final probability assessment of the presence/absence of a compound feature is computed. Additionally, all the necessary parameters for the method, except the chromatographic band broadening, are learned from the data in the training and learning phase of the algorithm, avoiding the introduction of a large number of user-defined parameters. The proposed method was validated with a large data set and has shown improved sensitivity and specificity in comparison to a threshold-based commercial software package.
Bayesian stable isotope mixing models
In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs are used to quantify the proportional contributions of various sources to a mixtur...
Directory of Open Access Journals (Sweden)
Rabia Ece OMAY
2013-06-01
In this study, the relationship between gross domestic product (GDP) per capita and sulfur dioxide (SO2) and particulate matter (PM10) per capita is modeled for Turkey. Nonparametric fixed-effect panel data analysis is used for the modeling. The panel data cover 12 territories, at the first level of the Nomenclature of Territorial Units for Statistics (NUTS), for the period 1990-2001. In modeling the relationship between GDP and SO2 and PM10 for Turkey, the nonparametric models gave good results.
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Nonparametric estimation of quantum states, processes and measurements
Lougovski, Pavel; Bennink, Ryan
Quantum state, process, and measurement estimation methods traditionally use parametric models, in which the number and role of relevant parameters is assumed to be known. When such an assumption cannot be justified, a common approach in many disciplines is to fit the experimental data to multiple models with different sets of parameters and utilize an information criterion to select the best fitting model. However, it is not always possible to assume a model with a finite (countable) number of parameters. This typically happens when there are unobserved variables that stem from hidden correlations that can only be unveiled after collecting experimental data. How does one perform quantum characterization in this situation? We present a novel nonparametric method of experimental quantum system characterization based on the Dirichlet Process (DP) that addresses this problem. Using DP as a prior in conjunction with Bayesian estimation methods allows us to increase model complexity (number of parameters) adaptively as the number of experimental observations grows. We illustrate our approach for the one-qubit case and show how a probability density function for an unknown quantum process can be estimated.
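The adaptive growth in model complexity under a Dirichlet Process prior can be illustrated with the Chinese restaurant process, a standard sequential construction of DP cluster assignments. The sketch below is not the authors' quantum estimation method; the function name `crp_cluster_sizes` and the parameters `alpha` and `seed` are hypothetical names chosen for illustration. It only shows that the number of occupied clusters tends to grow as more observations arrive.

```python
import random


def crp_cluster_sizes(n_obs, alpha, seed=0):
    """Simulate cluster sizes under a Chinese restaurant process.

    Observation i (0-based) opens a new cluster with probability
    alpha / (i + alpha); otherwise it joins an existing cluster with
    probability proportional to that cluster's current size.
    """
    rng = random.Random(seed)
    sizes = []
    for i in range(n_obs):
        u = rng.random() * (i + alpha)
        if u < alpha:
            sizes.append(1)  # open a new cluster
            continue
        u -= alpha
        for k, size in enumerate(sizes):
            if u < size:
                sizes[k] += 1
                break
            u -= size
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(crp_cluster_sizes(n, alpha=1.0, seed=1)))
```

Because each observation consumes exactly one random draw, runs with the same seed share a common prefix, so the cluster count printed for a larger `n` is never smaller than for a smaller `n`.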
The Statistical Consulting Center for Astronomy (SCCA)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like $\chi^2$ minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to
Harnessing graphics processing units for improved neuroimaging statistics.
Eklund, Anders; Villani, Mattias; Laconte, Stephen M
2013-09-01
Simple models and algorithms based on restrictive assumptions are often used in the field of neuroimaging for studies involving functional magnetic resonance imaging, voxel based morphometry, and diffusion tensor imaging. Nonparametric statistical methods or flexible Bayesian models can be applied rather easily to yield more trustworthy results. The spatial normalization step required for multisubject studies can also be improved by taking advantage of more robust algorithms for image registration. A common drawback of algorithms based on weaker assumptions, however, is the increase in computational complexity. In this short overview, we will therefore present some examples of how inexpensive PC graphics hardware, normally used for demanding computer games, can be used to enable practical use of more realistic models and accurate algorithms, such that the outcome of neuroimaging studies really can be trusted.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
Perception, illusions and Bayesian inference.
Nour, Matthew M; Nour, Joseph M
2015-01-01
Descriptive psychopathology makes a distinction between veridical perception and illusory perception. In both cases a perception is tied to a sensory stimulus, but in illusions the perception is of a false object. This article re-examines this distinction in light of new work in theoretical and computational neurobiology, which views all perception as a form of Bayesian statistical inference that combines sensory signals with prior expectations. Bayesian perceptual inference can solve the 'inverse optics' problem of veridical perception and provides a biologically plausible account of a number of illusory phenomena, suggesting that veridical and illusory perceptions are generated by precisely the same inferential mechanisms.
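The idea of combining a sensory signal with a prior expectation can be made concrete with the simplest textbook case: a Gaussian prior and a Gaussian sensory likelihood. The sketch below is an illustration under that assumption, not a model taken from the article; the function name `gaussian_posterior` and its arguments are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and Gaussian likelihood.

    The posterior mean is a reliability-weighted average: the noisier the
    sensory observation (large obs_var), the more the percept is pulled
    toward the prior expectation.
    """
    weight_on_obs = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + weight_on_obs * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var


if __name__ == "__main__":
    # Reliable signal: the percept stays close to the stimulus.
    print(gaussian_posterior(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=0.1))
    # Unreliable signal: the percept is drawn toward the prior expectation.
    print(gaussian_posterior(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=10.0))
```

In this toy setting, the same update rule yields a near-veridical estimate when the signal is reliable and a prior-dominated estimate when it is not, which mirrors the article's point that both kinds of percept can arise from one inferential mechanism.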
Non-parametric Morphologies of Mergers in the Illustris Simulation
Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G
2016-01-01
We study non-parametric morphologies of merger events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris simulation idealized mock images at $z=0$. We then measure non-parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess, in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We found that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the elapsing time to the last merger event. We also find that the a...
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James
2013-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it (an improved and generalized version of Bayesian Blocks [Scargle 1998]) that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)] all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.
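The optimal-segmentation idea can be sketched for event data with a short dynamic program. This is a minimal illustration, not the authors' released code: it assumes the block fitness N (log N - log T) commonly quoted for event data (N events in a block of duration T), a constant per-block penalty called `ncp_prior` here, and distinct event times. The function name and the default penalty value are hypothetical.

```python
import math


def bayesian_blocks_events(times, ncp_prior=4.0):
    """Return block edges for event times via O(n^2) dynamic programming.

    Assumes at least two distinct event times. Each event gets a cell
    bounded by midpoints to its neighbours; blocks are unions of cells.
    """
    t = sorted(times)
    n = len(t)
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]
    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum
    for r in range(n):
        best_val, best_k = -math.inf, 0
        for k in range(r + 1):
            count = r + 1 - k                # events in cells k..r
            width = edges[r + 1] - edges[k]  # duration of that block
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if k > 0:
                fit += best[k - 1]
            if fit > best_val:
                best_val, best_k = fit, k
        best[r], last[r] = best_val, best_k
    # Walk back through the stored block starts to recover change points.
    starts = []
    idx = n
    while idx > 0:
        idx = last[idx - 1]
        starts.append(idx)
    starts.reverse()
    return [edges[i] for i in starts] + [edges[n]]


if __name__ == "__main__":
    # Slow event rate on [0, 10), then a five times faster rate on [10, 20).
    events = [0.5 * i for i in range(20)] + [10 + 0.1 * j for j in range(100)]
    print(bayesian_blocks_events(events))
```

A larger `ncp_prior` penalizes extra blocks more strongly and so yields fewer change points; how to calibrate it is a modeling choice that this sketch leaves open.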
STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS
Energy Technology Data Exchange (ETDEWEB)
Scargle, Jeffrey D. [Space Science and Astrobiology Division, MS 245-3, NASA Ames Research Center, Moffett Field, CA 94035-1000 (United States); Norris, Jay P. [Physics Department, Boise State University, 2110 University Drive, Boise, ID 83725-1570 (United States); Jackson, Brad [The Center for Applied Mathematics and Computer Science, Department of Mathematics, San Jose State University, One Washington Square, MH 308, San Jose, CA 95192-0103 (United States); Chiang, James, E-mail: jeffrey.d.scargle@nasa.gov [W. W. Hansen Experimental Physics Laboratory, Kavli Institute for Particle Astrophysics and Cosmology, Department of Physics and SLAC National Accelerator Laboratory, Stanford University, Stanford, CA 94305 (United States)
2013-02-20
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it (an improved and generalized version of Bayesian Blocks) that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama; 10.3150/10-BEJ293
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a Lévy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi-band light curves is a challenging but crucial problem in the era of GAIA and LSST, since the sheer volume of transients will make spectroscopic classification infeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology, a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement, while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
Statistics Anxiety and Business Statistics: The International Student
Bell, James A.
2008-01-01
Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…
Portfolio optimization based on nonparametric estimation methods
Directory of Open Access Journals (Sweden)
Mahsa Ghandehari
2017-03-01
One of the major issues investors face in capital markets is deciding on an appropriate stock exchange for investing and selecting an optimal portfolio. This process is carried out through assessment of risk and expected return. In the portfolio selection problem, if the assets' expected returns are normally distributed, variance and standard deviation are used as risk measures. However, the expected returns on assets are not necessarily normal and sometimes differ dramatically from the normal distribution. This paper introduces conditional value at risk (CVaR) as a measure of risk in a nonparametric framework, obtains the optimal portfolio for a given expected return, and compares this method with the linear programming method. The data used in this study consist of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392, covering the period from April of 1388 to June of 1393. The results of this study show the superiority of the nonparametric method over the linear programming method, and the nonparametric method is much faster than the linear programming method.
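For readers unfamiliar with CVaR, the following sketch computes a purely empirical (historical) value at risk and conditional value at risk from a sample of returns, with no normality assumption. It is not the estimator or the optimization used in the paper; the function name `empirical_var_cvar`, the tail-size convention, and the sample returns are hypothetical choices for illustration.

```python
import math


def empirical_var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR of losses at confidence level alpha.

    Losses are negated returns. The tail holds the worst
    ceil(n * (1 - alpha)) losses (at least one); VaR is the smallest
    loss in that tail and CVaR is the tail average.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    # round() removes floating-point noise such as 20 * 0.05 = 1.0000000000000009
    n_tail = max(1, math.ceil(round(len(losses) * (1 - alpha), 9)))
    tail = losses[:n_tail]
    var = tail[-1]
    cvar = sum(tail) / len(tail)
    return var, cvar


if __name__ == "__main__":
    monthly_returns = [-0.10, -0.05, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
    print(empirical_var_cvar(monthly_returns, alpha=0.8))
```

CVaR is never smaller than VaR at the same level because it averages the losses at or beyond the VaR threshold, which is why it is sensitive to heavy tails that variance-based measures can understate.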
Bayesian estimation in IRT models with missing values in background variables
Directory of Open Access Journals (Sweden)
Christian Aßmann
2015-12-01
Large-scale assessment studies typically aim at investigating the relationship between persons' competencies and explanatory variables. Individual competencies are often estimated by explicitly including explanatory background variables in corresponding Item Response Theory models. Since missing values in background variables inevitably occur, strategies to handle the uncertainty related to missing values in parameter estimation are required. We propose to adapt a Bayesian estimation strategy based on Markov Chain Monte Carlo techniques. Sampling from the posterior distribution of parameters is thereby enriched by sampling from the full conditional distribution of the missing values. We consider non-parametric as well as parametric approximations for the full conditional distributions of missing values, thus allowing for a flexible incorporation of metric as well as categorical background variables. We evaluate the validity of our approach with respect to statistical accuracy by a simulation study controlling the missing values generating mechanism. We show that the proposed Bayesian strategy allows for effective comparison of nested model specifications via gauging highest posterior density intervals of all involved model parameters. An illustration of the suggested approach uses data from the National Educational Panel Study on mathematical competencies of fifth grade students.
Towards Bayesian Inference of the Fast-Ion Distribution Function
DEFF Research Database (Denmark)
Stagner, L.; Heidbrink, W.W.; Salewski, Mirko
2012-01-01
However, when theory and experiment disagree (for one or more diagnostics), it is unclear how to proceed. Bayesian statistics provides a framework to infer the DF, quantify errors, and reconcile discrepant diagnostic measurements. Diagnostic errors and "weight functions" that describe the phase space sensitivity of the measurements are incorporated into Bayesian likelihood probabilities, while prior probabilities enforce physical constraints. As an initial step, this poster uses Bayesian statistics to infer the DIII-D electron density profile from multiple diagnostic measurements. Likelihood functions...
A Non-Parametric Spatial Independence Test Using Symbolic Entropy
Directory of Open Access Journals (Sweden)
López Hernández, Fernando
2008-01-01
In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, by using symbolic dynamics and symbolic entropy as a measure of spatial dependence. We also give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial process. The test statistic and its standard limit distribution, with the proposed symbolization, are invariant to any monotonic transformation of the data. The test applies to discrete or continuous distributions. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. We include a Monte Carlo study of our test, together with the well-known Moran’s I, the SBDS (de Graaff et al., 2001) and (Brett and Pinkse, 1997) nonparametric tests, in order to illustrate our approach.
Nonparametric Estimation of Distributions in Random Effects Models
Hart, Jeffrey D.
2011-01-01
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.
Curve registration by nonparametric goodness-of-fit testing
Dalalyan, Arnak
2011-01-01
The problem of curve registration appears in many different areas of applications ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the p-value of the chi squared test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to 1. Some numerical experiments on synthetic datasets are reported as well.
Revealing components of the galaxy population through nonparametric techniques
Bamford, Steven P; Nichol, Robert C; Miller, Christopher J; Wasserman, Larry; Genovese, Christopher R; Freeman, Peter E
2008-01-01
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
DEFF Research Database (Denmark)
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg
Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind...
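Two common ingredients for evaluating quantile forecasts are reliability (does the observed frequency of outcomes below the forecast match the nominal level?) and the quantile, or pinball, loss. The sketch below illustrates both under the assumption that these generic scores are of interest; it is not the specific evaluation framework proposed in the paper, and the function names and sample numbers are hypothetical.

```python
def pinball_loss(observed, quantile_forecast, tau):
    """Quantile (pinball) loss of one forecast at nominal level tau in (0, 1)."""
    diff = observed - quantile_forecast
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(observations, forecasts, tau):
    """Average pinball loss over paired observations and quantile forecasts."""
    pairs = list(zip(observations, forecasts))
    return sum(pinball_loss(y, q, tau) for y, q in pairs) / len(pairs)


def empirical_coverage(observations, forecasts):
    """Share of observations at or below their quantile forecast.

    For a reliable forecast at level tau this share should be close to tau.
    """
    pairs = list(zip(observations, forecasts))
    return sum(1 for y, q in pairs if y <= q) / len(pairs)


if __name__ == "__main__":
    power = [10.0, 12.0, 8.0, 15.0, 11.0]   # made-up observed production
    q90 = [13.0, 13.0, 13.0, 13.0, 13.0]    # made-up 0.9-quantile forecasts
    print(mean_pinball_loss(power, q90, tau=0.9))
    print(empirical_coverage(power, q90))
```

Lower mean pinball loss indicates a better quantile forecast at that level, while coverage far from the nominal level signals a reliability problem even when the loss looks acceptable.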
Wei, Jiawei; Carroll, Raymond J; Maity, Arnab
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
Testing Equality of Nonparametric Functions in Two Partially Linear Models
Institute of Scientific and Technical Information of China (English)
施三支; 宋立新; 杨华
2008-01-01
We propose a test statistic for checking whether the nonparametric functions in two partially linear models are equal. We estimate the nonparametric function under both the null hypothesis and the alternative by the local linear method, ignoring the parametric components, and then estimate the parameters by a two-stage method. The test statistic is derived, and it is shown to be asymptotically normal under the null hypothesis.
What Is the Probability You Are a Bayesian?
Wulff, Shaun S.; Robinson, Timothy J.
2014-01-01
Bayesian methodology continues to be widely used in statistical applications. As a result, it is increasingly important to introduce students to Bayesian thinking at early stages in their mathematics and statistics education. While many students in upper level probability courses can recite the differences in the Frequentist and Bayesian…
Bayesian artificial intelligence
Korb, Kevin B
2010-01-01
Updated and expanded, Bayesian Artificial Intelligence, Second Edition provides a practical and accessible introduction to the main concepts, foundation, and applications of Bayesian networks. It focuses on both the causal discovery of networks and Bayesian inference procedures. Adopting a causal interpretation of Bayesian networks, the authors discuss the use of Bayesian networks for causal modeling. They also draw on their own applied research to illustrate various applications of the technology. New to the Second Edition: New chapter on Bayesian network classifiers; New section on object-oriente
Bayesian artificial intelligence
Korb, Kevin B
2003-01-01
As the power of Bayesian techniques has become more fully realized, the field of artificial intelligence has embraced Bayesian methodology and integrated it to the point where an introduction to Bayesian techniques is now a core course in many computer science programs. Unlike other books on the subject, Bayesian Artificial Intelligence keeps mathematical detail to a minimum and covers a broad range of topics. The authors integrate all of Bayesian net technology and learning Bayesian net technology and apply them both to knowledge engineering. They emphasize understanding and intuition but also provide the algorithms and technical background needed for applications. Software, exercises, and solutions are available on the authors' website.
DEFF Research Database (Denmark)
Jensen, Finn Verner; Nielsen, Thomas Dyhre
2016-01-01
Mathematically, a Bayesian graphical model is a compact representation of the joint probability distribution for a set of variables. The most frequently used type of Bayesian graphical model is the Bayesian network. The structural part of a Bayesian graphical model is a graph consisting of nodes...... is largely due to the availability of efficient inference algorithms for answering probabilistic queries about the states of the variables in the network. Furthermore, to support the construction of Bayesian network models, learning algorithms are also available. We give an overview of the Bayesian network...
Bayesian analysis for the social sciences
Jackman, Simon
2009-01-01
Bayesian methods are increasingly being used in the social sciences, as the problems encountered lend themselves so naturally to the subjective qualities of Bayesian methodology. This book provides an accessible introduction to Bayesian methods, tailored specifically for social science students. It contains lots of real examples from political science, psychology, sociology, and economics, exercises in all chapters, and detailed descriptions of all the key concepts, without assuming any background in statistics beyond a first course. It features examples of how to implement the methods using WinBUGS - the most-widely used Bayesian analysis software in the world - and R - an open-source statistical software. The book is supported by a Website featuring WinBUGS and R code, and data sets.
Nonparametric Bayes inference for concave distribution functions
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Lauritzen, Steffen Lilholt
2002-01-01
Bayesian inference for concave distribution functions is investigated. This is made by transforming a mixture of Dirichlet processes on the space of distribution functions to the space of concave distribution functions. We give a method for sampling from the posterior distribution using a Pólya urn...
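The Pólya urn mentioned above can be illustrated with the standard Blackwell-MacQueen urn for an ordinary Dirichlet process. This is only a minimal sketch under assumptions: it does not reproduce the transformation to concave distribution functions described in the abstract, and the function name, the concentration parameter `alpha`, and the standard normal base measure are all hypothetical choices made for illustration.

```python
import random

def polya_urn_sample(n, alpha, base_draw, rng):
    """Draw n values sequentially from a Blackwell-MacQueen (Polya) urn.

    alpha is the concentration parameter and base_draw(rng) samples the
    base measure; both are illustrative assumptions, not taken from the paper.
    """
    draws = []
    for i in range(n):
        # With probability alpha / (alpha + i), draw a fresh value from the base
        # measure; otherwise repeat one of the i earlier draws, chosen uniformly.
        if rng.random() < alpha / (alpha + i):
            draws.append(base_draw(rng))
        else:
            draws.append(rng.choice(draws))
    return draws

rng = random.Random(1)
sample = polya_urn_sample(200, alpha=2.0, base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
n_clusters = len(set(sample))  # number of distinct values among the draws
```

Larger values of `alpha` make fresh draws from the base measure more frequent, so the sample contains more distinct values.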
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, like mean-variance (MV), suffer from estimation error and lack of diversity. Alternatives, like mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models, focus independently on the issue of either a proper risk measure or the diversity. In this paper, we propose an asset allocation model that compromises between risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performance and lower portfolio turnover than its competitors.
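One common way to read entropy as an index of diversity is the Shannon entropy of the portfolio weights. The sketch below assumes that reading and long-only weights summing to one; the abstract does not state the exact entropy definition used in the model, so the function name and the example weights are hypothetical.

```python
import math

def weight_entropy(weights):
    """Shannon entropy (in nats) of long-only portfolio weights that sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Terms with zero weight contribute nothing (limit of w * log w as w -> 0).
    return -sum(w * math.log(w) for w in weights if w > 0)

equal = weight_entropy([0.25] * 4)                       # log(4), the maximum for 4 assets
concentrated = weight_entropy([0.97, 0.01, 0.01, 0.01])  # close to 0
```

Under this definition an equally weighted portfolio attains the maximum entropy, and a portfolio held in a single asset has entropy zero.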
Non-Parametric Estimation of Correlation Functions
DEFF Research Database (Denmark)
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed...... out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise....
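Of the three methods reviewed, the direct method is the simplest to write down. The following is a minimal sketch, assuming a zero-mean, evenly sampled signal; the function name and the `unbiased` switch are illustrative and not taken from the paper. Dividing by N instead of N − lag shrinks the estimate at larger lags, which is one example of the kind of bias the abstract refers to.

```python
def autocorrelation_direct(x, max_lag, unbiased=True):
    """Direct (lagged-product) estimate of the auto-correlation function.

    Assumes x is a zero-mean, evenly sampled sequence. Returns the estimates
    for lags 0..max_lag.
    """
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    estimates = []
    for lag in range(max_lag + 1):
        products = sum(x[i] * x[i + lag] for i in range(n - lag))
        # The unbiased form divides by the number of products actually summed.
        divisor = (n - lag) if unbiased else n
        estimates.append(products / divisor)
    return estimates

signal = [1.0, -1.0, 1.0, -1.0]
r_unbiased = autocorrelation_direct(signal, 2)
r_biased = autocorrelation_direct(signal, 2, unbiased=False)
```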
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Nonparametric inferences for kurtosis and conditional kurtosis
Institute of Scientific and Technical Information of China (English)
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of a strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis for risk time series. We apply this method to the daily returns of the S&P500 index and the Shanghai Composite Index, and simulate GARCH data to verify the efficiency of the presented model. Our results indicate that the risk series distribution is heavy-tailed, but that historical information can make its future distribution light-tailed. However, the tails of the far-future distribution are little affected by the historical data.
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non-parametric efficiency analysis to determine the farm’s technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real-world situation where non-linear relationships often exist. Results s...
Preliminary results on nonparametric facial occlusion detection
Directory of Open Access Journals (Sweden)
Daniel LÓPEZ SÁNCHEZ
2016-10-01
The problem of face recognition has been extensively studied in the available literature; however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g. pose variations, illumination, partial occlusion) is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Applied Bayesian Hierarchical Methods
Congdon, Peter D
2010-01-01
Bayesian methods facilitate the analysis of complex models and data structures. Emphasizing data applications, alternative modeling specifications, and computer implementation, this book provides a practical overview of methods for Bayesian analysis of hierarchical models.
Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B
2013-01-01
FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear
Bayesian analysis for kaon photoproduction
Energy Technology Data Exchange (ETDEWEB)
Marsainy, T.; Mart, T. [Department Fisika, FMIPA, Universitas Indonesia, Depok 16424 (Indonesia)]
2014-09-25
We have investigated the contribution of the nucleon resonances in the kaon photoproduction process by using an established statistical decision-making method, i.e. the Bayesian method. This method not only evaluates the model over its entire parameter space, but also takes the prior information and experimental data into account. The result indicates that certain resonances have larger probabilities of contributing to the process.
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results, taking the dependence structure into account using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. A copula is a well-known statistical concept for modelling dependence of random variables: it is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely the maximum likelihood estimator (MLE). We investigate the performance of the proposed method via data sets from the literature and discuss results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
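For reference, the AUC of a single diagnostic test can be estimated empirically as the proportion of (healthy, diseased) pairs that the test orders correctly, with ties counted as one half. The sketch below shows only this empirical AUC; it is not the NPI or copula-based combination described in the abstract, and it assumes that higher test results indicate disease. The function and argument names are hypothetical.

```python
def empirical_auc(healthy, diseased):
    """Empirical AUC: share of (healthy, diseased) pairs ranked correctly.

    Assumes larger test results indicate disease; ties count as one half.
    """
    if not healthy or not diseased:
        raise ValueError("both groups must contain at least one result")
    score = 0.0
    for h in healthy:
        for d in diseased:
            if d > h:
                score += 1.0
            elif d == h:
                score += 0.5
    return score / (len(healthy) * len(diseased))

perfect = empirical_auc([1, 2], [3, 4])           # every pair ordered correctly
overlapping = empirical_auc([1, 2, 3], [2, 3, 4])
```

A value of 0.5 corresponds to a test that orders pairs no better than chance, and 1.0 to perfect separation of the two groups.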
Ortega, Pedro A
2011-01-01
Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans use strong prior knowledge; and rather than encoding hard causal relationships, they encode beliefs over causal structures, allowing for sound generalization from the observations they obtain from directly acting in the world. In this work we propose a Bayesian approach to causal induction which allows modeling beliefs over multiple causal hypotheses and predicting the behavior of the world under causal interventions. We then illustrate how this method extracts causal information from data containing interventions and observations.
Vexler, Albert; Kim, Young Min; Yu, Jihnhee; Lazar, Nicole A; Hutson, Aland
2014-12-01
Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations and statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of needed Monte Carlo resamples for desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and characteristics of data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.
Bayesian structural equation modeling in sport and exercise psychology.
Stenling, Andreas; Ivarsson, Andreas; Johnson, Urban; Lindwall, Magnus
2015-08-01
Bayesian statistics is on the rise in mainstream psychology, but applications in sport and exercise psychology research are scarce. In this article, the foundations of Bayesian analysis are introduced, and we will illustrate how to apply Bayesian structural equation modeling in a sport and exercise psychology setting. More specifically, we contrasted a confirmatory factor analysis on the Sport Motivation Scale II estimated with the most commonly used estimator, maximum likelihood, and a Bayesian approach with weakly informative priors for cross-loadings and correlated residuals. The results indicated that the model with Bayesian estimation and weakly informative priors provided a good fit to the data, whereas the model estimated with a maximum likelihood estimator did not produce a well-fitting model. The reasons for this discrepancy between maximum likelihood and Bayesian estimation are discussed as well as potential advantages and caveats with the Bayesian approach.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.
Fan, Jianqing; Ma, Yunbei; Dai, Wei
2014-01-01
The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Directory of Open Access Journals (Sweden)
Tavio Tavio
2008-01-01
Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Directory of Open Access Journals (Sweden)
Aaditya Ramdas
2017-01-01
Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
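Two of the univariate statistics named above are easy to compute from sorted samples. The sketch below, with hypothetical function names, computes the empirical p-Wasserstein distance for two samples of equal size (by matching order statistics) and the two-sample Kolmogorov–Smirnov statistic. It illustrates the statistics only; it does not implement the entropic smoothing or any of the test calibrations discussed in the survey.

```python
def wasserstein_1d(xs, ys, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling matches order statistics.
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(a - b) ** p for a, b in pairs) / len(xs)) ** (1.0 / p)

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_n(t) - G_m(t)|."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    i = j = 0
    largest_gap = 0.0
    while i < n and j < m:
        t = min(xs_sorted[i], ys_sorted[j])
        # Advance both empirical CDFs past every observation equal to t.
        while i < n and xs_sorted[i] <= t:
            i += 1
        while j < m and ys_sorted[j] <= t:
            j += 1
        largest_gap = max(largest_gap, abs(i / n - j / m))
    return largest_gap

shifted_w1 = wasserstein_1d([0, 1, 2], [1, 2, 3])
shifted_ks = ks_statistic([0, 1, 2], [1, 2, 3])
```

For a pure shift the Wasserstein distance grows with the size of the shift, whereas the Kolmogorov–Smirnov statistic is bounded by 1, which is one simple way in which the two statistics differ.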
Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A
2017-01-01
The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms performs equally well when the time series has multiple bursts of muscle activity.
Variable selection in identification of a high dimensional nonlinear non-parametric system
Institute of Scientific and Technical Information of China (English)
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to no...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Discriminative Bayesian Dictionary Learning for Classification.
Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal
2016-12-01
We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of the Beta Process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along with the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition, and for object and scene-category classification, using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.
The Bayesian bridge between simple and universal kriging
Energy Technology Data Exchange (ETDEWEB)
Omre, H.; Halvorsen, K.B. (Norwegian Computing Center, Oslo (Norway))
1989-10-01
Kriging techniques are well suited for evaluation of continuous, spatial phenomena. Bayesian statistics are characterized by using prior qualified guesses on the model parameters. By merging kriging techniques and Bayesian theory, prior guesses may be used in a spatial setting. Partial knowledge of model parameters defines a continuum of models between what is named simple and universal kriging in geostatistical terminology. The Bayesian approach to kriging is developed and discussed, and a case study concerning depth conversion of seismic reflection times is presented.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; Safayani, Meharn
2010-01-01
The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of a correct decision. That is, if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and the small sample size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
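The dependence of the decision regions on the window width can be seen in a small one-dimensional example. The sketch below is a plain Parzen window classifier with a Gaussian kernel and empirical class priors; it is not the local component analysis method proposed in the paper, and the function names and toy training data are hypothetical.

```python
import math

def parzen_density(x, samples, width):
    """1-D Parzen window density estimate at x with a Gaussian kernel."""
    norm = len(samples) * width * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / width) ** 2) for s in samples) / norm

def parzen_classify(x, class_samples, width):
    """Return the label maximizing (empirical prior) * (Parzen density at x)."""
    total = sum(len(samples) for samples in class_samples.values())

    def score(label):
        samples = class_samples[label]
        return (len(samples) / total) * parzen_density(x, samples, width)

    return max(class_samples, key=score)

# Toy training data (illustrative only): class "b" has twice as many samples.
training = {"a": [0.0, 0.2, 0.4], "b": [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]}
narrow = parzen_classify(1.0, training, width=0.3)  # nearest samples dominate
wide = parzen_classify(1.0, training, width=3.0)    # the larger class dominates
```

With these toy data the point x = 1.0 is assigned to class "a" for the narrow window and to class "b" for the wide one, so the same training set yields different decision regions.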
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
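For context, the classical Kozachenko-Leonenko estimator that the proposed method improves on can be sketched as follows. This is the classical estimator only, in its usual form with Euclidean distances, not the new estimator of the paper; the function names are hypothetical, and the brute-force neighbor search is chosen for clarity rather than speed. With N samples in d dimensions and r_i the distance from sample i to its k-th nearest neighbor, the estimate is ψ(N) − ψ(k) + log c_d + (d/N) Σ log r_i, where c_d is the volume of the d-dimensional unit ball.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def kl_entropy(samples, k=3):
    """Classical Kozachenko-Leonenko entropy estimate in nats.

    samples is a list of equal-length tuples of floats with no duplicate
    points and more than k entries (a duplicate would give log(0)).
    """
    n = len(samples)
    d = len(samples[0])
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_radius_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        log_radius_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_radius_sum / n

rng = random.Random(0)
uniform_points = [(rng.random(),) for _ in range(400)]
h_uniform = kl_entropy(uniform_points)  # true entropy of Uniform(0, 1) is 0
```

The estimator treats the density as uniform inside each k-nearest-neighbor ball, which is the assumption the abstract says the proposed method relaxes.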
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
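To make the definition concrete: if Y and μ + σX have the same distribution and second moments exist, then sd(Y) = σ·sd(X) and E[Y] = μ + σ·E[X] for σ > 0. The sketch below uses this simple moment matching as a stand-in; it is not the asymptotic-likelihood estimator developed in the paper, and the function name and toy data are hypothetical.

```python
import statistics

def moment_location_scale(x_sample, y_sample):
    """Moment-matching estimates of (mu, sigma) with Y distributed as mu + sigma * X.

    Assumes sigma > 0, finite second moments, and at least two observations
    per sample. This is a simple stand-in, not the paper's estimator.
    """
    sigma = statistics.stdev(y_sample) / statistics.stdev(x_sample)
    mu = statistics.mean(y_sample) - sigma * statistics.mean(x_sample)
    return mu, sigma

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]  # exact location-scale relation, mu = 1, sigma = 2
mu_hat, sigma_hat = moment_location_scale(xs, ys)
```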
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
On Parametric (and Non-Parametric) Variation
Directory of Open Access Journals (Sweden)
Neil Smith
2009-11-01
This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
An assessment of the statistical procedures used in original papers ...
African Journals Online (AJOL)
descriptive statistics, contingency table analysis, simple epidemiological ... reported data and the statistical review of articles before publication will .... Nonparametric correlation. 4. 9. 5 .... Correlation coefficient as measure of agreement. 1.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
施沛德; 王海燕; 张利华
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtai
Bayesian Source Separation and Localization
Knuth, K H
1998-01-01
The problem of mixed signals occurs in many different contexts; one of the most familiar being acoustics. The forward problem in acoustics consists of finding the sound pressure levels at various detectors resulting from sound signals emanating from the active acoustic sources. The inverse problem consists of using the sound recorded by the detectors to separate the signals and recover the original source waveforms. In general, the inverse problem is unsolvable without additional information. This general problem is called source separation, and several techniques have been developed that utilize maximum entropy, minimum mutual information, and maximum likelihood. In previous work, it has been demonstrated that these techniques can be recast in a Bayesian framework. This paper demonstrates the power of the Bayesian approach, which provides a natural means for incorporating prior information into a source model. An algorithm is developed that utilizes information regarding both the statistics of the amplitudes...
Bayesian inference on proportional elections.
Directory of Open Access Journals (Sweden)
Gabriel Hideki Vatanabe Brunello
Polls for majoritarian voting systems usually show estimates of the percentage of votes for each candidate. However, proportional voting systems do not necessarily guarantee that the candidate with the largest percentage of votes will be elected. Thus, traditional methods used in majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform Bayesian inference on proportional elections considering the Brazilian system of seat distribution. More specifically, a methodology was developed to estimate the probability that a given party will have representation in the Chamber of Deputies. Inferences were made in a Bayesian setting using the Monte Carlo simulation technique, and the developed methodology was applied to data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies. A performance rate was also presented to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.
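The Monte Carlo idea described in this abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: poll counts are treated as multinomial with a uniform Dirichlet prior, and plain D'Hondt highest-averages allocation stands in for the Brazilian seat-distribution rules, which are not reproduced here. All function names and the poll counts are hypothetical, and the paper's own computations were done in R, not Python.

```python
import random


def dhondt(votes, seats):
    """Allocate seats by the D'Hondt highest-averages rule (a stand-in rule)."""
    alloc = [0] * len(votes)
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient v / (s + 1).
        winner = max(range(len(votes)), key=lambda k: votes[k] / (alloc[k] + 1))
        alloc[winner] += 1
    return alloc


def sample_dirichlet(alpha, rng):
    """Draw one vote-share vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def prob_representation(poll_counts, seats, n_draws=5000, seed=0):
    """Estimate, per party, the posterior probability of winning >= 1 seat."""
    rng = random.Random(seed)
    # Assumed model: multinomial counts with a uniform Dirichlet(1, ..., 1) prior.
    alpha = [c + 1 for c in poll_counts]
    hits = [0] * len(poll_counts)
    for _ in range(n_draws):
        shares = sample_dirichlet(alpha, rng)
        for k, s in enumerate(dhondt(shares, seats)):
            if s > 0:
                hits[k] += 1
    return [h / n_draws for h in hits]


if __name__ == "__main__":
    # Hypothetical poll of 1000 respondents across four parties, 8 seats.
    print(prob_representation([500, 400, 90, 10], seats=8))
```

Each posterior draw is a plausible vector of vote shares; pushing every draw through the allocation rule turns uncertainty about shares into uncertainty about seats, which is the quantity a share-only poll summary does not report.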
Software for Spatial Statistics
Directory of Open Access Journals (Sweden)
Edzer Pebesma
2015-02-01
We give an overview of the papers published in this special issue on spatial statistics, of the Journal of Statistical Software. 21 papers address issues covering visualization (micromaps, links to Google Maps or Google Earth), point pattern analysis, geostatistics, analysis of areal aggregated or lattice data, spatio-temporal statistics, Bayesian spatial statistics, and Laplace approximations. We also point to earlier publications in this journal on the same topic.
Software for Spatial Statistics
Edzer Pebesma; Roger Bivand; Paulo Justiniano Ribeiro
2015-01-01
We give an overview of the papers published in this special issue on spatial statistics, of the Journal of Statistical Software. 21 papers address issues covering visualization (micromaps, links to Google Maps or Google Earth), point pattern analysis, geostatistics, analysis of areal aggregated or lattice data, spatio-temporal statistics, Bayesian spatial statistics, and Laplace approximations. We also point to earlier publications in this journal on the same topic.
THE GROWTH POINTS OF STATISTICAL METHODS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2014-11-01
On the basis of a new paradigm of applied mathematical statistics, data analysis, and economic-mathematical methods, we identify and discuss five topical areas in which modern applied statistics and other statistical methods are developing, i.e. five "growth points": nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, and statistics of non-numeric data.
Inductive Logic and Statistics
Romeijn, J. -W.
2009-01-01
This chapter concerns inductive logic in relation to mathematical statistics. I start by introducing a general notion of probabilistic inductive inference. Then I introduce Carnapian inductive logic, and I show that it can be related to Bayesian statistical inference via de Finetti's representatio
Variations on Bayesian Prediction and Inference
2016-05-09
“Variations on Bayesian prediction and inference” Ryan Martin, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago...using statistical ideas/methods. We recently learned that this new project will be supported, in part, by the National Science Foundation. 2.2 Problem 2...41. Kalli, M., Griffin, J. E., Walker, S. G. (2011). Slice sampling mixture models. Statistics and Computing 21, 93–105. Koenker, R. (2005). Quantile
Large-scale hybrid Bayesian network for traffic load modeling from weigh-in-motion system data
Morales-Nápoles, O.; Steenbergen, R.D.J.M.
2014-01-01
Traffic load plays an important role not only in the design of new bridges but also in the reliability assessment of existing structures. Weigh-in-motion systems are used to collect data to determine traffic loads. In this paper, the potential of hybrid nonparametric Bayesian networks (BNs) is
Using Alien Coins to Test Whether Simple Inference Is Bayesian
Cassey, Peter; Hawkins, Guy E.; Donkin, Chris; Brown, Scott D.
2016-01-01
Reasoning and inference are well-studied aspects of basic cognition that have been explained as statistically optimal Bayesian inference. Using a simplified experimental design, we conducted quantitative comparisons between Bayesian inference and human inference at the level of individuals. In 3 experiments, with more than 13,000 participants, we…
Bayesian Games with Intentions
Directory of Open Access Journals (Sweden)
Adam Bjorndahl
2016-06-01
We show that standard Bayesian games cannot represent the full spectrum of belief-dependent preferences. However, by introducing a fundamental distinction between intended and actual strategies, we remove this limitation. We define Bayesian games with intentions, generalizing both Bayesian games and psychological games, and prove that Nash equilibria in psychological games correspond to a special class of equilibria as defined in our setting.
Doing bayesian data analysis a tutorial with R and BUGS
Kruschke, John K
2011-01-01
There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis obtainable to a wide audience. Doing Bayesian Data Analysis, A Tutorial Introduction with R and BUGS provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. The text delivers comprehensive coverage of all
Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space
Olhede, Sofia C
2010-01-01
High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and also analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes are introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degre...
All of statistics a concise course in statistical inference
Wasserman, Larry
2004-01-01
This book is for people who want to learn probability and statistics quickly. It brings together many of the main ideas in modern statistics in one place. The book is suitable for students and researchers in statistics, computer science, data mining and machine learning. This book covers a much wider range of topics than a typical introductory text on mathematical statistics. It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses. The reader is assumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. The text can be used at the advanced undergraduate and graduate level. Larry Wasserman is Professor of Statistics at Carnegie Mellon University. He is also a member of the Center for Automated Learning and Discovery in the School of Computer Science. His research areas include nonparametric inference, asymptotic theory, causality, and applications to astrophysics, bi...
Bayesian modeling in conjoint analysis
Directory of Open Access Journals (Sweden)
Janković-Milić Vesna
2010-01-01
Statistical analysis in marketing is largely influenced by the availability of various types of data. There has been a sudden increase in the number and types of information available to market researchers in the last decade. In such conditions, traditional statistical methods have limited ability to solve problems related to the expression of market uncertainty. The aim of this paper is to highlight the advantages of Bayesian inference as an alternative approach to classical inference. Multivariate statistical methods offer extremely powerful tools to achieve many goals of marketing research. One of these methods is conjoint analysis, which provides a quantitative measure of the relative importance of product or service attributes in relation to the other attributes. The application of this method involves interviewing consumers, who express their preferences, and statistical analysis provides numerical indicators of the utility of each attribute. One of the main objections to the discrete choice method in conjoint analysis is that it estimates utility only at the aggregate level, expressing the average utility for all respondents in the survey. Application of hierarchical Bayesian models enables capturing individual utility ratings for each attribute level.
Noise and speckle reduction in synthetic aperture radar imagery by nonparametric Wiener filtering.
Caprari, R S; Goh, A S; Moffatt, E K
2000-12-10
We present a Wiener filter that is especially suitable for speckle and noise reduction in multilook synthetic aperture radar (SAR) imagery. The proposed filter is nonparametric, not being based on parametrized analytical models of signal statistics. Instead, the Wiener-Hopf equation is expressed entirely in terms of observed signal statistics, with no reference to the possibly unobservable pure signal and noise. This Wiener filter is simple in concept and implementation, exactly minimum mean-square error, and directly applicable to signal-dependent and multiplicative noise. We demonstrate the filtering of a genuine two-look SAR image and show how a nonnegatively constrained version of the filter substantially reduces ringing.
Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method
Institute of Scientific and Technical Information of China (English)
WANG Wei(王玮); CAI LianHong(蔡莲红)
2003-01-01
The Bayesian approach is an important method in statistics. The Bayesian belief network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. It is a graphical model that encodes probabilistic relationships among variables of interest. In this paper, an approach to Bayesian network construction is given for discovering the relationships among Chinese linguistic parameters in a corpus.
Bayesian astrostatistics: a backward look to the future
Loredo, Thomas J
2012-01-01
This perspective chapter briefly surveys: (1) past growth in the use of Bayesian methods in astrophysics; (2) current misconceptions about both frequentist and Bayesian statistical inference that hinder wider adoption of Bayesian methods by astronomers; and (3) multilevel (hierarchical) Bayesian modeling as a major future direction for research in Bayesian astrostatistics, exemplified in part by presentations at the first ISI invited session on astrostatistics, commemorated in this volume. It closes with an intentionally provocative recommendation for astronomical survey data reporting, motivated by the multilevel Bayesian perspective on modeling cosmic populations: that astronomers cease producing catalogs of estimated fluxes and other source properties from surveys. Instead, summaries of likelihood functions (or marginal likelihood functions) for source properties should be reported (not posterior probability density functions), including nontrivial summaries (not simply upper limits) for candidate objects ...
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses.
Nonparametric methods in actigraphy: An update
Directory of Open Access Journals (Sweden)
Bruno S.B. Gonçalves
2014-09-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
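For readers unfamiliar with the two variables named in this abstract, the following is a minimal sketch of the commonly used definitions of interdaily stability (IS) and intradaily variability (IV), together with one possible reading of the averaged variants (ISm, IVm) as means over several bin widths. The averaging scheme, the function names, and the choice of bin widths are assumptions made for illustration; the abstract does not give the authors' exact procedure.

```python
from statistics import fmean


def interdaily_stability(x, per_day):
    """IS = n * sum_h (mean_h - mean)^2 / (p * sum_i (x_i - mean)^2)."""
    n = len(x)
    if n == 0 or n % per_day != 0:
        raise ValueError("series must cover a whole number of days")
    mean = fmean(x)
    total = sum((v - mean) ** 2 for v in x)
    # Average 24-hour profile: mean of all samples sharing a time of day.
    profile = [fmean(x[h::per_day]) for h in range(per_day)]
    between = sum((m - mean) ** 2 for m in profile)
    return (n * between) / (per_day * total)


def intradaily_variability(x):
    """IV = n * sum (x_i - x_{i-1})^2 / ((n - 1) * sum (x_i - mean)^2)."""
    n = len(x)
    mean = fmean(x)
    total = sum((v - mean) ** 2 for v in x)
    diffs = sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    return (n * diffs) / ((n - 1) * total)


def rebin(x, width):
    """Average consecutive blocks of `width` samples, dropping any remainder."""
    usable = len(x) - len(x) % width
    return [fmean(x[i:i + width]) for i in range(0, usable, width)]


def averaged_is_iv(x, per_day, widths):
    """Assumed ISm/IVm: mean of IS and IV over the given bin widths.

    Each width must divide `per_day` so that days stay aligned after rebinning.
    """
    is_vals, iv_vals = [], []
    for w in widths:
        if per_day % w != 0:
            raise ValueError("bin width must divide samples per day")
        binned = rebin(x, w)
        is_vals.append(interdaily_stability(binned, per_day // w))
        iv_vals.append(intradaily_variability(binned))
    return fmean(is_vals), fmean(iv_vals)
```

Under these definitions a series that repeats identically every day has IS equal to 1, while IV grows as the series becomes more fragmented from one sample to the next.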
Directory of Open Access Journals (Sweden)
Kühnast, Corinna
2008-04-01
Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and – for all studies which contain at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test – investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring etc. Results: The non-parametric Wilcoxon-Mann-Whitney was used in 30% of the studies. In a multivariable logistic regression the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software is specified, in particular when SPSS was used.
Nonparametric Bayesian Inference for Mean Residual Life Functions in Survival Analysis
Poynor, Valerie; Kottas, Athanasios
2014-01-01
Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life function which provides the expected remaining lifetime given that a subject has survived (i.e., is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the mean residual life function characterizes the sur...
Nonparametric bayesian reward segmentation for skill discovery using inverse reinforcement learning
CSIR Research Space (South Africa)
Ranchod, P
2015-10-01
We present a method for segmenting a set of unstructured demonstration trajectories to discover reusable skills using inverse reinforcement learning (IRL). Each skill is characterised by a latent reward function which the demonstrator is assumed...
Directory of Open Access Journals (Sweden)
William A Griffin
Sequential affect dynamics generated during the interaction of intimate dyads, such as married couples, are associated with a cascade of effects, some good and some bad, on each partner, close family members, and other social contacts. Although the effects are well documented, the probabilistic structures associated with micro-social processes connected to the varied outcomes remain enigmatic. Using extant data we developed a method of classifying and subsequently generating couple dynamics using a Hierarchical Dirichlet Process Hidden semi-Markov Model (HDP-HSMM). Our findings indicate that several key aspects of existing models of marital interaction are inadequate: affect state emissions and their durations, along with the expected variability differences between distressed and nondistressed couples, are present but highly nuanced; and most surprisingly, heterogeneity among highly satisfied couples necessitates that they be divided into subgroups. We review how this unsupervised learning technique generates plausible dyadic sequences that are sensitive to relationship quality and provides a natural mechanism for computational models of behavioral and affective micro-social processes.
Non-parametric Bayesian networks for parameter estimation in reservoir engineering
Zilko, A.A.; Hanea, A.M.; Hanea, R.G.
2013-01-01
The ultimate goal in reservoir engineering is to optimize hydrocarbon recovery from a reservoir. To achieve the goal, good knowledge of the subsurface properties is crucial. One of these properties is the permeability. Ensemble Kalman Filter (EnKF) is the most common tool used to deal with this
Bayesian Non-parametric model to Target Gamification Notifications Using Big Data
Nia, Meisam Hejazi; Ratchford, Brian
2016-01-01
I suggest an approach that helps the online marketers to target their Gamification elements to users by modifying the order of the list of tasks that they send to users. It is more realistic and flexible as it allows the model to learn more parameters when the online marketers collect more data. The targeting approach is scalable and quick, and it can be used over streaming data.
Moraes, L.E.; Kebreab, E.; Strathe, A.B.; France, J.; Dijkstra, J.; Casper, D.; Fadel, J.G.
2014-01-01
Linear and non-linear models have been extensively utilised for the estimation of net and metabolisable energy requirements and for the estimation of the efficiencies of utilising dietary energy for maintenance and tissue gain. In growing animals, biological principles imply that energy retention ra
Bayesian second law of thermodynamics.
Bartolotta, Anthony; Carroll, Sean M; Leichenauer, Stefan; Pollack, Jason
2016-08-01
We derive a generalization of the second law of thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution. By allowing an experimenter's knowledge to be updated by the measurement process, this formulation resolves a tension between the fact that the entropy of a statistical system can sometimes fluctuate downward and the information-theoretic idea that knowledge of a stochastically evolving system degrades over time. The Bayesian second law can be written as $\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0$, where $\Delta H(\rho_m, \rho)$ is the change in the cross entropy between the original phase-space probability distribution $\rho$ and the measurement-updated distribution $\rho_m$ and $\langle Q \rangle_{F|m}$ is the expectation value of a generalized heat flow out of the system. We also derive refined versions of the second law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of integral fluctuation theorems. We demonstrate the formalism using simple analytical and numerical examples.
Bayesian second law of thermodynamics
Bartolotta, Anthony; Carroll, Sean M.; Leichenauer, Stefan; Pollack, Jason
2016-08-01
We derive a generalization of the second law of thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution. By allowing an experimenter's knowledge to be updated by the measurement process, this formulation resolves a tension between the fact that the entropy of a statistical system can sometimes fluctuate downward and the information-theoretic idea that knowledge of a stochastically evolving system degrades over time. The Bayesian second law can be written as $\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0$, where $\Delta H(\rho_m, \rho)$ is the change in the cross entropy between the original phase-space probability distribution $\rho$ and the measurement-updated distribution $\rho_m$ and $\langle Q \rangle_{F|m}$ is the expectation value of a generalized heat flow out of the system. We also derive refined versions of the second law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of integral fluctuation theorems. We demonstrate the formalism using simple analytical and numerical examples.
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of
A non-parametric approach to investigating fish population dynamics
National Research Council Canada - National Science Library
Cook, R.M; Fryer, R.J
2001-01-01
.... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods were extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
Neural network classification - A Bayesian interpretation
Wan, Eric A.
1990-01-01
The relationship between minimizing a mean squared error and finding the optimal Bayesian classifier is reviewed. This provides a theoretical interpretation for the process by which neural networks are used in classification. A number of confidence measures are proposed to evaluate the performance of the neural network classifier within a statistical framework.
Automatic Thesaurus Construction Using Bayesian Networks.
Park, Young C.; Choi, Key-Sun
1996-01-01
Discusses automatic thesaurus construction and characterizes the statistical behavior of terms by using an inference network. Highlights include low-frequency terms and data sparseness, Bayesian networks, collocation maps and term similarity, constructing a thesaurus from a collocation map, and experiments with test collections. (Author/LRW)
Adaptive bayesian analysis for binomial proportions
CSIR Research Space (South Africa)
Das, Sonali
2008-10-01
The authors consider the problem of statistical inference of binomial proportions for non-matched, correlated samples, under the Bayesian framework. Such inference can arise when the same group is observed at a different number of times with the aim...
Approximation for Bayesian Ability Estimation.
1987-02-18
posterior pdfs of ande are given by p(-[Y) p(F) F P((y lei’ j)P )d. SiiJ i (4) a r~d p(e Iy) - p(t0) 1 J i P(Yij ei, (5) As shown in Tsutakawa and Lin...inverse A Hessian of the log of (27) with respect to , evaulatedat a Then, under regularity conditions, the marginal posterior pdf of O is...two-way contingency tables. Journal of Educational Statistics, 11, 33-56. Lindley, D.V. (1980). Approximate Bayesian methods. Trabajos Estadistica , 31
PV power forecast using a nonparametric PV model
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
Forecasting the AC power output of a PV plant accurately is important both for plant owners and electric system operators. Two main categories of PV modeling are available: the parametric and the nonparametric. In this paper, a methodology using a nonparametric PV model is proposed, using as inputs several forecasts of meteorological variables from a Numerical Weather Forecast model, and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quant...
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Directory of Open Access Journals (Sweden)
Luca Scrucca
2001-06-01
In this paper we describe the Xlisp-Stat version of the sm library, a software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997). This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve for inferential purposes as, for instance, when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to implement easily their own methods and new research ideas into the built-in prototypes.
Dai, Wenlin
2017-09-01
Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that systematically combines the linear regression method with the higher-order difference estimators. The unified framework greatly enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, the unified framework also provides a way to solve the challenging difference sequence selection problem, which has remained a controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, regardless of whether the sample size is small or the signal-to-noise ratio is large. Finally, to meet the demands of applications, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression, and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.
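To make the idea of difference-based variance estimation concrete, here is a minimal sketch of the simplest such estimator, the first-order difference estimator commonly attributed to Rice. It is not the unified framework proposed in the paper and not the VarED package; the function name `rice_variance` and the simulated data are assumptions for illustration only.

```python
import math
import random

def rice_variance(y):
    """First-order difference-based estimate of the noise variance.

    Assumes y is ordered by its design points and the mean function is smooth,
    so that successive differences are dominated by noise.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))

# Hypothetical data: smooth signal plus Gaussian noise with sd 0.5 (variance 0.25)
rng = random.Random(0)
n = 2000
y = [math.sin(2 * math.pi * i / n) + rng.gauss(0.0, 0.5) for i in range(n)]
sigma2_hat = rice_variance(y)  # should land close to 0.25
```

No estimate of the mean function is needed: differencing neighbouring responses removes most of the smooth signal, which is the property the abstract highlights.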
Proceedings of the First Astrostatistics School: Bayesian Methods in Cosmology
Hortúa, Héctor J
2014-01-01
These are the proceedings of the First Astrostatistics School: Bayesian Methods in Cosmology, held in Bogotá D.C., Colombia, June 9-13, 2014. The first astrostatistics school was the first event in Colombia where statisticians and cosmologists from several universities in Bogotá met to discuss the statistical methods applied to cosmology, especially the use of Bayesian statistics in the study of the Cosmic Microwave Background (CMB), Baryonic Acoustic Oscillations (BAO), Large Scale Structure (LSS) and weak lensing.
Construction of Bayesian Networks with the Bayesian Association Rule Mining Network Algorithm
Octavian
2015-01-01
In recent years, Bayesian networks have become a popular concept used in many areas of life, such as decision making and determining the probability that an event will occur. Unfortunately, constructing the structure of a Bayesian network is itself not a simple matter. This study therefore introduces the Bayesian Association Rule Mining Network algorithm to make it easier to construct a Bayesian network from data ...
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
Machine learning a Bayesian and optimization perspective
Theodoridis, Sergios
2015-01-01
This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which rely on optimization techniques, as well as Bayesian inference, which is based on a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as shor...
Directory of Open Access Journals (Sweden)
Okut Hayrettin
2011-10-01
Background: In the study of associations between genomic data and complex phenotypes there may be relationships that are not amenable to parametric statistical modeling. Such associations have been investigated mainly using single-marker and Bayesian linear regression models that differ in their distributions, but that assume additive inheritance while ignoring interactions and non-linearity. When interactions have been included in the model, their effects have entered linearly. There is a growing interest in non-parametric methods for predicting quantitative traits based on reproducing kernel Hilbert spaces regressions on markers and radial basis functions. Artificial neural networks (ANN) provide an alternative, because these act as universal approximators of complex functions and can capture non-linear relationships between predictors and responses, with the interplay among variables learned adaptively. ANNs are interesting candidates for analysis of traits affected by cryptic forms of gene action. Results: We investigated various Bayesian ANN architectures for predicting phenotypes in two data sets consisting of milk production in Jersey cows and yield of inbred lines of wheat. For the Jerseys, predictor variables were derived from pedigree and molecular marker (35,798 single nucleotide polymorphisms, SNPs) information on 297 individual cows. The wheat data represented 599 lines, each genotyped with 1,279 markers. The ability to predict fat, milk and protein yield was low when using pedigrees, but it was better when SNPs were employed, irrespective of the ANN trained. Predictive ability was even better in wheat because the trait was a mean, as opposed to an individual phenotype in cows. Non-linear neural networks outperformed a linear model in predictive ability in both data sets, but more clearly in wheat. Conclusion: Results suggest that neural networks may be useful for predicting complex traits using high
Model Diagnostics for Bayesian Networks
Sinharay, Sandip
2006-01-01
Bayesian networks are frequently used in educational assessments, primarily for learning about students' knowledge and skills. Little work exists on assessing the fit of Bayesian networks. This article employs the posterior predictive model checking method, a popular Bayesian model checking tool, to assess the fit of simple Bayesian networks. A…
Directory of Open Access Journals (Sweden)
Sergio A. Alvarado
2010-12-01
Objective: To evaluate the predictive efficiency of two statistical models (one parametric and the other non-parametric) to predict critical episodes of air pollution exceeding daily air quality standards in Santiago, Chile by using the next day PM10 maximum 24h value. Accurate prediction of such episodes would allow restrictive measures to be applied by health authorities to reduce their seriousness and protect the community's health. Methods: We used the PM10 concentrations registered by a station of the MACAM-2 Air Quality Monitoring Network (152 daily observations of 14 variables) and meteorological information gathered from 2001 to 2004. To construct predictive models, we fitted a parametric Gamma model using STATA v11 software and a non-parametric MARS model by using a demo version of the MARS v 2.0 statistical software distributed by Salford-Systems. Results: Both modeling methods show a high correlation between observed and predicted values. The Gamma models give better hits than MARS for PM10 concentrations with values ...
Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection
Dhavala, Soma S.
2010-09-01
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA-type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.
Bayesian Estimation of Thermonuclear Reaction Rates
Iliadis, Christian; Coc, Alain; Timmes, Frank; Starrfield, Sumner
2016-01-01
The problem of estimating non-resonant astrophysical S-factors and thermonuclear reaction rates, based on measured nuclear cross sections, is of major interest for nuclear energy generation, neutrino physics, and element synthesis. Many different methods have been applied in the past to this problem, all of them based on traditional statistics. Bayesian methods, on the other hand, are now in widespread use in the physical sciences. In astronomy, for example, Bayesian statistics is applied to the observation of extra-solar planets, gravitational waves, and type Ia supernovae. However, nuclear physics, in particular, has been slow to adopt Bayesian methods. We present the first astrophysical S-factors and reaction rates based on Bayesian statistics. We develop a framework that incorporates robust parameter estimation, systematic effects, and non-Gaussian uncertainties in a consistent manner. The method is applied to the d(p,$\gamma$)$^3$He, $^3$He($^3$He,2p)$^4$He, and $^3$He($\alpha$,$\gamma$)$^7$Be reactions,...
Bayesian Estimation of Thermonuclear Reaction Rates
Iliadis, C.; Anderson, K. S.; Coc, A.; Timmes, F. X.; Starrfield, S.
2016-11-01
The problem of estimating non-resonant astrophysical S-factors and thermonuclear reaction rates, based on measured nuclear cross sections, is of major interest for nuclear energy generation, neutrino physics, and element synthesis. Many different methods have been applied to this problem in the past, almost all of them based on traditional statistics. Bayesian methods, on the other hand, are now in widespread use in the physical sciences. In astronomy, for example, Bayesian statistics is applied to the observation of extrasolar planets, gravitational waves, and Type Ia supernovae. However, nuclear physics, in particular, has been slow to adopt Bayesian methods. We present astrophysical S-factors and reaction rates based on Bayesian statistics. We develop a framework that incorporates robust parameter estimation, systematic effects, and non-Gaussian uncertainties in a consistent manner. The method is applied to the reactions d(p,$\gamma$)$^3$He, $^3$He($^3$He,2p)$^4$He, and $^3$He($\alpha$,$\gamma$)$^7$Be, important for deuterium burning, solar neutrinos, and Big Bang nucleosynthesis.
Bayesian data analysis for newcomers.
Kruschke, John K; Liddell, Torrin M
2017-04-12
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Bayesian estimation of the discrete coefficient of determination.
Chen, Ting; Braga-Neto, Ulisses M
2016-12-01
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
Energy Technology Data Exchange (ETDEWEB)
Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)
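The core of a local polynomial regression can be sketched in a few lines. The example below is a minimal local linear (degree 1) smoother with Gaussian kernel weights; it is not the authors' implementation and does not include the step that locates the cmc. The function name `local_linear`, the bandwidth, and the test data are assumptions for illustration.

```python
import math

def local_linear(x0, xs, ys, bandwidth):
    """Local linear estimate of the regression function at x0 (illustrative helper).

    Solves the kernel-weighted least-squares problem
    min over (a, b) of sum_i w_i * (y_i - a - b * (x_i - x0))^2 and returns a.
    """
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    denom = s0 * s2 - s1 ** 2
    if denom == 0.0:
        raise ValueError("degenerate design: need at least two distinct x values")
    return (s2 * t0 - s1 * t1) / denom

# Hypothetical check: a local linear fit reproduces a straight line exactly
xs = [i / 20 for i in range(21)]
ys = [2.0 * x + 1.0 for x in xs]
fitted = local_linear(0.3, xs, ys, bandwidth=0.2)
```

Because no global functional form is imposed, the fitted curve can follow an abrupt change in slope, which is the feature the abstract relies on.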
Poage, J. L.
1975-01-01
A sequential nonparametric pattern classification procedure is presented. The method presented is an estimated version of the Wald sequential probability ratio test (SPRT). This method utilizes density function estimates, and the density estimate used is discussed, including a proof of convergence in probability of the estimate to the true density function. The classification procedure proposed makes use of the theory of order statistics, and estimates of the probabilities of misclassification are given. The procedure was tested on discriminating between two classes of Gaussian samples and on discriminating between two kinds of electroencephalogram (EEG) responses.
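For orientation, the classical Wald SPRT that the procedure approximates can be sketched as follows. This sketch uses known Gaussian densities in place of the density estimates described in the abstract, and the standard Wald threshold approximations; the function names, error rates, and sample values are assumptions for illustration.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (stand-in for an estimated density)."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def sprt(samples, logpdf0, logpdf1, alpha=0.05, beta=0.05):
    """Wald sequential probability ratio test between class 0 and class 1.

    Returns (decision, n_used); decision is None if the samples run out
    before either threshold is crossed.
    """
    upper = math.log((1.0 - beta) / alpha)   # cross this: decide class 1
    lower = math.log(beta / (1.0 - alpha))   # cross this: decide class 0
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += logpdf1(x) - logpdf0(x)
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    return None, len(samples)

# Hypothetical two-class problem: N(0, 1) versus N(1, 1)
f0 = lambda x: gaussian_logpdf(x, 0.0, 1.0)
f1 = lambda x: gaussian_logpdf(x, 1.0, 1.0)
decision, n_used = sprt([1.0] * 10, f0, f1)
```

The test stops as soon as the accumulated log-likelihood ratio leaves the interval between the two thresholds, so the number of samples used is itself random.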
Fox, Gerardus J.A.; van den Berg, Stéphanie Martine; Veldkamp, Bernard P.; Irwing, P.; Booth, T.; Hughes, D.
2015-01-01
In educational and psychological studies, psychometric methods are involved in the measurement of constructs, and in constructing and validating measurement instruments. Assessment results are typically used to measure student proficiency levels and test characteristics. Recently, Bayesian item
Computationally efficient Bayesian inference for inverse problems.
Energy Technology Data Exchange (ETDEWEB)
Marzouk, Youssef M.; Najm, Habib N.; Rahn, Larry A.
2007-10-01
Bayesian statistics provides a foundation for inference from noisy and incomplete data, a natural mechanism for regularization in the form of prior information, and a quantitative assessment of uncertainty in the inferred results. Inverse problems - representing indirect estimation of model parameters, inputs, or structural components - can be fruitfully cast in this framework. Complex and computationally intensive forward models arising in physical applications, however, can render a Bayesian approach prohibitive. This difficulty is compounded by high-dimensional model spaces, as when the unknown is a spatiotemporal field. We present new algorithmic developments for Bayesian inference in this context, showing strong connections with the forward propagation of uncertainty. In particular, we introduce a stochastic spectral formulation that dramatically accelerates the Bayesian solution of inverse problems via rapid evaluation of a surrogate posterior. We also explore dimensionality reduction for the inference of spatiotemporal fields, using truncated spectral representations of Gaussian process priors. These new approaches are demonstrated on scalar transport problems arising in contaminant source inversion and in the inference of inhomogeneous material or transport properties. We also present a Bayesian framework for parameter estimation in stochastic models, where intrinsic stochasticity may be intermingled with observational noise. Evaluation of a likelihood function may not be analytically tractable in these cases, and thus several alternative Markov chain Monte Carlo (MCMC) schemes, operating on the product space of the observations and the parameters, are introduced.
Bayesian networks in educational assessment
Almond, Russell G; Steinberg, Linda S; Yan, Duanli; Williamson, David M
2015-01-01
Bayesian inference networks, a synthesis of statistics and expert systems, have advanced reasoning under uncertainty in medicine, business, and social sciences. This innovative volume is the first comprehensive treatment exploring how they can be applied to design and analyze innovative educational assessments. Part I develops Bayes nets’ foundations in assessment, statistics, and graph theory, and works through the real-time updating algorithm. Part II addresses parametric forms for use with assessment, model-checking techniques, and estimation with the EM algorithm and Markov chain Monte Carlo (MCMC). A unique feature is the volume’s grounding in Evidence-Centered Design (ECD) framework for assessment design. This “design forward” approach enables designers to take full advantage of Bayes nets’ modularity and ability to model complex evidentiary relationships that arise from performance in interactive, technology-rich assessments such as simulations. Part III describes ECD, situates Bayes nets as ...
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
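The percentages quoted for Singapore follow directly from the counts in the abstract. The short calculation below reproduces the two point estimates; it is only an arithmetic check, not the authors' R code, and the variable names are chosen for this example.

```python
# Counts taken from the abstract
recorded_species = 195
recorded_extinct = 58
undetected_extinct = 9.6  # projected point estimate

# Extinction rate among recorded species only
observed_rate = recorded_extinct / recorded_species

# Rate once species that went extinct before being recorded are included
# in both the numerator and the denominator
true_rate = (recorded_extinct + undetected_extinct) / (recorded_species + undetected_extinct)

observed_pct = round(100 * observed_rate, 1)
true_pct = round(100 * true_rate, 1)
```

The undetected extinctions enter both the numerator and the denominator, which is why adding 9.6 species raises the rate by only about three percentage points.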
Mathematical statistics and stochastic processes
Bosq, Denis
2013-01-01
Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book however, both this case and the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners. Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and rob
Statistical concepts a second course
Lomax, Richard G
2012-01-01
Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
None
2000-01-01
For regression analysis, some useful information may be lost when the responses are right-censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data-driven criterion, we also obtain the asymptotic optimality of the AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Bayesian microsaccade detection
Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji
2017-01-01
Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483
Determining the Mass of Kepler-78b with Nonparametric Gaussian Process Estimation
Grunblatt, Samuel Kai; Howard, Andrew; Haywood, Raphaëlle
2016-01-01
Kepler-78b is a transiting planet that is 1.2 times the radius of Earth and orbits a young, active K dwarf every 8 hr. The mass of Kepler-78b has been independently reported by two teams based on radial velocity (RV) measurements using the HIRES and HARPS-N spectrographs. Due to the active nature of the host star, a stellar activity model is required to distinguish and isolate the planetary signal in RV data. Whereas previous studies tested parametric stellar activity models, we modeled this system using nonparametric Gaussian process (GP) regression. We produced a GP regression of relevant Kepler photometry. We then use the posterior parameter distribution for our photometric fit as a prior for our simultaneous GP + Keplerian orbit models of the RV data sets. We tested three simple kernel functions for our GP regressions. Based on a Bayesian likelihood analysis, we selected a quasi-periodic kernel model with GP hyperparameters coupled between the two RV data sets, giving a Doppler amplitude of 1.86 ± 0.25 m s$^{-1}$ and supporting our belief that the correlated noise we are modeling is astrophysical. The corresponding mass of $1.87^{+0.27}_{-0.26}$ $M_\oplus$ is consistent with that measured in previous studies, and more robust due to our nonparametric signal estimation. Based on our mass and the radius measurement from transit photometry, Kepler-78b has a bulk density of $6.0^{+1.9}_{-1.4}$ g cm$^{-3}$. We estimate that Kepler-78b is 32% ± 26% iron using a two-component rock-iron model. This is consistent with an Earth-like composition, with uncertainty spanning Moon-like to Mercury-like compositions.
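The quoted bulk density can be checked against the mass and radius given in the abstract. The sketch below scales Earth's mean density (about 5.51 g/cm³, a value supplied here and not stated in the abstract) by the mass and radius in Earth units; the rounded radius of 1.2 Earth radii is used, so the result is only expected to agree to the quoted precision.

```python
EARTH_MEAN_DENSITY = 5.514  # g/cm^3, assumed reference value (not from the abstract)

def bulk_density(mass_earth, radius_earth):
    """Bulk density in g/cm^3 for a planet given in Earth masses and Earth radii."""
    # Density scales as mass / radius^3 relative to Earth
    return EARTH_MEAN_DENSITY * mass_earth / radius_earth ** 3

# Central values quoted in the abstract: 1.87 Earth masses, 1.2 Earth radii
density = bulk_density(1.87, 1.2)
```

With these inputs the result comes out just under 6 g/cm³, consistent with the central value quoted in the abstract.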
Forecasting turbulent modes with nonparametric diffusion models: Learning from noisy data
Berry, Tyrus; Harlim, John
2016-04-01
In this paper, we apply a recently developed nonparametric modeling approach, the "diffusion forecast", to predict the time-evolution of Fourier modes of turbulent dynamical systems. While the diffusion forecasting method assumes the availability of a noise-free training data set observing the full state space of the dynamics, in real applications we often have only partial observations which are corrupted by noise. To alleviate these practical issues, following the theory of embedology, the diffusion model is built using the delay-embedding coordinates of the data. We show that this delay embedding biases the geometry of the data in a way which extracts the most stable component of the dynamics and reduces the influence of independent additive observation noise. The resulting diffusion forecast model approximates the semigroup solutions of the generator of the underlying dynamics in the limit of large data and when the observation noise vanishes. As in any standard forecasting problem, the forecasting skill depends crucially on the accuracy of the initial conditions. We introduce a novel Bayesian method for filtering the discrete-time noisy observations which works with the diffusion forecast to determine the forecast initial densities. Numerically, we compare this nonparametric approach with standard stochastic parametric models on a wide range of well-studied turbulent modes, including the Lorenz-96 model in weakly chaotic to fully turbulent regimes and the barotropic modes of a quasi-geostrophic model with baroclinic instabilities. We show that when the only available data is the low-dimensional set of noisy modes that are being modeled, the diffusion forecast is indeed competitive with the perfect model.
Recent Developments in Applied Probability and Statistics
Devroye, Luc; Kohler, Michael; Korn, Ralf
2010-01-01
This book presents surveys on recent developments in applied probability and statistics. The contributions include topics such as nonparametric regression and density estimation, option pricing, probabilistic methods for multivariate interpolation, robust graphical modelling and stochastic differential equations. Due to its broad coverage of different topics the book offers an excellent overview of recent developments in applied probability and statistics.
Microprocessors as an Adjunct to Statistics Instruction.
Miller, William G.
Examinations of costs and acquisition of facilities indicate that an Altair 8800A microcomputer with a program library of parametric, non-parametric, mathematical, and teaching programs can be used effectively for teaching college-level statistics. Statistical packages presently in use require extensive computing knowledge beyond the students' and…
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
Bayesian phylogeography finds its roots.
Directory of Open Access Journals (Sweden)
Philippe Lemey
2009-09-01
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data se...
Nonparametric estimation of a convex bathtub-shaped hazard function.
Jankowski, Hanna K; Wellner, Jon A
2009-11-01
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n^(2/5) at points x_0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Bayesian Face Sketch Synthesis.
Wang, Nannan; Gao, Xinbo; Sun, Leiyu; Li, Jie
2017-03-01
Exemplar-based face sketch synthesis has been widely applied to both digital entertainment and law enforcement. In this paper, we propose a Bayesian framework for face sketch synthesis, which provides a systematic interpretation for understanding the common properties and intrinsic difference in different methods from the perspective of probabilistic graphical models. The proposed Bayesian framework consists of two parts: the neighbor selection model and the weight computation model. Within the proposed framework, we further propose a Bayesian face sketch synthesis method. The essential rationale behind the proposed Bayesian method is that we take the spatial neighboring constraint between adjacent image patches into consideration for both aforementioned models, while the state-of-the-art methods neglect the constraint either in the neighbor selection model or in the weight computation model. Extensive experiments on the Chinese University of Hong Kong face sketch database demonstrate that the proposed Bayesian method could achieve superior performance compared with the state-of-the-art methods in terms of both subjective perceptions and objective evaluations.
Editorial: Bayesian benefits for child psychology and psychiatry researchers.
Oldehinkel, Albertine J
2016-09-01
For many scientists, performing statistical tests has become an almost automated routine. However, p-values are frequently used and interpreted incorrectly; and even when used appropriately, p-values tend to provide answers that do not match researchers' questions and hypotheses well. Bayesian statistics present an elegant and often more suitable alternative. The Bayesian approach has rarely been applied in child psychology and psychiatry research so far, but the development of user-friendly software packages and tutorials has placed it well within reach now. Because Bayesian analyses require a more refined definition of hypothesized probabilities of possible outcomes than the classical approach, going Bayesian may offer the additional benefit of sparking the development and refinement of theoretical models in our field.
A Gaussian Mixed Model for Learning Discrete Bayesian Networks.
Balov, Nikolay
2011-02-01
In this paper we address the problem of learning discrete Bayesian networks from noisy data. We consider a graphical model based on a mixture of Gaussian distributions, with a categorical mixing structure coming from a discrete Bayesian network. The network learning is formulated as a maximum likelihood estimation problem and performed by employing an EM algorithm. The proposed approach is relevant to a variety of statistical problems for which Bayesian network models are suitable, from simple regression analysis to learning gene/protein regulatory networks from microarray data.
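To give a feel for the EM step mentioned in this abstract, here is a deliberately simplified sketch: EM for a plain two-component, one-dimensional Gaussian mixture. It is not the paper's method, which ties the mixing structure to a discrete Bayesian network; the function names, the initialisation at the data extremes, and the variance floor `min_var` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    means = [min(data), max(data)]   # crude initialisation (assumption)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k])
                 for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component
            weights[k] = nk / len(data)
    return means, variances, weights


sample = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
print(em_two_gaussians(sample))
```

On the toy sample the two estimated means settle near the two visible clusters. In the setting described by the abstract, the mixing weights would instead be governed by the network structure being learned.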
Mocapy++ - a toolkit for inference and learning in dynamic Bayesian networks
DEFF Research Database (Denmark)
Paluszewski, Martin; Hamelryck, Thomas Wim
2010-01-01
Background: Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations...