A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
Seldin, Yevgeny
2010-01-01
We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generaliza...
Bayesian Analysis of Multiple Populations in Galactic Globular Clusters
Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan
2016-01-01
We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.
Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H
2016-01-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed g...
Bayesian Nonparametric Graph Clustering
Banerjee, Sayantan; Akbani, Rehan; Baladandayuthapani, Veerabhadran
2015-01-01
We present clustering methods for multivariate data exploiting the underlying geometry of the graphical structure between variables. As opposed to standard approaches that assume known graph structures, we first estimate the edge structure of the unknown graph using Bayesian neighborhood selection approaches, wherein we account for the uncertainty of graphical structure learning through model-averaged estimates of the suitable parameters. Subsequently, we develop a nonparametric graph cluster...
Bayesian Agglomerative Clustering with Coalescents
Teh, Yee Whye; Daumé III, Hal; Roy, Daniel
2009-01-01
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
A Nonparametric Bayesian Model for Nested Clustering.
Lee, Juhee; Müller, Peter; Zhu, Yitan; Ji, Yuan
2016-01-01
We propose a nonparametric Bayesian model for clustering where clusters of experimental units are determined by a shared pattern of clustering another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models that use shared parameters in the sampling model to define clusters. This includes in particular model based clustering, Dirichlet process mixtures, product partition models, and more. We show results for two typical biostatistical inference problems that give rise to clustering. PMID:26519174
Wagner-Kaiser, R.; Stenning, D. C.; Robinson, E.; von Hippel, T.; Sarajedini, A.; van Dyk, D. A.; Stein, N.; Jefferys, W. H.
2016-07-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ~0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.
Wagner-Kaiser, R; Robinson, E; von Hippel, T; Sarajedini, A; van Dyk, D A; Stein, N; Jefferys, W H
2016-01-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ~0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.
Fowler, Anna; Menon, Vilas; Heard, Nicholas A
2013-10-01
Clusters of time series data may change location and memberships over time; in gene expression data, this occurs as groups of genes or samples respond differently to stimuli or experimental conditions at different times. In order to uncover this underlying temporal structure, we consider dynamic clusters with time-dependent parameters which split and merge over time, enabling cluster memberships to change. These interesting time-dependent structures are useful in understanding the development of organisms or complex organs, and could not be identified using traditional clustering methods. In cell cycle data, these time-dependent structures may provide links between genes and stages of the cell cycle, whilst in developmental data sets they may highlight key developmental transitions. PMID:24131050
Bayesian Decision Theoretical Framework for Clustering
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.
2016-07-01
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).
Yuan, Ying; MacKinnon, David P.
2009-01-01
This article proposes Bayesian analysis of mediation effects. Compared to conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian mediation analysis, inference is straightforward and exact, which makes it appealing for studies with small samples. Third, the Bayesian approach is conceptua...
Gelman, Andrew; Stern, Hal S; Dunson, David B; Vehtari, Aki; Rubin, Donald B
2013-01-01
FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear
Yuan, Ying; MacKinnon, David P.
2009-01-01
In this article, we propose Bayesian analysis of mediation effects. Compared with conventional frequentist mediation analysis, the Bayesian approach has several advantages. First, it allows researchers to incorporate prior information into the mediation analysis, thus potentially improving the efficiency of estimates. Second, under the Bayesian…
Sun, Xun; Lall, Upmanu; Merz, Bruno; Dung, Nguyen Viet
2015-08-01
Especially for extreme precipitation or floods, there is considerable spatial and temporal variability in long term trends or in the response of station time series to large-scale climate indices. Consequently, identifying trends or sensitivity of these extremes to climate parameters can be marked by high uncertainty. When one develops a nonstationary frequency analysis model, a key step is the identification of potential trends or effects of climate indices on the station series. An automatic clustering procedure that effectively pools stations where there are similar responses is desirable to reduce the estimation variance, thus improving the identification of trends or responses, and accounting for spatial dependence. This paper presents a new hierarchical Bayesian approach for exploring homogeneity of response in large area data sets, through a multicomponent mixture model. The approach allows the reduction of uncertainties through both full pooling and partial pooling of stations across automatically chosen subsets of the data. We apply the model to study the trends in annual maximum daily stream flow at 68 gauges over Germany. The effects of changing the number of clusters and the parameters used for clustering are demonstrated. The results show that there are large, mainly upward trends in the gauges of the River Rhine Basin in Western Germany and along the main stream of the Danube River in the south, while there are also some small upward trends at gauges in Central and Northern Germany.
Bayesian exploratory factor analysis
Gabriella Conti; Sylvia Frühwirth-Schnatter; James Heckman; Rémi Piatek
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study c...
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study co...
Bayesian Exploratory Factor Analysis
Gabriella Conti; Sylvia Fruehwirth-Schnatter; Heckman, James J.; Remi Piatek
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo s...
Bayesian exploratory factor analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo st...
Bayesian exploratory factor analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James; Piatek, Rémi
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study co...
R/BHC: fast Bayesian hierarchical clustering for microarray data
Grant Murray
2009-08-01
Background: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. Results: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression microarray data. The method performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. Conclusion: Biologically plausible results are presented from a well studied data set: expression profiles of A. thaliana subjected to a variety of biotic and abiotic stresses. Our method avoids several limitations of traditional methods, for example how many clusters there should be and how to choose a principled distance metric.
Market Segmentation Using Bayesian Model Based Clustering
Van Hattum, P.
2009-01-01
This dissertation deals with two basic problems in marketing: market segmentation, which is the grouping of persons who share common aspects, and market targeting, which is focusing your marketing efforts on one or more attractive market segments. For the grouping of persons who share common aspects, a Bayesian model based clustering approach is proposed such that it can be applied to data sets that are specifically used for market segmentation. The cluster algorithm can handle very l...
Bayesian Exploratory Factor Analysis
Conti, Gabriella; Frühwirth-Schnatter, Sylvia; Heckman, James J.;
2014-01-01
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates...
Bayesian Independent Component Analysis
Winther, Ole; Petersen, Kaare Brandt
2007-01-01
In this paper we present an empirical Bayesian framework for independent component analysis. The framework provides estimates of the sources, the mixing matrix and the noise parameters, and is flexible with respect to choice of source prior and the number of sources and sensors. Inside the engine...
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
Bayesian Benchmark Dose Analysis
Fang, Qijun; Piegorsch, Walter W.; Barnes, Katherine Y.
2014-01-01
An important objective in environmental risk assessment is estimation of minimum exposure levels, called Benchmark Doses (BMDs) that induce a pre-specified Benchmark Response (BMR) in a target population. Established inferential approaches for BMD analysis typically involve one-sided, frequentist confidence limits, leading in practice to what are called Benchmark Dose Lower Limits (BMDLs). Appeal to Bayesian modeling and credible limits for building BMDLs is far less developed, however. Indee...
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Bayesian analysis toolkit - BAT
Statistical treatment of data is an essential part of any data analysis and interpretation. Different statistical methods and approaches can be used; however, the implementation of these approaches is complicated and at times inefficient. The Bayesian analysis toolkit (BAT) is a software package developed in a C++ framework that facilitates the statistical analysis of data using Bayes' theorem. The tool evaluates the posterior probability distributions for models and their parameters using Markov Chain Monte Carlo, which in turn provides straightforward parameter estimation, limit setting and uncertainty propagation. Additional algorithms, such as simulated annealing, allow extraction of the global mode of the posterior. BAT provides a well-tested environment for flexible model definition and also includes a set of predefined models for standard statistical problems. The package is interfaced to other software packages commonly used in high energy physics, such as ROOT, Minuit, RooStats and CUBA. We present a general overview of BAT and its algorithms. A few physics examples are shown to introduce the spectrum of its applications. In addition, new developments and features are summarized.
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to Euclidean geometry but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms. PMID:27046838
BAT - Bayesian Analysis Toolkit
One of the most vital steps in any data analysis is the statistical analysis and comparison with the prediction of a theoretical model. The many uncertainties associated with the theoretical model and the observed data require a robust statistical analysis tool. The Bayesian Analysis Toolkit (BAT) is a powerful statistical analysis software package based on Bayes' Theorem, developed to evaluate the posterior probability distribution for models and their parameters. It implements Markov Chain Monte Carlo to get the full posterior probability distribution, which in turn provides straightforward parameter estimation, limit setting and uncertainty propagation. Additional algorithms, such as Simulated Annealing, allow evaluation of the global mode of the posterior. BAT is developed in C++ and allows for a flexible definition of models. A set of predefined models covering standard statistical cases is also included in BAT. It has been interfaced to other commonly used software packages such as ROOT, Minuit, RooStats and CUBA. An overview of the software and its algorithms is provided along with several physics examples to cover a range of applications of this statistical tool. Future plans, new features and recent developments are briefly discussed.
Bayesian analysis of volcanic eruptions
Ho, Chih-Hsiang
1990-10-01
The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time-periods tends to be more variable than a simple Poisson with constant eruptive rate. An alternative model is therefore examined in which the eruptive rate (λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value as in the assumptions of a simple Poisson model. Bayesian analysis is performed to link the two distributions together to give the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accommodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time-period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and simple Poisson model are discussed based on the historical eruptive count data of volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use both in space and time.
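The gamma-mixed Poisson result stated in this abstract can be checked numerically. The following is a minimal sketch, not the author's code: the function names and the gamma parameters (shape 2, rate 4) are illustrative assumptions. It compares the closed-form negative binomial probability with a direct numerical integration of the Poisson probability over the gamma prior on λ.

```python
import math


def neg_binomial_pmf(n, shape, rate, t=1.0):
    """P(N = n) when N | lam ~ Poisson(lam * t) and lam ~ Gamma(shape, rate)."""
    p = rate / (rate + t)
    log_pmf = (math.lgamma(n + shape) - math.lgamma(shape) - math.lgamma(n + 1)
               + shape * math.log(p) + n * math.log(1.0 - p))
    return math.exp(log_pmf)


def mixed_poisson_pmf(n, shape, rate, t=1.0, upper=20.0, steps=20000):
    """Same probability via midpoint-rule integration over lam on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        # log density of the gamma prior (rate parameterisation)
        log_prior = (shape * math.log(rate) - math.lgamma(shape)
                     + (shape - 1.0) * math.log(lam) - rate * lam)
        # log Poisson probability of n events in a period of length t
        log_poisson = n * math.log(lam * t) - lam * t - math.lgamma(n + 1)
        total += math.exp(log_prior + log_poisson) * h
    return total


# Hypothetical prior parameters, chosen only for illustration.
SHAPE, RATE = 2.0, 4.0
for n in range(4):
    print(n, neg_binomial_pmf(n, SHAPE, RATE), mixed_poisson_pmf(n, SHAPE, RATE))
```

With these assumed parameters the mixed model has mean shape/rate = 0.5 and variance 0.625, so the counts are more variable than under a simple Poisson model with the same mean, which is the overdispersion the abstract describes.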
Bayesian Inference of Kinematics and Memberships of Open Cluster
Shao, Z. Y.; Chen, L.; Zhong, J.; Hou, J. L.
2014-07-01
Based on the Bayesian Inference (BI) method, the Multiple-modelling approach is improved to combine coordinative positions, proper motions (PM) and radial velocities (RV), to separate the motion of the open cluster from field stars, as well as to describe the intrinsic kinematic status of the cluster.
Bayesian Analysis of Experimental Data
Lalmohan Bhar
2013-10-01
Analysis of experimental data from a Bayesian point of view has been considered. Appropriate methodology has been developed for application to designed experiments. A Normal-Gamma distribution has been considered for the prior distribution. The developed methodology has been applied to real experimental data taken from long-term fertilizer experiments.
Bayesian analysis of rare events
Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
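The classical rejection-sampling view of Bayesian updating that this abstract refers to can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the BUS implementation: it does not use FORM, importance sampling or Subset Simulation, and the function names, the standard normal prior, and the single observation y = 1 with noise standard deviation 0.5 are all hypothetical. A prior draw θ is kept when u ≤ c·L(θ), where u is uniform on (0, 1) and the constant c is chosen so that c·L(θ) ≤ 1 everywhere.

```python
import math
import random


def rejection_posterior_samples(log_likelihood, sample_prior, log_c, n_draws, rng):
    """Return prior draws theta that satisfy u <= c * L(theta), u ~ Uniform(0, 1).

    log_c must satisfy log_c + log_likelihood(theta) <= 0 for every theta.
    """
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        u = rng.random()
        if u == 0.0 or math.log(u) <= log_c + log_likelihood(theta):
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: theta ~ N(0, 1), one observation y = 1 with noise sd 0.5.
Y_OBS, NOISE_SD = 1.0, 0.5


def log_lik(theta):
    # Gaussian likelihood without its normalising constant, so its maximum is 1.
    return -((Y_OBS - theta) ** 2) / (2.0 * NOISE_SD ** 2)


def draw_prior(rng):
    return rng.gauss(0.0, 1.0)


samples = rejection_posterior_samples(log_lik, draw_prior, 0.0, 20000, random.Random(0))
mean = sum(samples) / len(samples)
print(len(samples), mean)
```

For this conjugate toy problem the exact posterior is normal with mean 0.8 and variance 0.2, which gives a reference for the accepted samples. When the acceptance event is rare, plain rejection sampling like this becomes wasteful, which is the setting where the abstract turns to rare-event methods.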
ANALYSIS OF BAYESIAN CLASSIFIER ACCURACY
Felipe Schneider Costa
2013-01-01
The naïve Bayes classifier is considered one of the most effective classification algorithms today, competing with more modern and sophisticated classifiers. Despite being based on the unrealistic (naïve) assumption that all variables are independent given the output class, the classifier provides proper results. However, depending on the scenario utilized (network structure, number of samples or training cases, number of variables), the network may not provide appropriate results. This study uses a variable selection process, using the chi-squared test to verify the existence of dependence between variables in the data model, in order to identify the reasons which prevent a Bayesian network from providing good performance. A detailed analysis of the data is also proposed, unlike other existing work, as well as adjustments in case of limit values between two adjacent classes. Furthermore, variable weights are used in the calculation of a posteriori probabilities, calculated with the mutual information function. Tests were applied in both a naïve Bayesian network and a hierarchical Bayesian network. After testing, a significant reduction in error rate has been observed. The naïve Bayesian network presented a drop in error rates from twenty five percent to five percent, considering the initial results of the classification process. In the hierarchical network, there was not only a drop in fifteen percent error rate, but also the final result came to zero.
Cluster analysis is the name of group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they process.To study this analysis, there are several algorithms that can be used. Therefore, this topic focuses to discuss the algorithms, such as, similarity measures, and hierarchical clustering which includes single linkage, complete linkage and average linkage method. also, non-hierarchical clustering method, which is popular name K-mean method' will be discussed. Finally, this paper will be described the advantages and disadvantages of every methods
Everitt, Brian S; Leese, Morven; Stahl, Daniel
2011-01-01
Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons
Bayesian Model Averaging for Propensity Score Analysis
Kaplan, David; Chen, Jianshen
2013-01-01
The purpose of this study is to explore Bayesian model averaging in the propensity score context. Previous research on Bayesian propensity score analysis does not take into account model uncertainty. In this regard, an internally consistent Bayesian framework for model building and estimation must also account for model uncertainty. The…
Bayesian Group Factor Analysis
Virtanen, Seppo; Klami, Arto; Khan, Suleiman A; Kaski, Samuel
2011-01-01
We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by co-occurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming group-wise sparse factors, active in a subset of the sets, the variat...
Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;
2012-01-01
) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...
Advances in Bayesian Model Based Clustering Using Particle Learning
Merl, D M
2009-11-19
Recent work by Carvalho, Johannes, Lopes and Polson and Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al. was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al. point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is being called, is especially well suited for the analysis of streaming data due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data is arriving, allowing at any instant during the observation process direct quantification of uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original
Bayesian networks with applications in reliability analysis
Langseth, Helge
2002-01-01
A common goal of the papers in this thesis is to propose, formalize and exemplify the use of Bayesian networks as a modelling tool in reliability analysis. The papers span work in which Bayesian networks are merely used as a modelling tool (Paper I), work where models are specially designed to utilize the inference algorithms of Bayesian networks (Paper II and Paper III), and work where the focus has been on extending the applicability of Bayesian networks to very large domains (Paper IV and ...
Bayesian Statistics for Biological Data: Pedigree Analysis
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First-year college students of biology can be introduced to Bayesian statistics.
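As an illustration of Bayes' formula in a pedigree setting, the sketch below works through a textbook-style case that is not taken from the article: a woman has a prior probability of 1/2 of carrying an X-linked recessive allele and then has two unaffected sons. The scenario, the function name and the neglect of new mutations are all assumptions made for this example.

```python
from fractions import Fraction


def posterior_carrier(prior_carrier, n_unaffected_sons):
    """Posterior probability that a woman carries an X-linked recessive allele.

    Assumptions: each son of a carrier is unaffected with probability 1/2,
    each son of a non-carrier is unaffected with probability 1, and new
    mutations are ignored.
    """
    like_carrier = Fraction(1, 2) ** n_unaffected_sons
    like_noncarrier = Fraction(1)
    numerator = prior_carrier * like_carrier
    evidence = numerator + (1 - prior_carrier) * like_noncarrier
    return numerator / evidence


posterior = posterior_carrier(Fraction(1, 2), 2)
print(posterior)  # 1/5
```

Ignoring the information provided by the sons would leave the probability at the prior value of 1/2, whereas conditioning on them lowers it to 1/5.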
Bayesian analysis of exoplanet and binary orbits
Schulze-Hartung, Tim; Henning, Thomas
2012-01-01
We introduce BASE (Bayesian astrometric and spectroscopic exoplanet detection and characterisation tool), a novel program for the combined or separate Bayesian analysis of astrometric and radial-velocity measurements of potential exoplanet hosts and binary stars. The capabilities of BASE are demonstrated using all publicly available data of the binary Mizar A.
Bayesian inference of mass segregation of open clusters
Shao, Zhengyi; Chen, Li; Lin, Chien-Cheng; Zhong, Jing; Hou, Jinliang
2015-08-01
Based on the Bayesian inference (BI) method, the mixture-modeling approach is improved to combine all kinematic data, including the coordinate position, proper motion (PM) and radial velocity (RV), to separate the motion of the cluster from field stars in its area, as well as to describe the intrinsic kinematic status. Meanwhile, the membership probabilities of individual stars are determined as by-products. This method has been tested on simulated toy models, and it was found that the joint usage of multiple kinematic data can significantly reduce the missing rate of membership determination, say from ~15% for a single data type to 1% when using all position, proper motion and radial velocity data. By combining kinematic data from multiple sources of photometric and redshift surveys, such as WIYN and APOGEE, M67 and NGC188 are revisited. Mass segregation is identified clearly for both of these old open clusters, in either position or PM space, since the Bayesian evidence (BE) of the model that includes the segregation parameters is much larger than that of the model without them. Ongoing work applies this method to the LAMOST released data, which contain a large number of RVs covering ~200 nearby open clusters. If the coming GAIA data can be used, the accuracy of tangential velocities will be greatly improved and the intrinsic kinematics of open clusters can be well investigated, though they are usually less than 1 km/s.
Detecting Galaxy Clusters in the DLS and CARS: a Bayesian Cluster Finder
Ascaso, Begoña; Benítez, Narciso
2010-01-01
The detection of galaxy clusters in present and future surveys enables measuring mass-to-light ratios, clustering properties or galaxy cluster abundances and therefore, constraining cosmological parameters. We present a new technique for detecting galaxy clusters, which is based on the Matched Filter Algorithm from a Bayesian point of view. The method is able to determine the position, redshift and richness of the cluster through the maximization of a filter depending on galaxy luminosity, density and photometric redshift combined with a galaxy cluster prior. We tested the algorithm through realistic mock galaxy catalogs, revealing that the detections are 100% complete and 80% pure for clusters up to z 25 (Abell Richness > 0). We applied the algorithm to the CFHTLS Archive Research Survey (CARS) data, recovering similar detections as previously published using the same data plus additional clusters that are very probably real. We also applied this algorithm to the Deep Lens Survey (DLS), obtaining the first ...
An agglomerative hierarchical approach to visualization in Bayesian clustering problems.
Dawson, K J; Belkhir, K
2009-07-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals--the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. As the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety, unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
Low-Complexity Bayesian Estimation of Cluster-Sparse Channels
Ballal, Tarig
2015-09-18
This paper addresses the problem of channel impulse response estimation for cluster-sparse channels under the Bayesian estimation framework. We develop a novel low-complexity minimum mean squared error (MMSE) estimator by exploiting the sparsity of the received signal profile and the structure of the measurement matrix. It is shown that due to the banded Toeplitz/circulant structure of the measurement matrix, a channel impulse response, such as an underwater acoustic channel impulse response, can be partitioned into a number of orthogonal or approximately orthogonal clusters. The orthogonal clusters, the sparsity of the channel impulse response and the structure of the measurement matrix, all combined, result in a computationally superior realization of the MMSE channel estimator. The MMSE estimator calculations boil down to simpler in-cluster calculations that can be reused in different clusters. The reduction in computational complexity allows for a more accurate implementation of the MMSE estimator. The proposed approach is tested using synthetic Gaussian channels, as well as simulated underwater acoustic channels. Symbol-error-rate performance and computation time confirm the superiority of the proposed method compared to selected benchmark methods in systems with preamble-based training signals transmitted over cluster-sparse channels.
Bayesian Analysis of Multivariate Probit Models
Siddhartha Chib; Edward Greenberg
1996-01-01
This paper provides a unified simulation-based Bayesian and non-Bayesian analysis of correlated binary data using the multivariate probit model. The posterior distribution is simulated by Markov chain Monte Carlo methods, and maximum likelihood estimates are obtained by a Markov chain Monte Carlo version of the E-M algorithm. Computation of Bayes factors from the simulation output is also considered. The methods are applied to a bivariate data set, to a 534-subject, four-year longitudinal dat...
Subjective Bayesian Analysis: Principles and Practice
Goldstein, Michael
2006-01-01
We address the position of subjectivism within Bayesian statistics. We argue, first, that the subjectivist Bayes approach is the only feasible method for tackling many important practical problems. Second, we describe the essential role of the subjectivist approach in scientific analysis. Third, we consider possible modifications to the Bayesian approach from a subjectivist viewpoint. Finally, we address the issue of pragmatism in implementing the subjectivist approach.
Clustering and Bayesian network for image of faces classification
Jayech, Khlifia
2012-01-01
In a content-based image classification system, target images are sorted by feature similarities with respect to the query (CBIR). In this paper, we propose a new approach combining tangent distance, the k-means algorithm and Bayesian networks for image classification. First, we use the tangent distance technique to calculate several tangent spaces representing the same image. The objective is to reduce the error in the classification phase. Second, we divide the image into a set of blocks. For each block, we compute a vector of descriptors. Then, we use k-means to cluster the low-level features, including color and texture information, to build a vector of labels for each image. Finally, we apply five variants of Bayesian network classifiers (Naïve Bayes, Global Tree Augmented Naïve Bayes (GTAN), Global Forest Augmented Naïve Bayes (GFAN), Tree Augmented Naïve Bayes for each class (TAN), and Forest Augmented Naïve Bayes for each class (FAN)) to classify the images of faces using the vector of labels. ...
Bayesian analysis of contingency tables
Gómez Villegas, Miguel A.; González Pérez, Beatriz
2005-01-01
The display of the data by means of contingency tables is used in different approaches to statistical inference, for example, to broach the test of homogeneity of independent multinomial distributions. We develop a Bayesian procedure to test simple null hypotheses versus bilateral alternatives in contingency tables. Given independent samples of two binomial distributions and taking a mixed prior distribution, we calculate the posterior probability that the proportion of successes in the first...
On Bayesian System Reliability Analysis
The view taken in this thesis is that reliability, the probability that a system will perform a required function for a stated period of time, depends on a person's state of knowledge. Reliability changes as this state of knowledge changes, i.e. when new relevant information becomes available. Most existing models for system reliability prediction are developed in a classical framework of probability theory and they overlook some information that is always present. Probability is just an analytical tool to handle uncertainty, based on judgement and subjective opinions. It is argued that the Bayesian approach gives a much more comprehensive understanding of the foundations of probability than the so-called frequentist school. A new model for system reliability prediction is given in two papers. The model incorporates the fact that component failures are dependent because of a shared operational environment. The suggested model also naturally permits learning from failure data of similar components in non-identical environments. 85 refs
Feroz, F.; Hobson, M. P.; Zwart, J T L; Saunders, R. D. E.; Grainge, K. J. B.
2008-01-01
We present a Bayesian approach to modelling galaxy clusters using multi-frequency pointed observations from telescopes that exploit the Sunyaev--Zel'dovich effect. We use the recently developed MultiNest technique (Feroz, Hobson & Bridges, 2008) to explore the high-dimensional parameter spaces and also to calculate the Bayesian evidence. This permits robust parameter estimation as well as model comparison. Tests on simulated Arcminute Microkelvin Imager observations of a cluster, in the prese...
Bayesian analysis of cosmic structures
Kitaura, Francisco-Shu
2011-01-01
We review the Bayesian inference steps required to analyse the cosmological large-scale structure. Here we place special emphasis on the complications which arise due to the non-Gaussian character of the galaxy and matter distribution. In particular we investigate the advantages and limitations of the Poisson-lognormal model and discuss how to extend this work. With the lognormal prior using the Hamiltonian sampling technique and on scales of about 4 h^{-1} Mpc we find that the over-dense regions are excellently reconstructed; however, under-dense regions (void statistics) are quantitatively poorly recovered. Contrary to the maximum a posteriori (MAP) solution, which was shown to over-estimate the density in the under-dense regions, we obtain lower densities than in N-body simulations. This is due to the fact that the MAP solution is conservative whereas the full posterior yields samples which are consistent with the prior statistics. The lognormal prior is not able to capture the full non-linear regime at scales ...
BEAST: Bayesian evolutionary analysis by sampling trees
Drummond, Alexei J.; Rambaut, Andrew
2007-01-01
Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models su...
A Gentle Introduction to Bayesian Analysis : Applications to Developmental Research
Van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A G
2014-01-01
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, t
A SAS Interface for Bayesian Analysis with WinBUGS
Zhang, Zhiyong; McArdle, John J.; Wang, Lijuan; Hamagami, Fumiaki
2008-01-01
Bayesian methods are becoming very popular despite some practical difficulties in implementation. To assist in the practical application of Bayesian methods, we show how to implement Bayesian analysis with WinBUGS as part of a standard set of SAS routines. This implementation procedure is first illustrated by fitting a multiple regression model…
Bayesian analysis of matrix data with rstiefel
Hoff, Peter D.
2013-01-01
We illustrate the use of the R-package "rstiefel" for matrix-variate data analysis in the context of two examples. The first example considers estimation of a reduced-rank mean matrix in the presence of normally distributed noise. The second example considers the modeling of a social network of friendships among teenagers. Bayesian estimation for these models requires the ability to simulate from the matrix-variate von Mises-Fisher distributions and the matrix-variate Bingham distributions on...
Cluster analysis for applications
Anderberg, Michael R
1973-01-01
Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o
Book review: Bayesian analysis for population ecology
Link, William A.
2011-01-01
Brian Dennis described the field of ecology as “fertile, uncolonized ground for Bayesian ideas.” He continued: “The Bayesian propagule has arrived at the shore. Ecologists need to think long and hard about the consequences of a Bayesian ecology. The Bayesian outlook is a successful competitor, but is it a weed? I think so.” (Dennis 2004)
Bayesian Analysis of Individual Level Personality Dynamics
Cripps, Edward; Wood, Robert E.; Beckmann, Nadin; Lau, John; Beckmann, Jens F.; Cripps, Sally Ann
2016-01-01
A Bayesian technique with analyses of within-person processes at the level of the individual is presented. The approach is used to examine whether the patterns of within-person responses on a 12-trial simulation task are consistent with the predictions of ITA theory (Dweck, 1999). ITA theory states that the performance of an individual with an entity theory of ability is more likely to spiral down following a failure experience than the performance of an individual with an incremental theory of ability. This is because entity theorists interpret failure experiences as evidence of a lack of ability which they believe is largely innate and therefore relatively fixed; whilst incremental theorists believe in the malleability of abilities and interpret failure experiences as evidence of more controllable factors such as poor strategy or lack of effort. The results of our analyses support ITA theory at both the within- and between-person levels of analyses and demonstrate the benefits of Bayesian techniques for the analysis of within-person processes. These include more formal specification of the theory and the ability to draw inferences about each individual, which allows for more nuanced interpretations of individuals within a personality category, such as differences in the individual probabilities of spiraling. While Bayesian techniques have many potential advantages for the analyses of processes at the level of the individual, ease of use is not one of them for psychologists trained in traditional frequentist statistical techniques. PMID:27486415
Guillaume Marrelec
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
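The plug-in estimate of mutual information for multivariate normal variables referred to above has a closed form in terms of covariance determinants, I(X; Y) = 0.5 * (log|S_xx| + log|S_yy| - log|S|). The sketch below computes only this plug-in quantity with NumPy; it does not reproduce the paper's Bayes factor or its dimensionality correction, and the function name and simulated data are assumptions for illustration.

```python
import numpy as np


def gaussian_mutual_information(samples, dims_x, dims_y):
    """Plug-in mutual information (in nats) between two blocks of Gaussian variables.

    samples: array of shape (n_observations, n_variables).
    dims_x, dims_y: lists of column indices forming the two blocks.
    """
    cov = np.cov(samples, rowvar=False)          # empirical covariance matrix
    joint = list(dims_x) + list(dims_y)

    def logdet(indices):
        block = cov[np.ix_(indices, indices)]
        return np.linalg.slogdet(block)[1]       # log of the absolute determinant

    return 0.5 * (logdet(list(dims_x)) + logdet(list(dims_y)) - logdet(joint))


# Simulated data: y depends on x, z is independent of x.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + rng.normal(size=5000)
z = rng.normal(size=5000)
data = np.column_stack([x, y, z])

mi_dependent = gaussian_mutual_information(data, [0], [1])
mi_independent = gaussian_mutual_information(data, [0], [2])
```

With this simulated design the true value for the dependent pair is 0.5 * log(2), about 0.347 nats, and the estimate for the independent pair should be close to zero. In a clustering procedure such a measure would be evaluated for every candidate pair of variable groups before each merge.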
Bayesian analysis for extreme climatic events: A review
Chu, Pao-Shin; Zhao, Xin
2011-11-01
This article reviews Bayesian analysis methods applied to extreme climatic data. We particularly focus on applications to three different problems related to extreme climatic events including detection of abrupt regime shifts, clustering tropical cyclone tracks, and statistical forecasting for seasonal tropical cyclone activity. For identifying potential change points in an extreme event count series, a hierarchical Bayesian framework involving three layers - data, parameter, and hypothesis - is formulated to demonstrate the posterior probability of the shifts over time. For the data layer, a Poisson process with a gamma distributed rate is presumed. For the hypothesis layer, multiple candidate hypotheses with different change-points are considered. To calculate the posterior probability for each hypothesis and its associated parameters we developed an exact analytical formula, a Markov Chain Monte Carlo (MCMC) algorithm, and a more sophisticated reversible jump Markov Chain Monte Carlo (RJMCMC) algorithm. The algorithms are applied to several rare event series: the annual tropical cyclone or typhoon counts over the central, eastern, and western North Pacific; the annual extremely heavy rainfall event counts at Manoa, Hawaii; and the annual heat wave frequency in France. Using an Expectation-Maximization (EM) algorithm, a Bayesian clustering method built on a Gaussian mixture model is applied to objectively classify historical, spaghetti-like tropical cyclone tracks (1945-2007) over the western North Pacific and the South China Sea into eight distinct track types. A regression based approach to forecasting seasonal tropical cyclone frequency in a region is developed. Specifically, by adopting large-scale environmental conditions prior to the tropical cyclone season, a Poisson regression model is built for predicting seasonal tropical cyclone counts, and a probit regression model is alternatively developed toward a binary classification problem. With a non
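The Poisson counts with a gamma distributed rate described above admit a closed-form marginal likelihood, which makes a simple change-point posterior easy to sketch. The code below is a simplified illustration, not the authors' hierarchical framework: it assumes exactly one change point, a uniform prior over its position, and hypothetical hyperparameters a and b (shape and rate) shared by both segments. The count series is made up.

```python
import math


def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts under a Gamma(shape=a, rate=b) prior.

    The factorial terms are omitted because they are identical under every
    change-point hypothesis and cancel in the posterior.
    """
    n, total = len(counts), sum(counts)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over tau, the number of observations before the shift."""
    taus = range(1, len(counts))
    log_post = [log_marginal(counts[:t], a, b) + log_marginal(counts[t:], a, b)
                for t in taus]
    peak = max(log_post)                       # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return {t: w / norm for t, w in zip(taus, weights)}


counts = [1, 2, 1, 0, 2, 1, 8, 9, 7, 10, 8, 9]   # made-up annual event counts
posterior = change_point_posterior(counts)
most_probable_tau = max(posterior, key=posterior.get)
```

A fuller treatment along the lines of the abstract would add a no-change hypothesis and hypotheses with several change points, which is where MCMC and reversible jump MCMC become useful.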
Bayesian Analysis of Type Ia Supernova Data
王晓峰; 周旭; 李宗伟; 陈黎
2003-01-01
Recently, the distances to type Ia supernovae (SN Ia) at z ~ 0.5 have been measured with the motivation of estimating cosmological parameters. However, different sleuthing techniques tend to give inconsistent measurements for SN Ia distances (~0.3 mag), which significantly affects the determination of cosmological parameters. A Bayesian "hyper-parameter" procedure, which considers the relative weights of different datasets, is used to analyse the current SN Ia data jointly. For a flat Universe, the combined analysis yields ΩM = 0.20 ± 0.07.
Current trends in Bayesian methodology with applications
Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia
2015-01-01
Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics. Each chapter is self-contained and focuses on
Doing Bayesian Data Analysis: A Tutorial with R and BUGS
Kruschke, John K
2011-01-01
There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis obtainable to a wide audience. Doing Bayesian Data Analysis, A Tutorial Introduction with R and BUGS provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. The text delivers comprehensive coverage of all
ASteCA - Automated Stellar Cluster Analysis
Perren, Gabriel I; Piatti, Andrés E
2014-01-01
We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its unce...
The Application of Bayesian Analysis to Issues in Developmental Research
Walker, Lawrence J.; Gustafson, Paul; Frimer, Jeremy A.
2007-01-01
This article reviews the concepts and methods of Bayesian statistical analysis, which can offer innovative and powerful solutions to some challenging analytical problems that characterize developmental research. In this article, we demonstrate the utility of Bayesian analysis, explain its unique adeptness in some circumstances, address some…
Pitman Yor Diffusion Trees for Bayesian Hierarchical Clustering.
Knowles, David A; Ghahramani, Zoubin
2015-02-01
In this paper we introduce the Pitman Yor Diffusion Tree (PYDT), a Bayesian non-parametric prior over tree structures which generalises the Dirichlet Diffusion Tree [30] and removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model including showing its construction as the continuum limit of a nested Chinese restaurant process model. We then present two alternative MCMC samplers which allow us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. The utility of the model and algorithms is demonstrated on synthetic and real world data, both continuous and binary. PMID:26353241
BAT-The Bayesian Analysis Toolkit
The main goals of data analysis are to infer the free parameters of models from data, to draw conclusions on the models' validity, and to compare their predictions so that the most appropriate model can be selected. The Bayesian Analysis Toolkit, BAT, is a tool developed to evaluate the posterior probability distribution for models and their parameters. It is centered around Bayes' Theorem and is realized with the use of Markov Chain Monte Carlo, giving access to the full posterior probability distribution. This enables straightforward parameter estimation, limit setting and uncertainty propagation. Additional algorithms, such as Simulated Annealing, make it possible to evaluate the global mode of the posterior. BAT is implemented in C++ and allows for a flexible definition of models. It is interfaced to software packages commonly used in high-energy physics: ROOT, Minuit, RooStats and CUBA. A set of predefined models exists to cover standard statistical problems.
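As an illustration of the kind of Markov Chain Monte Carlo sampling described in this abstract, the following is a minimal random-walk Metropolis sketch in Python. It is not BAT code and does not use BAT's C++ interface; the model (Gaussian likelihood with known width, Gaussian prior on the mean), the data values, and all function names are assumptions chosen only to keep the example small.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior for the mean mu (hypothetical toy model):
    Gaussian likelihood with known sigma and a Gaussian prior centred on 0."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_like = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_like


def metropolis(data, n_steps=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler; returns the full chain of mu values."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    chain = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        chain.append(mu)
    return chain


data = [1.8, 2.1, 2.4, 1.9, 2.3]      # illustrative measurements
chain = metropolis(data)
burned = chain[2000:]                  # discard burn-in
posterior_mean = sum(burned) / len(burned)
print(round(posterior_mean, 2))
```

For this conjugate toy model the posterior mean is known in closed form (the data sum divided by n + sigma²/prior_sd²), which gives a simple check that the chain has converged.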
Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.
2011-01-01
Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.
Confirmation via Analogue Simulation: A Bayesian Analysis
Dardashti, Radin; Thebault, Karim P Y; Winsberg, Eric
2016-01-01
Analogue simulation is a novel mode of scientific inference found increasingly within modern physics, and yet all but neglected in the philosophical literature. Experiments conducted upon a table-top 'source system' are taken to provide insight into features of an inaccessible 'target system', based upon a syntactic isomorphism between the relevant modelling frameworks. An important example is the use of acoustic 'dumb hole' systems to simulate gravitational black holes. In a recent paper it was argued that there exist circumstances in which confirmation via analogue simulation can obtain; in particular when the robustness of the isomorphism is established via universality arguments. The current paper supports these claims via an analysis in terms of Bayesian confirmation theory.
BEAST: Bayesian evolutionary analysis by sampling trees
Drummond, Alexei J
2007-11-01
Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results: BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license. Conclusion: BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
Bayesian data analysis in population ecology: motivations, methods, and benefits
Dorazio, Robert
2016-01-01
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
Ockham's razor and Bayesian analysis. [statistical theory for systems evaluation]
Jefferys, William H.; Berger, James O.
1992-01-01
'Ockham's razor', the ad hoc principle enjoining the greatest possible simplicity in theoretical explanations, is presently shown to be justifiable as a consequence of Bayesian inference; Bayesian analysis can, moreover, clarify the nature of the 'simplest' hypothesis consistent with the given data. By choosing the prior probabilities of hypotheses, it becomes possible to quantify the scientific judgment that simpler hypotheses are more likely to be correct. Bayesian analysis also shows that a hypothesis with fewer adjustable parameters intrinsically possesses an enhanced posterior probability, due to the clarity of its predictions.
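The claim that a hypothesis with fewer adjustable parameters gains posterior probability from the sharpness of its predictions can be illustrated with a small Bayes-factor calculation. The sketch below is an assumed toy example, not taken from the paper: it compares a fair-coin hypothesis (no free parameter) with a hypothesis whose bias is uniform on [0, 1], for which the marginal likelihood of k heads in n tosses is exactly 1/(n + 1).

```python
from math import comb


def bayes_factor_simple_vs_flexible(k, n):
    """Bayes factor B01 for k heads in n tosses (hypothetical coin example).

    H0 (simple): bias fixed at 0.5, no adjustable parameter.
    H1 (flexible): bias unknown, uniform prior on [0, 1].
    """
    evidence_simple = comb(n, k) * 0.5 ** n
    # Integrating the binomial likelihood over a uniform prior gives 1/(n+1).
    evidence_flexible = 1.0 / (n + 1)
    return evidence_simple / evidence_flexible


# Data compatible with both hypotheses: the simpler one is favoured.
print(bayes_factor_simple_vs_flexible(11, 20))
# Data far from the sharp prediction: the flexible hypothesis is favoured.
print(bayes_factor_simple_vs_flexible(18, 20))
```

The flexible hypothesis spreads its predictive probability evenly over all 21 possible outcomes, so it is penalised automatically whenever the data fall where the simple hypothesis concentrates its predictions.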
Bayesian analysis for EMP damaged function based on Weibull distribution
The Weibull distribution is one of the most commonly used statistical distributions in EMP vulnerability analysis. In the paper, the EMP damage function of solid state relays, based on the Weibull distribution, was solved by Bayesian computation using the Gibbs sampling algorithm. (authors)
Euler, Christoph
2015-01-01
Using Monte Carlo simulations of globular clusters we developed a method separating metallicity effects from age effects on observed integrated ugriz colors. We demonstrate that these colors do not evolve with time significantly after an age of 4 Gyr and use Bayesian statistics to calculate a probability distribution function of the metallicity. We tested the method using the M31 globular cluster system and then applied it to explain the observed color bimodality in globular cluster sets and tidal effects on it. We show that the color bimodality is an effect of a nonlinearity in the color-metallicity relation caused by stellar dynamics on the Giant Branch, that colors including only the UV show a weaker bimodality than those subtracting from visual bands and that cluster sets with a distinct bimodality are in principle older than those with only a weak bimodal distribution. Furthermore a bimodal color distribution of coeval clusters implies a bimodal metallicity distribution, but a unimodal color distribution do...
Analysis of KATRIN data using Bayesian inference
Riis, Anna Sejersen; Weinheimer, Christian
2011-01-01
The KATRIN (KArlsruhe TRItium Neutrino) experiment will be analyzing the tritium beta-spectrum to determine the mass of the neutrino with a sensitivity of 0.2 eV (90% C.L.). This approach to a measurement of the absolute value of the neutrino mass relies only on the principle of energy conservation and can in some sense be called model-independent as compared to cosmology and neutrino-less double beta decay. However, by model-independent we mean only in the case of the minimal extension of the standard model. One should therefore also analyse the data for non-standard couplings to, e.g., right-handed or sterile neutrinos. As an alternative to the frequentist minimization methods used in the analysis of the earlier experiments in Mainz and Troitsk we have been investigating Markov Chain Monte Carlo (MCMC) methods which are very well suited for probing multi-parameter spaces. We found that implementing the KATRIN chi squared function in the COSMOMC package - an MCMC code using Bayesian parameter inference - solved the ...
Objective Bayesian Analysis of Skew-t Distributions
BRANCO, MARCIA D'ELIA
2012-02-27
We study the Jeffreys prior and its properties for the shape parameter of univariate skew-t distributions with linear and nonlinear Student's t skewing functions. In both cases, we show that the resulting priors for the shape parameter are symmetric around zero and proper. Moreover, we propose a Student's t approximation of the Jeffreys prior that makes an objective Bayesian analysis easy to perform. We carry out a Monte Carlo simulation study that demonstrates an overall better behaviour of the maximum a posteriori estimator compared with the maximum likelihood estimator. We also compare the frequentist coverage of the credible intervals based on the Jeffreys prior and its approximation and show that they are similar. We further discuss location-scale models under scale mixtures of skew-normal distributions and show some conditions for the existence of the posterior distribution and its moments. Finally, we present three numerical examples to illustrate the implications of our results on inference for skew-t distributions. © 2012 Board of the Foundation of the Scandinavian Journal of Statistics.
Bayesian Analysis of Multiple Populations I: Statistical and Computational Methods
Stenning, D C; Robinson, E; van Dyk, D A; von Hippel, T; Sarajedini, A; Stein, N
2016-01-01
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (van Dyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties (age, metallicity, helium abundance, distance, absorption, and initial mass) are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and al...
Bayesian analysis of Markov point processes
Berthelsen, Kasper Klitgaard; Møller, Jesper
2006-01-01
Recently Møller, Pettitt, Berthelsen and Reeves introduced a new MCMC methodology for drawing samples from a posterior distribution when the likelihood function is only specified up to a normalising constant. We illustrate the method in the setting of Bayesian inference for Markov point processes...
Medical decision making tools: Bayesian analysis and ROC analysis
During the diagnostic process of the various oral and maxillofacial lesions, we should consider the following: 'When should we order diagnostic tests? What tests should be ordered? How should we interpret the results clinically? And how should we use this frequently imperfect information to make optimal medical decisions?' For clinicians to make proper judgements, several decision making tools are suggested. This article discusses the concept of diagnostic accuracy (sensitivity and specificity values) with several decision making tools such as the decision matrix, ROC analysis and Bayesian analysis. The article also explains the introductory concepts of the ORAD program.
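The link between diagnostic accuracy and Bayesian analysis mentioned in this abstract can be made concrete with Bayes' theorem. The sketch below is illustrative only: the function name and the numerical values for sensitivity, specificity and prevalence are assumptions, not figures from the article.

```python
def posterior_given_positive(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test result (Bayes' theorem)."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumed values: 90% sensitivity, 80% specificity, 10% pre-test probability.
ppv = posterior_given_positive(sensitivity=0.9, specificity=0.8, prevalence=0.1)
print(round(ppv, 3))  # 0.333
```

Even with a fairly accurate test, a low pre-test probability keeps the post-test probability modest, which is why the prior (prevalence) matters when interpreting an imperfect result.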
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Seldin, Yevgeny; Shawe-Taylor, John; Peters, Jan; Auer, Peter
2011-01-01
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many ...
[Cluster analysis in biomedical research].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review provides a definition of the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given. PMID:24640781
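To make the grouping-by-similarity idea concrete, here is a minimal k-means sketch in pure Python (Lloyd's algorithm with squared Euclidean distance). It is an illustration under assumed inputs, not code from the review: the point coordinates, the fixed initial centroids and the function names are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else old)
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the result depends on the initial centroids, so real analyses usually run several random initialisations and keep the partition with the lowest within-cluster sum of squares.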
MATHEMATICAL RISK ANALYSIS: VIA NICHOLAS RISK MODEL AND BAYESIAN ANALYSIS
Anass BAYAGA
2010-07-01
The objective of this second part of a two-phased study was to explore the predictive power of the quantitative risk analysis (QRA) method and process within a Higher Education Institution (HEI). The method and process investigated the use of impact analysis via the Nicholas risk model and Bayesian analysis, with a sample of one hundred (100) risk analysts in a historically black South African university in the greater Eastern Cape Province. The first finding supported and confirmed previous literature (King III report, 2009; Nicholas and Steyn, 2008; Stoney, 2007; COSA, 2004) that there was a direct relationship between a risk factor, its likelihood and its impact, ceteris paribus. The second finding, in relation to controlling either the likelihood or the impact of occurrence of risk (Nicholas risk model), was that to have a brighter risk reward, it was important to control the likelihood of occurrence of risks as compared with their impact, so as to have a direct effect on the entire university. On the Bayesian analysis, the third finding was that the impact of risk should be predicted along three aspects: the human impact (decisions made), the property impact (students and infrastructure based) and the business impact. Lastly, the study revealed that although business cycles vary considerably depending on the industry and/or the institution, most impacts in the HEI (university) were within the period of one academic. The recommendation was that the application of quantitative risk analysis should be related to the current legislative framework that affects HEIs.
Baltic sea algae analysis using Bayesian spatial statistics methods
Eglė Baltmiškytė
2013-03-01
Spatial statistics is the field of statistics dealing with the analysis of spatially distributed data. Recently, Bayesian methods have often been applied in statistical data analysis. A spatial data model for predicting algae quantity in the Baltic Sea is built and described in this article. Black Carrageen is the dependent variable, and depth, sand, pebble and boulders are the independent variables in the described model. Two models with different covariance functions (Gaussian and exponential) are built to find the best-fitting model for predicting algae quantity. Unknown model parameters are estimated and the posterior distribution of the Bayesian kriging prediction is computed in the OpenBUGS modeling environment using Bayesian spatial statistics methods.
Analysis of Gumbel Model for Software Reliability Using Bayesian Paradigm
Raj Kumar
2012-12-01
In this paper, we illustrate the suitability of the Gumbel model for software reliability data. The model parameters are estimated using likelihood-based inferential procedures, both classical and Bayesian. The quasi Newton-Raphson algorithm is applied to obtain the maximum likelihood estimates and associated probability intervals. The Bayesian estimates of the parameters of the Gumbel model are obtained using the Markov Chain Monte Carlo (MCMC) simulation method in OpenBUGS (established software for Bayesian analysis using Markov Chain Monte Carlo methods). R functions are developed to study the statistical properties, model validation and comparison tools of the model, and the output analysis of MCMC samples generated from OpenBUGS. Details of applying MCMC to parameter estimation for the Gumbel model are elaborated and a real software reliability data set is considered to illustrate the methods of inference discussed in this paper.
Nested sampling applied in Bayesian room-acoustics decay analysis.
Jasa, Tomislav; Xiang, Ning
2012-11-01
Room-acoustic energy decays often exhibit single-rate or multiple-rate characteristics in a wide variety of rooms/halls. Both the energy decay order and decay parameter estimation are of practical significance in architectural acoustics applications, representing two different levels of Bayesian probabilistic inference. This paper discusses a model-based sound energy decay analysis within a Bayesian framework utilizing the nested sampling algorithm. The nested sampling algorithm is specifically developed to evaluate the Bayesian evidence required for determining the energy decay order with decay parameter estimates as a secondary result. Taking the energy decay analysis in architectural acoustics as an example, this paper demonstrates that two different levels of inference, decay model-selection and decay parameter estimation, can be cohesively accomplished by the nested sampling algorithm. PMID:23145609
Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation
Srivastava, Akash; Zou, James; Adams, Ryan P.; Sutton, Charles
2016-01-01
A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when they see one. We present a new approach to interactive clustering for data exploration called TINDER, based on a particularly simple feedback mechanism, in which an analyst can reject a given clusteri...
Uncertainties in ozone concentrations predicted with a Lagrangian photochemical air quality model have been estimated using Bayesian Monte Carlo (BMC) analysis. Bayesian Monte Carlo analysis provides a means of combining subjective "prior" uncertainty estimates developed ...
Phycas: software for Bayesian phylogenetic analysis.
Lewis, Paul O; Holder, Mark T; Swofford, David L
2015-05-01
Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length. PMID:25577605
Research & development and growth: A Bayesian model averaging analysis
Horváth, Roman
2011-01-01
Vol. 28, No. 6 (2011), pp. 2669-2673. ISSN 0264-9993. [Society for Non-linear Dynamics and Econometrics Annual Conference. Washington DC, 16.03.2011-18.03.2011] R&D Projects: GA ČR GA402/09/0965. Institutional research plan: CEZ:AV0Z10750506. Keywords: Research and development; Growth; Bayesian model averaging. Subject RIV: AH - Economics. Impact factor: 0.701 (2011). http://library.utia.cas.cz/separaty/2011/E/horvath-research & development and growth a bayesian model averaging analysis.pdf
On Bayesian analysis of on-off measurements
Nosek, Dalibor
2016-01-01
We propose an analytical solution to the on-off problem within the framework of Bayesian statistics. Both the statistical significance for the discovery of new phenomena and credible intervals on model parameters are presented in a consistent way. We use a large enough family of prior distributions of relevant parameters. The proposed analysis is designed to provide Bayesian solutions that can be used for any number of observed on-off events, including zero. The procedure is checked using Monte Carlo simulations. The usefulness of the method is demonstrated on examples from gamma-ray astronomy.
Integrating cluster formation and cluster evaluation in interactive visual analysis
Turkay, C.; Parulek, J.; Reuter, N.; Hauser, H.
2011-01-01
Cluster analysis is a popular method for data investigation where data items are structured into groups called clusters. This analysis involves two sequential steps, namely cluster formation and cluster evaluation. In this paper, we propose the tight integration of cluster formation and cluster evaluation in interactive visual analysis in order to overcome the challenges that relate to the black-box nature of clustering algorithms. We present our conceptual framework in the form of an interac...
Bayesian analysis of multimodal data and brain imaging
Assadi, Amir H.; Eghbalnia, Hamid; Backonja, Miroslav; Wakai, Ronald T.; Rutecki, Paul; Haughton, Victor
2000-06-01
It is often the case that information about a process can be obtained using a variety of methods. Each method is employed because of specific advantages over the competing alternatives. An example in medical neuro-imaging is the choice between fMRI and MEG modes where fMRI can provide high spatial resolution in comparison to the superior temporal resolution of MEG. The combination of data from varying modes provides the opportunity to infer results that may not be possible by means of any one mode alone. We discuss a Bayesian and learning theoretic framework for enhanced feature extraction that is particularly suited to multi-modal investigations of massive data sets from multiple experiments. In the following Bayesian approach, acquired knowledge (information) regarding various aspects of the process are all directly incorporated into the formulation. This information can come from a variety of sources. In our case, it represents statistical information obtained from other modes of data collection. The information is used to train a learning machine to estimate a probability distribution, which is used in turn by a second machine as a prior, in order to produce a more refined estimation of the distribution of events. The computational demand of the algorithm is handled by proposing a distributed parallel implementation on a cluster of workstations that can be scaled to address real-time needs if required. We provide a simulation of these methods on a set of synthetically generated MEG and EEG data. We show how spatial and temporal resolutions improve by using prior distributions. The method on fMRI signals permits one to construct the probability distribution of the non-linear hemodynamics of the human brain (real data). These computational results are in agreement with biologically based measurements of other labs, as reported to us by researchers from UK. We also provide preliminary analysis involving multi-electrode cortical recording that accompanies
Bayesian clustering of fuzzy feature vectors using a quasi-likelihood approach.
Marttinen, Pekka; Tang, Jing; De Baets, Bernard; Dawyndt, Peter; Corander, Jukka
2009-01-01
Bayesian model-based classifiers, both unsupervised and supervised, have been studied extensively and their value and versatility have been demonstrated on a wide spectrum of applications within science and engineering. A majority of the classifiers are built on the assumption of intrinsic discreteness of the considered data features or on the discretization of them prior to the modeling. On the other hand, Gaussian mixture classifiers have also been utilized to a large extent for continuous features in the Bayesian framework. Often the primary reason for discretization in the classification context is the simplification of the analytical and numerical properties of the models. However, the discretization can be problematic due to its ad hoc nature and the decreased statistical power to detect the correct classes in the resulting procedure. We introduce an unsupervised classification approach for fuzzy feature vectors that utilizes a discrete model structure while preserving the continuous characteristics of data. This is achieved by replacing the ordinary likelihood by a binomial quasi-likelihood to yield an analytical expression for the posterior probability of a given clustering solution. The resulting model can be justified from an information-theoretic perspective. Our method is shown to yield highly accurate clusterings for challenging synthetic and empirical data sets. PMID:19029547
Clustering analysis using Swarm Intelligence
Farmani, Mohammad Reza
2016-01-01
This thesis is concerned with the application of swarm intelligence methods to the clustering analysis of datasets. The main objectives of the thesis are: (1) to take advantage of a novel evolutionary algorithm, called artificial bee colony, to improve the capability of K-means in finding global optimum clusters in nonlinear partitional clustering problems; (2) to consider partitional clustering as an optimization problem and an improved ant-based algorithm, named Opposition-Based A...
Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina
2014-07-15
In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. PMID:24650600
A Bayesian Predictive Discriminant Analysis with Screened Data
Hea-Jung Kim
2015-09-01
In the application of discriminant analysis, a situation sometimes arises where individual measurements are screened by a multidimensional screening scheme. For this situation, a discriminant analysis with screened populations is considered from a Bayesian viewpoint, and an optimal predictive rule for the analysis is proposed. In order to establish a flexible method to incorporate the prior information of the screening mechanism, we propose a hierarchical screened scale mixture of normal (HSSMN) model, which makes provision for flexible modeling of the screened observations. A Markov chain Monte Carlo (MCMC) method using the Gibbs sampler and the Metropolis–Hastings algorithm within the Gibbs sampler is used to perform a Bayesian inference on the HSSMN models and to approximate the optimal predictive rule. A simulation study is given to demonstrate the performance of the proposed predictive discrimination procedure.
Ildikó Ungvári; Gábor Hullám; Péter Antal; Petra Sz Kiszel; András Gézsi; Éva Hadadi; Viktor Virág; Gergely Hajós; András Millinghoffer; Adrienne Nagy; András Kiss; Semsei, Ágnes F.; Gergely Temesi; Béla Melegh; Péter Kisfali
2012-01-01
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods and we applied a new statistical method, called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). Th...
Bayesian Investigation of Isochrone Consistency Using the Old Open Cluster NGC 188
Hills, Shane; Courteau, Stephane; Geller, Aaron M
2015-01-01
This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color-magnitude diagrams depending on the filters and models used. We examine the consistency and reliability of fitting three widely-used stellar evolution models to fifteen combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of the theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to the present study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities which enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically-favored three band combinations of UBV and VRI can be meaningfully inconsistent with each other and with lo...
Fang, Jun; Zhang, Lizao; Duan, Huiping; Huang, Lei; Li, Hongbin
2016-05-01
The application of sparse representation to SAR/ISAR imaging has attracted much attention over the past few years. This new class of sparse-representation-based imaging methods presents a number of unique advantages over conventional range-Doppler methods. The basic idea behind these works is to formulate SAR/ISAR imaging as a sparse signal recovery problem. In this paper, we propose a new two-dimensional pattern-coupled sparse Bayesian learning (SBL) method to capture the underlying cluster patterns of the ISAR target images. Based on this model, an expectation-maximization (EM) algorithm is developed to infer the maximum a posteriori (MAP) estimate of the hyperparameters, along with the posterior distribution of the sparse signal. Experimental results demonstrate that the proposed method is able to achieve a substantial performance improvement over existing algorithms, including the conventional SBL method.
A Clustering Method of Highly Dimensional Patent Data Using Bayesian Approach
Sunghae Jun
2012-01-01
Patent data contain diverse technological information about any technology field. Many companies therefore manage patent data to build their R&D policy. Patent analysis is one approach to patent management and an important tool for technology forecasting. Patent clustering is one of the tasks in patent analysis. In this paper, we propose an efficient clustering method for patent documents. Generally, patent data consist of text documents. The patent documents have...
Bayesian Variable Selection in Cost-Effectiveness Analysis
Miguel A. Negrín
2010-04-01
Linear regression models are often used to represent the cost and effectiveness of medical treatment. The covariates used may include sociodemographic variables, such as age, gender or race; clinical variables, such as initial health status, years of treatment or the existence of concomitant illnesses; and a binary variable indicating the treatment received. However, most studies estimate only one model, which usually includes all the covariates. This procedure ignores the question of uncertainty in model selection. In this paper, we examine four alternative Bayesian variable selection methods that have been proposed. In this analysis, we estimate the inclusion probability of each covariate in the real model conditional on the data. Variable selection can be useful for estimating incremental effectiveness and incremental cost, through Bayesian model averaging, as well as for subgroup analysis.
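As a rough illustration of the inclusion probability mentioned in this abstract, the sketch below sums posterior model probabilities over every candidate model that contains a given covariate. The model weights and covariate names are hypothetical placeholders, not values from the paper, and none of the four selection methods the paper compares is reproduced here.

```python
def inclusion_probabilities(model_posteriors):
    """Posterior inclusion probability of each covariate.

    model_posteriors maps a frozenset of covariate names (one candidate
    model) to that model's posterior probability given the data.
    """
    total = sum(model_posteriors.values())  # normalise in case weights do not sum to 1
    covariates = set().union(*model_posteriors.keys())
    return {
        name: sum(p for model, p in model_posteriors.items() if name in model) / total
        for name in sorted(covariates)
    }


# Hypothetical posterior weights for four candidate models (assumed values).
weights = {
    frozenset({"treatment"}): 0.10,
    frozenset({"treatment", "age"}): 0.50,
    frozenset({"treatment", "age", "gender"}): 0.25,
    frozenset({"age"}): 0.15,
}
probs = inclusion_probabilities(weights)
```

A covariate that appears only in low-probability models receives a low inclusion probability, which is the quantity a model-averaged analysis would use to weigh its relevance.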
Bayesian phylogeny analysis via stochastic approximation Monte Carlo
Cheon, Sooyoung
2009-11-01
Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees which have the highest similarity to the true trees, and the model parameter estimates which have the smallest mean square errors, while costing the least CPU time. © 2009 Elsevier Inc. All rights reserved.
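The local-trap behaviour described in this abstract can be seen on a toy problem. The sketch below runs a plain random-walk Metropolis-Hastings sampler (not SAMC, and not a phylogenetic model) on an assumed one-dimensional target with two well-separated modes; the target, step size, and seed are illustrative choices only.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of an equal mixture of N(-10, 1) and N(10, 1)."""
    a = -0.5 * (x + 10.0) ** 2
    b = -0.5 * (x - 10.0) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def random_walk_metropolis(n_steps, start, step, seed=0):
    """Random-walk Metropolis-Hastings with Gaussian proposals."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        delta = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, delta)):
            x = proposal
        samples.append(x)
    return samples


# Start in the left mode; the chain would have to cross a region of
# negligible density to reach the right mode.
samples = random_walk_metropolis(n_steps=5000, start=-10.0, step=1.0)
```

With these settings the chain stays in the mode it started in, so half of the target's mass is never visited; this is the failure that methods such as SAMC are designed to avoid.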
Combination Clustering Analysis Method and its Application
Bang-Chun Wen; Li-Yuan Dong; Qin-Liang Li; Yang Liu
2013-01-01
The traditional clustering analysis method cannot automatically determine the optimal number of clusters. In this study, we provide a new clustering analysis method, the combination clustering analysis method, to solve this problem. By analyzing 25 kinds of automobile data samples with the combination clustering analysis method, the correctness of the analysis result was verified. It showed that combination clustering analysis method could objectively determine the number of clustering firs...
Non-stationarity in GARCH models: A Bayesian analysis
Kleibergen, Frank; Dijk, Herman
1993-01-01
First, the non-stationarity properties of the conditional variances in the GARCH(1,1) model are analysed using the concept of infinite persistence of shocks. Given a time sequence of probabilities for increasing/decreasing conditional variances, a theoretical formula for quasi-strict non-stationarity is defined. The resulting conditions for the GARCH(1,1) model are shown to differ from the weak stationarity conditions mainly used in the literature. Bayesian statistical analysis us...
Using Bayesian Population Viability Analysis to Define Relevant Conservation Objectives
Green, Adam W.; Bailey, Larissa L.
2015-01-01
Adaptive management provides a useful framework for managing natural resources in the face of uncertainty. An important component of adaptive management is identifying clear, measurable conservation objectives that reflect the desired outcomes of stakeholders. A common objective is to have a sustainable population, or metapopulation, but it can be difficult to quantify a threshold above which such a population is likely to persist. We performed a Bayesian metapopulation viability analysis (BM...
REMITTANCES, DUTCH DISEASE, AND COMPETITIVENESS: A BAYESIAN ANALYSIS
FARID MAKHLOUF; MAZHAR MUGHAL
2013-01-01
The paper studies symptoms of Dutch disease in the Pakistani economy arising from international remittances. An IV Bayesian analysis is carried out to take care of the endogeneity and uncertainty due to the managed float of Pakistani Rupee. We find evidence for both spending and resource movement effects in both the short and the long-run. These impacts are stronger and different from those the Official Development Assistance and the FDI exert. We find that while aggregate remittances and the...
Optimizing Nuclear Reaction Analysis (NRA) using Bayesian Experimental Design
von Toussaint, U.; Schwarz-Selinger, T.; Gori, S.
2008-01-01
Nuclear Reaction Analysis with ${}^{3}$He holds the promise to measure Deuterium depth profiles up to large depths. However, the extraction of the depth profile from the measured data is an ill-posed inversion problem. Here we demonstrate how Bayesian Experimental Design can be used to optimize the number of measurements as well as the measurement energies to maximize the information gain. Comparison of the inversion properties of the optimized design with standard settings reveals huge possi...
Integrative cluster analysis in bioinformatics
Abu-Jamous, Basel; Nandi, Asoke K
2015-01-01
Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o
Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang
2016-07-01
This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
Bayesian History Reconstruction of Complex Human Gene Clusters on a Phylogeny
Vinař, Tomáš; Song, Giltae; Siepel, Adam
2009-01-01
Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance, since they have been shown to be the source of evolutionary innovation, and have been linked to multiple diseases, including HIV and a variety of cancers. Previously, Zhang et al. (2008) developed an algorithm for reconstructing parsimonious evolutionary histories of such gene clusters, using only human genomic sequence data. In this paper, we propose a probabilistic model for the evolution of gene clusters on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our method will be useful in analyzing these valuable new data sets.
Bayesian-network-based safety risk analysis in construction projects
This paper presents a systemic decision support approach for safety risk analysis under uncertainty in tunnel construction. Fuzzy Bayesian Networks (FBN) is used to investigate causal relationships between tunnel-induced damage and its influential variables based upon the risk/hazard mechanism analysis. Aiming to overcome limitations on the current probability estimation, an expert confidence indicator is proposed to ensure the reliability of the surveyed data for fuzzy probability assessment of basic risk factors. A detailed fuzzy-based inference procedure is developed, which has a capacity of implementing deductive reasoning, sensitivity analysis and abductive reasoning. The “3σ criterion” is adopted to calculate the characteristic values of a triangular fuzzy number in the probability fuzzification process, and the α-weighted valuation method is adopted for defuzzification. The construction safety analysis progress is extended to the entire life cycle of risk-prone events, including the pre-accident, during-construction continuous and post-accident control. A typical hazard concerning the tunnel leakage in the construction of Wuhan Yangtze Metro Tunnel in China is presented as a case study, in order to verify the applicability of the proposed approach. The results demonstrate the feasibility of the proposed approach and its application potential. A comparison of advantages and disadvantages between FBN and fuzzy fault tree analysis (FFTA) as risk analysis tools is also conducted. The proposed approach can be used to provide guidelines for safety analysis and management in construction projects, and thus increase the likelihood of a successful project in a complex environment. - Highlights: • A systemic Bayesian network based approach for safety risk analysis is developed. • An expert confidence indicator for probability fuzzification is proposed. • Safety risk analysis progress is extended to entire life cycle of risk-prone events. • A typical
BaTMAn: Bayesian Technique for Multi-image Analysis
Casado, J; García-Benito, R; Guidi, G; Choudhury, O S; Bellocchi, E; Sánchez, S; Díaz, A I
2016-01-01
This paper describes the Bayesian Technique for Multi-image Analysis (BaTMAn), a novel image segmentation technique based on Bayesian statistics, whose main purpose is to characterize an astronomical dataset containing spatial information and perform a tessellation based on the measurements and errors provided as input. The algorithm will iteratively merge spatial elements as long as they are statistically consistent with carrying the same information (i.e. signal compatible with being identical within the errors). We illustrate its operation and performance with a set of test cases that comprises both synthetic and real Integral-Field Spectroscopic (IFS) data. Our results show that the segmentations obtained by BaTMAn adapt to the underlying structure of the data, regardless of the precise details of their morphology and the statistical properties of the noise. The quality of the recovered signal represents an improvement with respect to the input, especially in those regions where the signal is actually con...
Bayesian investigation of isochrone consistency using the old open cluster NGC 188
Hills, Shane; Courteau, Stéphane [Department of Physics, Engineering Physics and Astronomy, Queen’s University, Kingston, ON K7L 3N6 Canada (Canada); Von Hippel, Ted [Department of Physical Sciences, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 (United States); Geller, Aaron M., E-mail: shane.hills@queensu.ca, E-mail: courteau@astro.queensu.ca, E-mail: ted.vonhippel@erau.edu, E-mail: a-geller@northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA) and Department of Physics and Astronomy, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208 (United States)
2015-03-01
This paper provides a detailed comparison of the differences in parameters derived for a star cluster from its color–magnitude diagrams (CMDs) depending on the filters and models used. We examine the consistency and reliability of fitting three widely used stellar evolution models to 15 combinations of optical and near-IR photometry for the old open cluster NGC 188. The optical filter response curves match those of theoretical systems and are thus not the source of fit inconsistencies. NGC 188 is ideally suited to this study thanks to a wide variety of high-quality photometry and available proper motions and radial velocities that enable us to remove non-cluster members and many binaries. Our Bayesian fitting technique yields inferred values of age, metallicity, distance modulus, and absorption as a function of the photometric band combinations and stellar models. We show that the historically favored three-band combinations of UBV and VRI can be meaningfully inconsistent with each other and with longer baseline data sets such as $UBVRIJHK_S$. Differences among model sets can also be substantial. For instance, fitting Yi et al. (2001) and Dotter et al. (2008) models to $UBVRIJHK_S$ photometry for NGC 188 yields the following cluster parameters: age = (5.78 ± 0.03, 6.45 ± 0.04) Gyr, [Fe/H] = (+0.125 ± 0.003, −0.077 ± 0.003) dex, $(m−M)_V$ = (11.441 ± 0.007, 11.525 ± 0.005) mag, and $A_V$ = (0.162 ± 0.003, 0.236 ± 0.003) mag, respectively. Within the formal fitting errors, these two fits are substantially and statistically different. Such differences among fits using different filters and models are a cautionary tale regarding our current ability to fit star cluster CMDs. Additional modeling of this kind, with more models and star clusters, and future Gaia parallaxes are critical for isolating and quantifying the most relevant uncertainties in stellar evolutionary models.
Nonparametric Bayesian Negative Binomial Factor Analysis
Zhou, Mingyuan
2016-01-01
A common approach to analyze an attribute-instance count matrix, an element of which represents how many times an attribute appears in an instance, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for an attribute present in an instance to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a...
Bayesian networks for omics data analysis
Gavai, A.K.
2009-01-01
This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics, transcriptomics, proteomics, or metabolomics. Although these techniques study different entities (genes, gene expression, proteins, or metabolites), they all have in common that they use high-throu...
Valečková, Markéta; Kárný, Miroslav; Sutanto, E. L.
2001-01-01
Vol. 37, No. 6 (2001), pp. 1071-1078. ISSN 0005-1098 R&D Projects: GA ČR GA102/99/1564 Other grants: IST(XE) 1999/12058 Institutional research plan: AV0Z1075907 Keywords: Markov chain * clustering * Bayesian mixture estimation Subject RIV: BC - Control Systems Theory Impact factor: 1.449, year: 2001
Zhang, Zhen; Lim, Chae Young; Maiti, Tapabrata; Kato, Seiji
2016-01-01
In climate change study, the infrared spectral signatures of climate change have recently been conceptually adopted, and widely applied to identifying and attributing atmospheric composition change. We propose a Bayesian hierarchical model for spatial clustering of the high-dimensional functional data based on the effects of functional covariates and local features. We couple the functional mixed-effects model with a generalized spatial partitioning method for: (1) producing spatially contigu...
Bayesian analysis to detect abrupt changes in extreme hydrological processes
Jo, Seongil; Kim, Gwangsu; Jeon, Jong-June
2016-07-01
In this study, we develop a new method for a Bayesian change point analysis. The proposed method is easy to implement and can be extended to a wide class of distributions. Using a generalized extreme-value distribution, we investigate the annual maximum of precipitations observed at stations in the South Korean Peninsula, and find significant changes in the considered sites. We evaluate the hydrological risk in predictions using the estimated return levels. In addition, we explain that the misspecification of the probability model can lead to a bias in the number of change points and using a simple example, show that this problem is difficult to avoid by technical data transformation.
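For readers unfamiliar with return levels, the sketch below evaluates the standard closed-form T-year return level of a generalized extreme-value (GEV) distribution. It assumes the usual location, scale, and shape parametrisation (mu, sigma, xi); the parameter values are hypothetical and are not estimates from the study, and the change-point method itself is not reproduced.

```python
import math


def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` years under GEV(mu, sigma, xi).

    Solves F(z) = 1 - 1/period for the GEV distribution function F.
    """
    if period <= 1.0:
        raise ValueError("period must be greater than 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0).
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters for annual maximum precipitation (assumed, in mm).
level_10 = gev_return_level(mu=100.0, sigma=20.0, xi=0.1, period=10.0)
level_100 = gev_return_level(mu=100.0, sigma=20.0, xi=0.1, period=100.0)
```

If a change point is detected, return levels would be computed separately from the parameters estimated on each segment, which is where a misspecified probability model can distort the estimated risk.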
A Bayesian analysis of pentaquark signals from CLAS data
David Ireland; Bryan McKinnon; Dan Protopopescu; Pawel Ambrozewicz; Marco Anghinolfi; G. Asryan; Harutyun Avakian; H. Bagdasaryan; Nathan Baillie; Jacques Ball; Nathan Baltzell; V. Batourine; Marco Battaglieri; Ivan Bedlinski; Ivan Bedlinskiy; Matthew Bellis; Nawal Benmouna; Barry Berman; Angela Biselli; Lukasz Blaszczyk; Sylvain Bouchigny; Sergey Boyarinov; Robert Bradford; Derek Branford; William Briscoe; William Brooks; Volker Burkert; Cornel Butuceanu; John Calarco; Sharon Careccia; Daniel Carman; Liam Casey; Shifeng Chen; Lu Cheng; Philip Cole; Patrick Collins; Philip Coltharp; Donald Crabb; Volker Crede; Natalya Dashyan; Rita De Masi; Raffaella De Vita; Enzo De Sanctis; Pavel Degtiarenko; Alexandre Deur; Richard Dickson; Chaden Djalali; Gail Dodge; Joseph Donnelly; David Doughty; Michael Dugger; Oleksandr Dzyubak; Hovanes Egiyan; Kim Egiyan; Lamiaa Elfassi; Latifa Elouadrhiri; Paul Eugenio; Gleb Fedotov; Gerald Feldman; Ahmed Fradi; Herbert Funsten; Michel Garcon; Gagik Gavalian; Nerses Gevorgyan; Gerard Gilfoyle; Kevin Giovanetti; Francois-Xavier Girod; John Goetz; Wesley Gohn; Atilla Gonenc; Ralf Gothe; Keith Griffioen; Michel Guidal; Nevzat Guler; Lei Guo; Vardan Gyurjyan; Kawtar Hafidi; Hayk Hakobyan; Charles Hanretty; Neil Hassall; F. Hersman; Ishaq Hleiqawi; Maurik Holtrop; Charles Hyde; Yordanka Ilieva; Boris Ishkhanov; Eugeny Isupov; D. Jenkins; Hyon-Suk Jo; John Johnstone; Kyungseon Joo; Henry Juengst; Narbe Kalantarians; James Kellie; Mahbubul Khandaker; Wooyoung Kim; Andreas Klein; Franz Klein; Mikhail Kossov; Zebulun Krahn; Laird Kramer; Valery Kubarovsky; Joachim Kuhn; Sergey Kuleshov; Viacheslav Kuznetsov; Jeff Lachniet; Jean Laget; Jorn Langheinrich; D. 
Lawrence; Kenneth Livingston; Haiyun Lu; Marion MacCormick; Nikolai Markov; Paul Mattione; Bernhard Mecking; Mac Mestayer; Curtis Meyer; Tsutomu Mibe; Konstantin Mikhaylov; Marco Mirazita; Rory Miskimen; Viktor Mokeev; Brahim Moreno; Kei Moriya; Steven Morrow; Maryam Moteabbed; Edwin Munevar Espitia; Gordon Mutchler; Pawel Nadel-Turonski; Rakhsha Nasseripour; Silvia Niccolai; Gabriel Niculescu; Maria-Ioana Niculescu; Bogdan Niczyporuk; Megh Niroula; Rustam Niyazov; Mina Nozar; Mikhail Osipenko; Alexander Ostrovidov; Kijun Park; Evgueni Pasyuk; Craig Paterson; Sergio Pereira; Joshua Pierce; Nikolay Pivnyuk; Oleg Pogorelko; Sergey Pozdnyakov; John Price; Sebastien Procureur; Yelena Prok; Brian Raue; Giovanni Ricco; Marco Ripani; Barry Ritchie; Federico Ronchetti; Guenther Rosner; Patrizia Rossi; Franck Sabatie; Julian Salamanca; Carlos Salgado; Joseph Santoro; Vladimir Sapunenko; Reinhard Schumacher; Vladimir Serov; Youri Sharabian; Dmitri Sharov; Nikolay Shvedunov; Elton Smith; Lee Smith; Daniel Sober; Daria Sokhan; Aleksey Stavinskiy; Samuel Stepanyan; Stepan Stepanyan; Burnham Stokes; Paul Stoler; Steffen Strauch; Mauro Taiuti; David Tedeschi; Ulrike Thoma; Avtandil Tkabladze; Svyatoslav Tkachenko; Clarisse Tur; Maurizio Ungaro; Michael Vineyard; Alexander Vlassov; Daniel Watts; Lawrence Weinstein; Dennis Weygand; M. Williams; Elliott Wolin; M.H. Wood; Amrit Yegneswaran; Lorenzo Zana; Jixie Zhang; Bo Zhao; Zhiwen Zhao
2008-02-01
We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.
A Bayesian analysis of pentaquark signals from CLAS data
Ireland, D G; Protopopescu, D; Ambrozewicz, P; Anghinolfi, M; Asryan, G; Avakian, H; Bagdasaryan, H; Baillie, N; Ball, J P; Baltzell, N A; Batourine, V; Battaglieri, M; Bedlinskiy, I; Bellis, M; Benmouna, N; Berman, B L; Biselli, A S; Blaszczyk, L; Bouchigny, S; Boiarinov, S; Bradford, R; Branford, D; Briscoe, W J; Brooks, W K; Burkert, V D; Butuceanu, C; Calarco, J R; Careccia, S L; Carman, D S; Casey, L; Chen, S; Cheng, L; Cole, P L; Collins, P; Coltharp, P; Crabb, D; Credé, V; Dashyan, N; De Masi, R; De Vita, R; De Sanctis, E; Degtyarenko, P V; Deur, A; Dickson, R; Djalali, C; Dodge, G E; Donnelly, J; Doughty, D; Dugger, M; Dzyubak, O P; Egiyan, H; Egiyan, K S; El Fassi, L; Elouadrhiri, L; Eugenio, P; Fedotov, G; Feldman, G; Fradi, A; Funsten, H; Garçon, M; Gavalian, G; Gevorgyan, N; Gilfoyle, G P; Giovanetti, K L; Girod, F X; Goetz, J T; Gohn, W; Gonenc, A; Gothe, R W; Griffioen, K A; Guidal, M; Guler, N; Guo, L; Gyurjyan, V; Hafidi, K; Hakobyan, H; Hanretty, C; Hassall, N; Hersman, F W; Hleiqawi, I; Holtrop, M; Hyde-Wright, C E; Ilieva, Y; Ishkhanov, B S; Isupov, E L; Jenkins, D; Jo, H S; Johnstone, J R; Joo, K; Jüngst, H G; Kalantarians, N; Kellie, J D; Khandaker, M; Kim, W; Klein, A; Klein, F J; Kossov, M; Krahn, Z; Kramer, L H; Kubarovski, V; Kühn, J; Kuleshov, S V; Kuznetsov, V; Lachniet, J; Laget, J M; Langheinrich, J; Lawrence, D; Livingston, K; Lu, H Y; MacCormick, M; Markov, N; Mattione, P; Mecking, B A; Mestayer, M D; Meyer, C A; Mibe, T; Mikhailov, K; Mirazita, M; Miskimen, R; Mokeev, V; Moreno, B; Moriya, K; Morrow, S A; Moteabbed, M; Munevar, E; Mutchler, G S; Nadel-Turonski, P; Nasseripour, R; Niccolai, S; Niculescu, G; Niculescu, I; Niczyporuk, B B; Niroula, M R; Niyazov, R A; Nozar, M; Osipenko, M; Ostrovidov, A I; Park, K; Pasyuk, E; Paterson, C; Anefalos Pereira, S; Pierce, J; Pivnyuk, N; Pogorelko, O; Pozdniakov, S; Price, J W; Procureur, S; Prok, Y; Raue, B A; Ricco, G; Ripani, M; Ritchie, B G; Ronchetti, F; Rosner, G; Rossi, P; Sabatie, F; 
Salamanca, J; Salgado, C; Santoro, J P; Sapunenko, V; Schumacher, R A; Serov, V S; Sharabyan, Yu G; Sharov, D; Shvedunov, N V; Smith, E S; Smith, L C; Sober, D I; Sokhan, D; Stavinsky, A; Stepanyan, S S; Stepanyan, S; Stokes, B E; Stoler, P; Strauch, S; Taiuti, M; Tedeschi, D J; Thoma, U; Tkabladze, A; Tkachenko, S; Tur, C; Ungaro, M; Vineyard, M F; Vlassov, A V; Watts, D P; Weinstein, L B; Weygand, D P; Williams, M; Wolin, E; Wood, M H; Yegneswaran, A; Zana, L; Zhang, J; Zhao, B; Zhao, Z W
2007-01-01
We examine the results of two measurements by the CLAS collaboration, one of which claimed evidence for a $\Theta^{+}$ pentaquark, whilst the other found no such evidence. The unique feature of these two experiments was that they were performed with the same experimental setup. Using a Bayesian analysis we find that the results of the two experiments are in fact compatible with each other, but that the first measurement did not contain sufficient information to determine unambiguously the existence of a $\Theta^{+}$. Further, we suggest a means by which the existence of a new candidate particle can be tested in a rigorous manner.
Development of bayesian update database for PRA data analysis (BUDDA)
Individual plant PRA (Probabilistic Risk Assessment) is necessary for risk-informed applications at nuclear power plants. It is therefore necessary to build an environment in which utilities can efficiently collect PRA data and estimate PRA parameters without statistical expertise. This report explains the development of a failure-event analysis database for PRA failure-rate computation using the Bayesian update technique. BUDDA can compute failure rates from a combination of multiple databases (including the pre-installed data based on NUCIA) and can manage individual plant databases (failure events, number of components, operation time, number of demands, prior distributions). (author)
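A Bayesian update of a failure rate can be sketched with the textbook conjugate gamma-Poisson model: a Gamma(shape, rate) prior on a constant failure rate, combined with a count of failures over an operating time, gives a Gamma posterior in closed form. This is an assumed, generic model; the abstract does not say which prior families BUDDA supports, and the numbers below are hypothetical.

```python
def update_failure_rate(prior_shape, prior_rate, failures, exposure_hours):
    """Conjugate gamma-Poisson update for a constant failure rate (per hour).

    Prior: rate ~ Gamma(prior_shape, prior_rate), with prior_rate in hours.
    Data: `failures` events observed over `exposure_hours` of operation.
    Returns the posterior shape, posterior rate, and posterior mean.
    """
    if prior_shape <= 0 or prior_rate <= 0:
        raise ValueError("prior parameters must be positive")
    if failures < 0 or exposure_hours < 0:
        raise ValueError("data must be non-negative")
    post_shape = prior_shape + failures
    post_rate = prior_rate + exposure_hours
    return post_shape, post_rate, post_shape / post_rate


# Hypothetical generic prior (mean 5e-6 per hour) updated with plant data.
shape, rate, mean = update_failure_rate(
    prior_shape=0.5, prior_rate=1.0e5, failures=2, exposure_hours=4.0e5
)
```

Because the update only adds failure counts and operating time to the prior parameters, pooled evidence from several databases can be folded in one source at a time.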
Safety Analysis of Liquid Rocket Engine Using Bayesian Networks
WANG Hua-wei; YAN Zhi-qiang
2007-01-01
Safety analysis of liquid rocket engines is of great importance for shortening the development cycle, saving development expenditure, and reducing development risk. The relationship between the structure and the components of a liquid rocket engine is highly complex; furthermore, test data are absent in the development phase. Uncertainties therefore exist in the safety analysis of liquid rocket engines. A safety analysis model based on Bayesian networks (BN) and integrated with FMEA (failure mode and effect analysis) is put forward for liquid rocket engines, which can combine qualitative analysis with quantitative decision-making. The method has the advantages of fusing multiple sources of information, requiring fewer samples, and having high accuracy. An example shows that the method is efficient.
Implementation of a Bayesian Engine for Uncertainty Analysis
Leng Vang; Curtis Smith; Steven Prescott
2014-08-01
In probabilistic risk assessment, it is important to have an environment where analysts have access to shared and secured high performance computing and a statistical analysis tool package. As part of the advanced small modular reactor probabilistic risk analysis framework implementation, we have identified the need for advanced Bayesian computations. However, in order to make this technology available to non-specialists, there is also a need for a simplified tool that allows users to author models and evaluate them within this framework. As a proof-of-concept, we have implemented an advanced open source Bayesian inference tool, OpenBUGS, within the browser-based cloud risk analysis framework that is under development at the Idaho National Laboratory. This development, the “OpenBUGS Scripter”, has been implemented as a client-side, visual, web-based integrated development environment for creating OpenBUGS language scripts. It depends on the shared server environment to execute the generated scripts and to transmit results back to the user. The visual models are in the form of linked diagrams, from which we automatically create the applicable OpenBUGS script that matches the diagram. These diagrams can be saved locally or stored on the server environment to be shared with other users.
Bayesian analysis of inflationary features in Planck and SDSS data
Benetti, Micol
2016-01-01
We perform a Bayesian analysis to study possible features in the primordial inflationary power spectrum of scalar perturbations. In particular, we analyse the possibility of detecting the imprint of these primordial features in the anisotropy temperature power spectrum of the Cosmic Microwave Background (CMB) and also in the matter power spectrum P(k). We use the most recent CMB data provided by the Planck Collaboration and P(k) measurements from the eleventh data release of the Sloan Digital Sky Survey. We focus our analysis on a class of potentials whose features are localised at different intervals of angular scales, corresponding to multipoles in the ranges 10 < l < 60 (Oscill-1) and 150 < l < 300 (Oscill-2). Our results show that one of the step-potentials (Oscill-1) provides a better fit to the CMB data than does the featureless LCDM scenario, with a moderate Bayesian evidence in favor of the former. Adding the P(k) data to the analysis weakens the evidence of the Oscill-1 potential relat...
Analysis of Wave Directional Spreading by Bayesian Parameter Estimation
钱桦; 莊士贤; 高家俊
2002-01-01
A spatial array of wave gauges installed on an observation platform has been designed and arranged to measure the local features of winter monsoon directional waves off the Taishi coast of Taiwan. A new method, named the Bayesian Parameter Estimation Method (BPEM), is developed and adopted to determine the main direction and the directional spreading parameter of directional spectra. The BPEM can be regarded as a regression analysis that finds the maximum joint probability of the parameters, which best approximates the observed data from the Bayesian viewpoint. Analysis of the field wave data demonstrates that the characteristics of normalized directional spreading depend strongly on the wave age. The Mitsuyasu-type empirical formula for the directional spectrum is therefore modified to be representative of the monsoon wave field. Moreover, it is suggested that Smax can be expressed as a function of wave steepness: the values of Smax decrease with increasing steepness. Finally, a local directional spreading model that is simple to use in engineering practice is proposed.
Bayesian analysis of physiologically based toxicokinetic and toxicodynamic models.
Hack, C Eric
2006-04-17
Physiologically based toxicokinetic (PBTK) and toxicodynamic (TD) models of bromate in animals and humans would improve our ability to accurately estimate the toxic doses in humans based on available animal studies. These mathematical models are often highly parameterized and must be calibrated in order for the model predictions of internal dose to adequately fit the experimentally measured doses. Highly parameterized models are difficult to calibrate and it is difficult to obtain accurate estimates of uncertainty or variability in model parameters with commonly used frequentist calibration methods, such as maximum likelihood estimation (MLE) or least squared error approaches. The Bayesian approach called Markov chain Monte Carlo (MCMC) analysis can be used to successfully calibrate these complex models. Prior knowledge about the biological system and associated model parameters is easily incorporated in this approach in the form of prior parameter distributions, and the distributions are refined or updated using experimental data to generate posterior distributions of parameter estimates. The goal of this paper is to give the non-mathematician a brief description of the Bayesian approach and Markov chain Monte Carlo analysis, how this technique is used in risk assessment, and the issues associated with this approach. PMID:16466842
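To make the prior-to-posterior refinement described above concrete, here is a minimal random-walk Metropolis sketch in Python. It is not the bromate PBTK model: it calibrates a single elimination rate k in a hypothetical one-compartment model C(t) = exp(-k t), and the data, the lognormal prior and the measurement error are all assumed for illustration.

```python
import math
import random

# Hypothetical data: concentration at times t (hours) after a unit dose.
times = [1.0, 2.0, 4.0, 8.0]
observed = [0.74, 0.55, 0.30, 0.09]
SIGMA = 0.05  # assumed measurement standard deviation


def log_posterior(k: float) -> float:
    """Lognormal log prior on k plus Gaussian log likelihood, up to a constant."""
    if k <= 0:
        return -math.inf
    log_prior = -0.5 * (math.log(k) - math.log(0.2)) ** 2 - math.log(k)
    resid = [(c - math.exp(-k * t)) / SIGMA for t, c in zip(times, observed)]
    return log_prior - 0.5 * sum(r * r for r in resid)


def metropolis(n_steps: int, start: float, step: float, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler for the one-parameter posterior."""
    rng = random.Random(seed)
    k, lp = start, log_posterior(start)
    chain = []
    for _ in range(n_steps):
        proposal = k + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            k, lp = proposal, lp_new
        chain.append(k)
    return chain


chain = metropolis(n_steps=20_000, start=0.2, step=0.02)
kept = chain[5_000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of k: {posterior_mean:.3f} per hour")
```

The retained samples approximate the posterior distribution of k, so spread and percentiles can be read off the chain directly, which is the kind of uncertainty estimate the abstract says is hard to obtain with MLE or least squares.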
Node Augmentation Technique in Bayesian Network Evidence Analysis and Marshaling
Keselman, Dmitry [Los Alamos National Laboratory; Tompkins, George H [Los Alamos National Laboratory; Leishman, Deborah A [Los Alamos National Laboratory
2010-01-01
Given a Bayesian network, sensitivity analysis is an important activity. This paper begins by describing a network augmentation technique which can simplify the analysis. Next, we present two techniques which allow the user to determine the probability distribution of a hypothesis node under conditions of uncertain evidence; i.e. the state of an evidence node or nodes is described by a user-specified probability distribution. Finally, we conclude with a discussion of three criteria for ranking evidence nodes based on their influence on a hypothesis node. All of these techniques have been used in conjunction with a commercial software package. A Bayesian network based on a directed acyclic graph (DAG) G is a graphical representation of a system of random variables that satisfies the following Markov property: any node (random variable) is independent of its non-descendants given the state of all its parents (Neapolitan, 2004). For simplicity's sake, we consider only discrete variables with a finite number of states, though most of the conclusions may be generalized.
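The abstract does not describe the paper's two techniques, so the sketch below shows only one standard way of handling uncertain evidence, Jeffrey's rule, on a hypothetical two-node network H -> E. All probabilities and state names are made up for illustration.

```python
# Hypothetical two-node network H -> E with binary states.
p_h = {"true": 0.2, "false": 0.8}
p_e_given_h = {
    "true": {"pos": 0.9, "neg": 0.1},
    "false": {"pos": 0.3, "neg": 0.7},
}


def posterior_given_state(e: str) -> dict[str, float]:
    """P(H | E = e) by Bayes' rule, i.e. the hard-evidence posterior."""
    joint = {h: p_h[h] * p_e_given_h[h][e] for h in p_h}
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}


def posterior_given_soft_evidence(q_e: dict[str, float]) -> dict[str, float]:
    """Jeffrey's rule: average the hard-evidence posteriors with weights q(e)."""
    result = {h: 0.0 for h in p_h}
    for e, weight in q_e.items():
        for h, p in posterior_given_state(e).items():
            result[h] += weight * p
    return result


# The user believes E is "pos" with probability 0.7 rather than observing it.
soft = posterior_given_soft_evidence({"pos": 0.7, "neg": 0.3})
print(soft)
```

Setting the user-specified distribution to put all its mass on one state recovers the ordinary hard-evidence posterior, which is a quick consistency check on any soft-evidence scheme.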
Inference algorithms and learning theory for Bayesian sparse factor analysis
Bayesian sparse factor analysis has many applications; for example, it has been applied to the problem of inferring a sparse regulatory network from gene expression data. We describe a number of inference algorithms for Bayesian sparse factor analysis using a slab and spike mixture prior. These include well-established Markov chain Monte Carlo (MCMC) and variational Bayes (VB) algorithms as well as a novel hybrid of VB and Expectation Propagation (EP). For the case of a single latent factor we derive a theory for learning performance using the replica method. We compare the MCMC and VB/EP algorithm results with simulated data to the theoretical prediction. The results for MCMC agree closely with the theory as expected. Results for VB/EP are slightly sub-optimal but show that the new algorithm is effective for sparse inference. In large-scale problems MCMC is infeasible due to computational limitations and the VB/EP algorithm then provides a very useful computationally efficient alternative.
Williford, W. O.; Hsieh, P.; Carter, M. C.
1974-01-01
A Bayesian analysis of the two discrete probability models, the negative binomial and the modified negative binomial distributions, which have been used to describe thunderstorm activity at Cape Kennedy, Florida, is presented. The Bayesian approach with beta prior distributions is compared to the classical approach which uses a moment method of estimation or a maximum-likelihood method. The accuracy and simplicity of the Bayesian method are demonstrated.
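As a rough illustration of why a beta prior makes the analysis simple, the sketch below performs the conjugate update for the ordinary negative binomial model with a known number of successes r. The modified negative binomial is not covered, and the prior parameters and counts are hypothetical rather than thunderstorm data.

```python
def update_beta(a: float, b: float, r: int, counts: list[int]) -> tuple[float, float]:
    """Posterior Beta parameters for the success probability p.

    Each count is the number of failures seen before the r-th success, so the
    likelihood of one count x is proportional to p**r * (1 - p)**x.
    """
    return a + r * len(counts), b + sum(counts)


# Hypothetical uniform prior Beta(1, 1) and four hypothetical observations.
a_post, b_post = update_beta(1.0, 1.0, r=2, counts=[3, 0, 5, 1])
post_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post}, {b_post}), mean of p = {post_mean:.3f}")
```

The posterior stays in the beta family, so no numerical optimisation is needed, in contrast to the moment and maximum-likelihood estimates mentioned in the abstract.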
Bayesian networks inference algorithm to implement Dempster Shafer theory in reliability analysis
This paper deals with the use of Bayesian networks to compute system reliability. The reliability analysis problem is described and the usual methods for quantitative reliability analysis are presented within a case study. Some drawbacks that justify the use of Bayesian networks are identified. The basic concepts of the Bayesian networks application to reliability analysis are introduced and a model to compute the reliability for the case study is presented. Dempster Shafer theory to treat epistemic uncertainty in reliability analysis is then discussed and its basic concepts that can be applied thanks to the Bayesian network inference algorithm are introduced. Finally, it is shown, with a numerical example, how Bayesian networks' inference algorithms compute complex system reliability and what the Dempster Shafer theory can provide to reliability analysis.
Risk analysis of dust explosion scenarios using Bayesian networks.
Yuan, Zhi; Khakzad, Nima; Khan, Faisal; Amyotte, Paul
2015-02-01
In this study, a methodology has been proposed for risk analysis of dust explosion scenarios based on Bayesian network. Our methodology also benefits from a bow-tie diagram to better represent the logical relationships existing among contributing factors and consequences of dust explosions. In this study, the risks of dust explosion scenarios are evaluated, taking into account common cause failures and dependencies among root events and possible consequences. Using a diagnostic analysis, dust particle properties, oxygen concentration, and safety training of staff are identified as the most critical root events leading to dust explosions. The probability adaptation concept is also used for sequential updating and thus learning from past dust explosion accidents, which is of great importance in dynamic risk assessment and management. We also apply the proposed methodology to a case study to model dust explosion scenarios, to estimate the envisaged risks, and to identify the vulnerable parts of the system that need additional safety measures. PMID:25264172
Afreen, Nazia; Naqvi, Irshad H; Broor, Shobha; Ahmed, Anwar; Kazim, Syed Naqui; Dohare, Ravins; Kumar, Manoj; Parveen, Shama
2016-03-01
Dengue fever is the most important arboviral disease in the tropical and sub-tropical countries of the world. Delhi, the metropolitan capital state of India, has reported many dengue outbreaks, with the last outbreak occurring in 2013. We have recently reported predominance of dengue virus serotype 2 during 2011-2014 in Delhi. In the present study, we report molecular characterization and evolutionary analysis of dengue serotype 2 viruses which were detected in 2011-2014 in Delhi. Envelope genes of 42 DENV-2 strains were sequenced in the study. All DENV-2 strains grouped within the Cosmopolitan genotype and further clustered into three lineages; Lineage I, II and III. Lineage III replaced lineage I during dengue fever outbreak of 2013. Further, a novel mutation Thr404Ile was detected in the stem region of the envelope protein of a single DENV-2 strain in 2014. Nucleotide substitution rate and time to the most recent common ancestor were determined by molecular clock analysis using Bayesian methods. A change in effective population size of Indian DENV-2 viruses was investigated through Bayesian skyline plot. The study will be a vital road map for investigation of epidemiology and evolutionary pattern of dengue viruses in India. PMID:26977703
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
Scargle, Jeffrey D; Jackson, Brad; Chiang, James
2012-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it - an improved and generalized version of Bayesian Blocks (Scargle 1998) - that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multi-variate time series data, analysis of vari...
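The optimal segmentation mentioned above can be found by dynamic programming. The sketch below is a simplified pure-Python version for time-tagged events only, using the block fitness N (log N - log T) and a constant per-block penalty. The penalty value `ncp_prior = 4.0` and the test data are assumptions, and event times are assumed to be distinct.

```python
import math


def bayesian_blocks(times: list[float], ncp_prior: float = 4.0) -> list[float]:
    """Return block edges for time-tagged events (at least two distinct times)."""
    t = sorted(times)
    n = len(t)
    # Candidate edges: the data range endpoints plus midpoints between events.
    edges = [t[0]] + [0.5 * (t[i] + t[i + 1]) for i in range(n - 1)] + [t[-1]]
    best = [0.0] * n  # best total fitness for events 0..r
    last = [0] * n    # start index of the final block in that optimum
    for r in range(n):
        best_val, best_start = -math.inf, 0
        for s in range(r + 1):
            count = r - s + 1
            width = edges[r + 1] - edges[s]
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            if s > 0:
                fit += best[s - 1]
            if fit > best_val:
                best_val, best_start = fit, s
        best[r], last[r] = best_val, best_start
    # Backtrack through the stored block starts to recover the change points.
    change_points = []
    idx = n
    while idx > 0:
        start = last[idx - 1]
        change_points.append(start)
        idx = start
    change_points.reverse()
    return [edges[i] for i in change_points] + [edges[-1]]


# Hypothetical events: a slow rate up to t = 10, then a tenfold faster rate.
events = [0.5 * i for i in range(20)] + [10 + 0.05 * i for i in range(40)]
block_edges = bayesian_blocks(events)
print(block_edges)
```

This version costs O(N^2) time and handles neither binned counts nor point measurements. For real use, a maintained implementation such as the `bayesian_blocks` function in `astropy.stats` is a better starting point.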
Bayesian large-scale structure inference and cosmic web analysis
Leclercq, Florent
2015-01-01
Surveys of the cosmic large-scale structure carry opportunities for building and testing cosmological theories about the origin and evolution of the Universe. This endeavor requires appropriate data assimilation tools, for establishing the contact between survey catalogs and models of structure formation. In this thesis, we present an innovative statistical approach for the ab initio simultaneous analysis of the formation history and morphology of the cosmic web: the BORG algorithm infers the primordial density fluctuations and produces physical reconstructions of the dark matter distribution that underlies observed galaxies, by assimilating the survey data into a cosmological structure formation model. The method, based on Bayesian probability theory, provides accurate means of uncertainty quantification. We demonstrate the application of BORG to the Sloan Digital Sky Survey data and describe the primordial and late-time large-scale structure in the observed volume. We show how the approach has led to the fi...
Bayesian analysis of factors associated with fibromyalgia syndrome subjects
Jayawardana, Veroni; Mondal, Sumona; Russek, Leslie
2015-01-01
Factors contributing to movement-related fear were assessed by Russek et al. (2014) for subjects with fibromyalgia (FM), based on data collected through a national internet survey of community-based individuals. The study focused on the following variables: the Activities-Specific Balance Confidence scale (ABC), the Primary Care Post-Traumatic Stress Disorder screen (PC-PTSD), the Tampa Scale of Kinesiophobia (TSK), a Joint Hypermobility Syndrome screen (JHS), the Vertigo Symptom Scale (VSS-SF), Obsessive-Compulsive Personality Disorder (OCPD), pain, work status and physical activity, taken from the "Revised Fibromyalgia Impact Questionnaire" (FIQR). The study presented in this paper revisits the same data with a Bayesian analysis in which appropriate priors were introduced for the variables selected in Russek's paper.
A Bayesian analysis of regularised source inversions in gravitational lensing
Suyu, S H; Hobson, M P; Marshall, P J
2006-01-01
Strong gravitational lens systems with extended sources are of special interest because they provide additional constraints on the models of the lens systems. To use a gravitational lens system for measuring the Hubble constant, one would need to determine the lens potential and the source intensity distribution simultaneously. A linear inversion method to reconstruct a pixellated source distribution of a given lens potential model was introduced by Warren and Dye. In the inversion process, a regularisation on the source intensity is often needed to ensure a successful inversion with a faithful resulting source. In this paper, we use Bayesian analysis to determine the optimal regularisation constant (strength of regularisation) of a given form of regularisation and to objectively choose the optimal form of regularisation given a selection of regularisations. We consider and compare quantitatively three different forms of regularisation previously described in the literature for source inversions in gravitatio...
A Bayesian Seismic Hazard Analysis for the city of Naples
Faenza, Licia; Pierdominici, Simona; Hainzl, Sebastian; Cinti, Francesca R.; Sandri, Laura; Selva, Jacopo; Tonini, Roberto; Perfetti, Paolo
2016-04-01
In recent years many studies have focused on determining and defining the seismic, volcanic and tsunamigenic hazard in the city of Naples. The reason is that Naples and its neighboring area form one of the most densely populated places in Italy. In addition, the risk is increased by the type and condition of the buildings and monuments in the city. It is therefore crucial to assess which active faults in Naples and the surrounding area could trigger an earthquake able to shake and damage the urban area. We collect data from the most reliable and complete databases of macroseismic intensity records (from 79 AD to the present). Each seismic event has been associated with an active tectonic structure. Furthermore, a set of active faults located around the study area, well known from geological investigations and capable of shaking the city but not associated with any earthquake, has been taken into account in our study. This geological framework is the starting point for our Bayesian seismic hazard analysis for the city of Naples. We show the feasibility of formulating the hazard assessment procedure so that information on past earthquakes is included in the probabilistic seismic hazard analysis. This strategy makes it possible, on the one hand, to enlarge the information used in the evaluation of the hazard, from alternative models for the earthquake generation process to past shaking, and, on the other hand, to account explicitly for all kinds of information and their uncertainties. The Bayesian scheme we propose is applied to evaluate the seismic hazard of Naples. We implement five different spatio-temporal models to parameterize the occurrence of earthquakes potentially dangerous for Naples. Subsequently we combine these hazard curves with ShakeMaps of past earthquakes that have been felt in Naples. The results are posterior hazard assessments for three exposure times (50, 10 and 5 years) on a dense grid that covers the municipality of Naples, considering bedrock soil
A Bayesian latent group analysis for detecting poor effort in the assessment of malingering
A. Ortega; E.-J. Wagenmakers; M.D. Lee; H.J. Markowitsch; M. Piefke
2012-01-01
Despite their theoretical appeal, Bayesian methods for the assessment of poor effort and malingering are still rarely used in neuropsychological research and clinical diagnosis. In this article, we outline a novel and easy-to-use Bayesian latent group analysis of malingering whose goal is to identif
Analysis of Various Clustering Algorithms
Asst Prof. Sunila Godara; Ms. Amita Verma
2013-01-01
Data clustering is the process of putting similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is larger than that among groups. This paper reviews four types of clustering techniques: k-Means clustering, Farthest First clustering, Density Based clustering and the Filtered clusterer. These clustering techniques are implemented and analyzed using the clustering tool WEKA. The performance of the four techniques is presented and compared.
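Of the four techniques, farthest-first traversal is the easiest to show in a few lines. The sketch below is a plain Python illustration, not WEKA's implementation: it starts from point 0 (an assumption made here so the result is deterministic) and uses hypothetical two-dimensional data.

```python
import math

Point = tuple[float, float]


def farthest_first(points: list[Point], k: int) -> list[int]:
    """Pick k centre indices: start at point 0, then repeatedly take the point
    farthest from every centre chosen so far."""
    centres = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(centres) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centres.append(nxt)
        # Keep, for each point, the distance to its nearest chosen centre.
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centres


def assign(points: list[Point], centres: list[int]) -> list[int]:
    """Label each point with the index of its nearest centre."""
    return [min(centres, key=lambda c: math.dist(p, points[c])) for p in points]


# Hypothetical data with three visually separate groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
centres = farthest_first(data, k=3)
labels = assign(data, centres)
print(centres, labels)
```

Unlike k-means, the centres are actual data points and are chosen in a single pass, which makes the method fast but sensitive to outliers.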
Multivariate meta-analysis of mixed outcomes: a Bayesian approach.
Bujkiewicz, Sylwia; Thompson, John R; Sutton, Alex J; Cooper, Nicola J; Harrison, Mark J; Symmons, Deborah P M; Abrams, Keith R
2013-09-30
Multivariate random effects meta-analysis (MRMA) is an appropriate way for synthesizing data from studies reporting multiple correlated outcomes. In a Bayesian framework, it has great potential for integrating evidence from a variety of sources. In this paper, we propose a Bayesian model for MRMA of mixed outcomes, which extends previously developed bivariate models to the trivariate case and also allows for combination of multiple outcomes that are both continuous and binary. We have constructed informative prior distributions for the correlations by using external evidence. Prior distributions for the within-study correlations were constructed by employing external individual patient data and using a double bootstrap method to obtain the correlations between mixed outcomes. The between-study model of MRMA was parameterized in the form of a product of a series of univariate conditional normal distributions. This allowed us to place explicit prior distributions on the between-study correlations, which were constructed using external summary data. Traditionally, independent 'vague' prior distributions are placed on all parameters of the model. In contrast to this approach, we constructed prior distributions for the between-study model parameters in a way that takes into account the inter-relationship between them. This is a flexible method that can be extended to incorporate mixed outcomes other than continuous and binary and beyond the trivariate case. We have applied this model to a motivating example in rheumatoid arthritis with the aim of incorporating all available evidence in the synthesis and potentially reducing uncertainty around the estimate of interest. PMID:23630081
Ildikó Ungvári
Genetic studies indicate a high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls). The results were evaluated with traditional frequentist methods, and we applied a new statistical method called Bayesian network based Bayesian multilevel analysis of relevance (BN-BMLA). This method uses a Bayesian network representation to provide a detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the Bayesian statistical framework, in order to determine whether a variable is directly relevant or its association is only mediated. With frequentist methods one SNP (rs3751464 in the FRMD6 gene) provided evidence for an association with asthma (OR = 1.43 (1.2-1.8); p = 3×10^-4). The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics. In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with the asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of the relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.
Survey and Analysis of University Clustering
Srinatha Karur
2013-07-01
This paper surveys the clustering of universities worldwide with respect to country-level, local or continent-level policies and their sub-aims. Clustering methods can generally be applied when the objective is stated specifically. For general objectives, clusters are available in the form of logical or physical groups without networks. In this paper we emphasise only university clusters, either on their own or combined with other clusters. Data mining methods are used for sampling analysis and for clustering universities and colleges with respect to local clusters [1] pp 1.
Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis
Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William
2009-01-01
This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both a broad perspective on data analysis collection and evaluation issues and a narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining the risk-informed decision-making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).
Bayesian meta-analysis models for microarray data: a comparative study
Song Joon J; Conlon Erin M; Liu Anna
2007-01-01
Background: With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches for meta-analysis of microarrays include either combining gene expression measures across studies or combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods. Results: Two Bayesian meta-analysis models for microarray data have recently ...
Bayesian Analysis of Graphical Models of Marginal Independence for Three Way Contingency Tables
Tarantola, Claudia; Ntzoufras, Ioannis
2012-01-01
This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. Each marginal independence model corresponds to a particular factorization of the cell probabilities and a conjugate analysis based on Dirichlet prior can be performed. We illustrate a comprehensive Bayesian analysis of such models, involving suitable choices of prior parameters, estimation, model determination, as well as the allied computational issues. The posterior di...
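The conjugate Dirichlet analysis gives closed-form marginal likelihoods, which is what makes model determination tractable. The sketch below is a deliberately reduced illustration on a two-way 2x2 table rather than the three-way tables and marginal independence graphs of the paper: it compares a saturated model with an independence model, using hypothetical counts and uniform Dirichlet priors.

```python
from math import lgamma


def log_marginal(counts: list[int], alpha: list[float]) -> float:
    """Log Dirichlet-multinomial marginal likelihood of an observation sequence."""
    n, a = sum(counts), sum(alpha)
    return lgamma(a) - lgamma(a + n) + sum(
        lgamma(ai + ci) - lgamma(ai) for ci, ai in zip(counts, alpha)
    )


def log_bayes_factor(table: list[list[int]]) -> float:
    """Log Bayes factor of the saturated model against row-column independence."""
    cells = [c for row in table for c in row]
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    saturated = log_marginal(cells, [1.0] * len(cells))
    independent = log_marginal(rows, [1.0] * len(rows)) + log_marginal(
        cols, [1.0] * len(cols)
    )
    return saturated - independent


dependent_table = [[30, 10], [10, 30]]  # hypothetical counts
balanced_table = [[20, 20], [20, 20]]   # hypothetical counts
print(log_bayes_factor(dependent_table), log_bayes_factor(balanced_table))
```

A positive value favours the saturated model and a negative value favours independence. The choice of prior parameters affects these values, which is why the abstract lists it among the issues to be addressed.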
Arturo Medrano-Soto
2004-12-01
Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
New Ephemeris for LSI+61 303, A Bayesian Analysis
Gregory, P. C.
1997-12-01
The luminous early-type binary LSI+61 303 is an interesting radio, X-ray and possible gamma-ray source. At radio wavelengths it exhibits periodic outbursts with an approximate period of 26.5 days as well as a longer term modulation of the outburst peaks of approximately 4 years. Recently Paredes et al. have found evidence that the X-ray outbursts are very likely to recur with the same radio outburst period from an analysis of RXTE all sky monitoring data. The system has been observed by many groups at all wavelengths but still the energy source powering the radio outbursts and their relation to the high energy emission remains a mystery. For more details see the "LSI+61 303 Resource Page" at http://www.srl.caltech.edu/personnel/paulr/lsi.html . There has been increasing evidence for a change in the period of the system. We will present a new ephemeris for the system based on a Bayesian analysis of 20 years of radio observations including the GBI-NASA radio monitoring data.
Thermodynamically consistent Bayesian analysis of closed biochemical reaction systems
Goutsias John
2010-11-01
Background: Estimating the rate constants of a biochemical reaction system with known stoichiometry from noisy time series measurements of molecular concentrations is an important step for building predictive models of cellular function. Inference techniques currently available in the literature may produce rate constant values that defy necessary constraints imposed by the fundamental laws of thermodynamics. As a result, these techniques may lead to biochemical reaction systems whose concentration dynamics could not possibly occur in nature. Therefore, development of a thermodynamically consistent approach for estimating the rate constants of a biochemical reaction system is highly desirable. Results: We introduce a Bayesian analysis approach for computing thermodynamically consistent estimates of the rate constants of a closed biochemical reaction system with known stoichiometry given experimental data. Our method employs an appropriately designed prior probability density function that effectively integrates fundamental biophysical and thermodynamic knowledge into the inference problem. Moreover, it takes into account experimental strategies for collecting informative observations of molecular concentrations through perturbations. The proposed method employs a maximization-expectation-maximization algorithm that provides thermodynamically feasible estimates of the rate constant values and computes appropriate measures of estimation accuracy. We demonstrate various aspects of the proposed method on synthetic data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, and examine its robustness under conditions that violate key assumptions. Software, coded in MATLAB®, which implements all Bayesian analysis techniques discussed in this paper, is available free of charge at http://www.cis.jhu.edu/~goutsias/CSS%20lab/software.html. Conclusions: Our approach provides an attractive statistical methodology for
Bayesian Analysis of Dynamic Multivariate Models with Multiple Structural Breaks
Sugita, Katsuhiro
2006-01-01
This paper considers a vector autoregressive model or a vector error correction model with multiple structural breaks in any subset of parameters, using a Bayesian approach with Markov chain Monte Carlo simulation technique. The number of structural breaks is determined as a sort of model selection by the posterior odds. For a cointegrated model, cointegrating rank is also allowed to change with breaks. Bayesian approach by Strachan (Journal of Business and Economic Statistics 21 (2003) 185) ...
Gasparini, Mauro; Eisele, J
2003-01-01
Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the pro...
Bayesian Analysis of Marginal Log-Linear Graphical Models for Three Way Contingency Tables
Ntzoufras, Ioannis; Tarantola, Claudia
2008-01-01
This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. We use a marginal log-linear parametrization, under which the model is defined through suitable zero-constraints on the interaction parameters calculated within marginal distributions. We undertake a comprehensive Bayesian analysis of these models, involving suitable choices of prior distributions, estimation, model determination, as well as the allied computational issue...
A Bayesian analysis of plutonium exposures in Sellafield workers.
Puncher, M; Riddell, A E
2016-03-01
The joint Russian (Mayak Production Association) and British (Sellafield) plutonium worker epidemiological analysis, undertaken as part of the European Union Framework Programme 7 (FP7) SOLO project, aims to investigate potential associations between cancer incidence and occupational exposures to plutonium using estimates of organ/tissue doses. The dose reconstruction protocol derived for the study makes best use of the most recent biokinetic models derived by the International Commission on Radiological Protection (ICRP) including a recent update to the human respiratory tract model (HRTM). This protocol was used to derive the final point estimates of absorbed doses for the study. Although uncertainties on the dose estimates were not included in the final epidemiological analysis, a separate Bayesian analysis has been performed for each of the 11 808 Sellafield plutonium workers included in the study in order to assess: A. The reliability of the point estimates provided to the epidemiologists and B. The magnitude of the uncertainty on dose estimates. This analysis, which accounts for uncertainties in biokinetic model parameters, intakes and measurement uncertainties, is described in the present paper. The results show that there is excellent agreement between the point estimates of dose and posterior mean values of dose. However, it is also evident that there are significant uncertainties associated with these dose estimates: the geometric range of the 97.5%:2.5% posterior values are a factor of 100 for lung dose, 30 for doses to liver and red bone marrow, and 40 for intakes: these uncertainties are not reflected in estimates of risk when point doses are used to assess them. It is also shown that better estimates of certain key HRTM absorption parameters could significantly reduce the uncertainties on lung dose in future studies. PMID:26584413
Light curve demography via Bayesian functional data analysis
Loredo, Thomas; Budavari, Tamas; Hendry, Martin A.; Kowal, Daniel; Ruppert, David
2015-08-01
Synoptic time-domain surveys provide astronomers, not simply more data, but a different kind of data: large ensembles of multivariate, irregularly and asynchronously sampled light curves. We describe a statistical framework for light curve demography—optimal accumulation and extraction of information, not only along individual light curves as conventional methods do, but also across large ensembles of related light curves. We build the framework using tools from functional data analysis (FDA), a rapidly growing area of statistics that addresses inference from datasets that sample ensembles of related functions. Our Bayesian FDA framework builds hierarchical models that describe light curve ensembles using multiple levels of randomness: upper levels describe the source population, and lower levels describe the observation process, including measurement errors and selection effects. Schematically, a particular object's light curve is modeled as the sum of a parameterized template component (modeling population-averaged behavior) and a peculiar component (modeling variability across the population), subsequently subjected to an observation model. A functional shrinkage adjustment to individual light curves emerges—an adaptive, functional generalization of the kind of adjustments made for Eddington or Malmquist bias in single-epoch photometric surveys. We are applying the framework to a variety of problems in synoptic time-domain survey astronomy, including optimal detection of weak sources in multi-epoch data, and improved estimation of Cepheid variable star luminosities from detailed demographic modeling of ensembles of Cepheid light curves.
STUDIES IN ASTRONOMICAL TIME SERIES ANALYSIS. VI. BAYESIAN BLOCK REPRESENTATIONS
Scargle, Jeffrey D. [Space Science and Astrobiology Division, MS 245-3, NASA Ames Research Center, Moffett Field, CA 94035-1000 (United States); Norris, Jay P. [Physics Department, Boise State University, 2110 University Drive, Boise, ID 83725-1570 (United States); Jackson, Brad [The Center for Applied Mathematics and Computer Science, Department of Mathematics, San Jose State University, One Washington Square, MH 308, San Jose, CA 95192-0103 (United States); Chiang, James, E-mail: jeffrey.d.scargle@nasa.gov [W. W. Hansen Experimental Physics Laboratory, Kavli Institute for Particle Astrophysics and Cosmology, Department of Physics and SLAC National Accelerator Laboratory, Stanford University, Stanford, CA 94305 (United States)
2013-02-20
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks, that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by Arias-Castro et al. In the spirit of Reproducible Research, all of the code and data necessary to reproduce all of the figures in this paper are included as supplementary material.
Dynamic sensor action selection with Bayesian decision analysis
Kristensen, Steen; Hansen, Volker; Kondak, Konstantin
1998-10-01
The aim of this work is to create a framework for the dynamic planning of sensor actions for an autonomous mobile robot. The framework uses Bayesian decision analysis, i.e., a decision-theoretic method, to evaluate possible sensor actions and select the most appropriate ones given the available sensors and what is currently known about the state of the world. Since sensing changes the knowledge of the system and since the current state of the robot (task, position, etc.) determines what knowledge is relevant, the evaluation and selection of sensing actions is an ongoing process that effectively determines the behavior of the robot. The framework has been implemented on a real mobile robot and has been shown to control the sensor actions of the system in real time. In current work we are investigating methods to reduce or automatically generate the model information needed by the decision-theoretic method to select the appropriate sensor actions.
Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations
Scargle, Jeffrey D.; Norris, Jay P.; Jackson, Brad; Chiang, James
2013-01-01
This paper addresses the problem of detecting and characterizing local variability in time series and other forms of sequential data. The goal is to identify and characterize statistically significant variations, at the same time suppressing the inevitable corrupting observational errors. We present a simple nonparametric modeling technique and an algorithm implementing it, an improved and generalized version of Bayesian Blocks [Scargle 1998], that finds the optimal segmentation of the data in the observation interval. The structure of the algorithm allows it to be used in either a real-time trigger mode, or a retrospective mode. Maximum likelihood or marginal posterior functions to measure model fitness are presented for events, binned counts, and measurements at arbitrary times with known error distributions. Problems addressed include those connected with data gaps, variable exposure, extension to piecewise linear and piecewise exponential representations, multivariate time series data, analysis of variance, data on the circle, other data modes, and dispersed data. Simulations provide evidence that the detection efficiency for weak signals is close to a theoretical asymptotic limit derived by [Arias-Castro, Donoho and Huo 2003]. In the spirit of Reproducible Research [Donoho et al. (2008)], all of the code and data necessary to reproduce all of the figures in this paper are included as auxiliary material.
Using Bayesian Population Viability Analysis to Define Relevant Conservation Objectives.
Green, Adam W; Bailey, Larissa L
2015-01-01
Adaptive management provides a useful framework for managing natural resources in the face of uncertainty. An important component of adaptive management is identifying clear, measurable conservation objectives that reflect the desired outcomes of stakeholders. A common objective is to have a sustainable population, or metapopulation, but it can be difficult to quantify a threshold above which such a population is likely to persist. We performed a Bayesian metapopulation viability analysis (BMPVA) using a dynamic occupancy model to quantify the characteristics of two wood frog (Lithobates sylvatica) metapopulations resulting in sustainable populations, and we demonstrate how the results could be used to define meaningful objectives that serve as the basis of adaptive management. We explored scenarios involving metapopulations with different numbers of patches (pools) using estimates of breeding occurrence and successful metamorphosis from two study areas to estimate the probability of quasi-extinction and calculate the proportion of vernal pools producing metamorphs. Our results suggest that ≥50 pools are required to ensure long-term persistence with approximately 16% of pools producing metamorphs in stable metapopulations. We demonstrate one way to incorporate the BMPVA results into a utility function that balances the trade-offs between ecological and financial objectives, which can be used in an adaptive management framework to make optimal, transparent decisions. Our approach provides a framework for using a standard method (i.e., PVA) and available information to inform a formal decision process to determine optimal and timely management policies. PMID:26658734
Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).
Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W
2016-07-20
Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022
Classical methods of assessing the uncertainty associated with radiation doses estimated using cytogenetic techniques are now extremely well defined. However, several authors have suggested that a Bayesian approach to uncertainty estimation may be more suitable for cytogenetic data, which are inherently stochastic in nature. The Bayesian analysis framework focuses on identification of probability distributions (for yield of aberrations or estimated dose), which also means that uncertainty is an intrinsic part of the analysis, rather than an 'afterthought'. In this paper Bayesian, as well as some more advanced classical, data analysis methods for radiation cytogenetics are reviewed that have been proposed in the literature. A practical overview of Bayesian cytogenetic dose estimation is also presented, with worked examples from the literature. (authors)
Missing data treatment method on cluster analysis
Elsiddig Elsadig Mohamed Koko; Amin Ibrahim Adam Mohamed
2015-01-01
Missing data in the household health survey posed a challenge for researchers because they leave the analysis incomplete. Cluster analysis was applied to the data collected in Sudan's 2006 household health survey. The present research focuses on the data analysis, with the objective of dealing with missing values in cluster analysis. Two-Step Cluster Analysis is applied, in which each participant is classified into one of the identified patterns and the opt...
Cluster analysis for portfolio optimization
Tola, Vincenzo; Lillo, Fabrizio; Gallegati, Mauro; Mantegna, Rosario N.
2005-01-01
We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio composition of the selected portfolio for a given value of the portfolio return are also investigated for each considered filtering method.
Cybis, Gabriela Bettella
2014-01-01
Combining models for phenotypic and molecular evolution can lead to powerful inference tools. Under the flexible framework of Bayesian phylogenetics, I develop statistical methods to address phylodynamic problems in this intersection. First, I present a hierarchical phylogeographic method that combines information across multiple datasets to draw inference on a common geographical spread process. Each dataset represents a parallel realization of this geographic process on a different group of ...
Uncertainty analysis using Beta-Bayesian approach in nuclear safety code validation
Highlights:
• To meet the 95/95 criterion, the Wilks’ method is identical to the Bayesian approach.
• The choice of prior in the Bayesian approach strongly influences the number of code runs.
• It is possible to utilize prior experience to reduce the code runs needed to meet the 95/95 criterion.
• The variation of the probability for each code run is provided.
Abstract: Since best-estimate plus uncertainty analysis was approved by the Nuclear Regulatory Commission for nuclear reactor safety evaluation, several uncertainty assessment methods have been proposed and applied in the framework of best-estimate code validation in the nuclear industry. Among them, the Wilks’ method and the Bayesian approach are the two most popular statistical methods for uncertainty quantification. This study explores the inherent relation between the two methods using the Beta distribution function as the prior in the Bayesian analysis. The Wilks’ method can then be considered a special case of the Beta-Bayesian approach, equivalent to the conservative case with Wallis’ “pessimistic” prior in the Bayesian analysis. However, the results do depend on the functional form chosen for the pessimistic prior. The analysis of mean and variance through the Beta-Bayesian approach provides insight into the Wilks’ 95/95 results with different orders. It indicates that the 95/95 results of the Wilks’ method become more accurate and more precise as the order increases. Furthermore, the Bayesian updating process is demonstrated in code validation practice. The selection of the updating prior can make use of current experience of code failure and success statistics, so as to effectively predict the number of further numerical simulations needed to reach the 95/95 criterion.
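The relation between the two run counts can be illustrated with a minimal sketch. It is not the paper's code: the function names are hypothetical, the first function uses the standard one-sided Wilks formula, and the second assumes a Beta(a, 1) prior on the per-run success probability with every run succeeding, a simplification chosen only because its posterior has a closed form.

```python
from math import comb


def wilks_runs(order=1, coverage=0.95, confidence=0.95):
    """Smallest number of code runs for a one-sided Wilks bound of the given order."""
    n = order
    while True:
        # Probability that fewer than `order` of the n runs exceed the coverage quantile.
        miss = sum(
            comb(n, k) * (1.0 - coverage) ** k * coverage ** (n - k)
            for k in range(order)
        )
        if 1.0 - miss >= confidence:
            return n
        n += 1


def beta_runs(a=1.0, coverage=0.95, confidence=0.95):
    """Successful runs needed under an assumed Beta(a, 1) prior (illustrative choice).

    After n successes and no failures the posterior is Beta(a + n, 1), whose CDF
    is p ** (a + n), so P(p >= coverage) = 1 - coverage ** (a + n).
    """
    n = 0
    while 1.0 - coverage ** (a + n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    print([wilks_runs(order) for order in (1, 2, 3)])
    print(beta_runs(a=1.0))
```

Under these assumptions a uniform prior (a = 1) needs one run fewer than first-order Wilks, and the count approaches the Wilks value as a shrinks toward zero, which is consistent with the abstract's remark that the prior drives the number of runs.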
Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters
Wang, Ziyu; De Freitas, Nando
2014-01-01
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Baye...
Gruber, Lutz F.; West, Mike
2016-01-01
The recently introduced class of simultaneous graphical dynamic linear models (SGDLMs) defines an ability to scale on-line Bayesian analysis and forecasting to higher-dimensional time series. This paper advances the methodology of SGDLMs, developing and embedding a novel, adaptive method of simultaneous predictor selection in forward filtering for on-line learning and forecasting. The advances include developments in Bayesian computation for scalability, and a case study in exploring the resu...
Risks Analysis of Logistics Financial Business Based on Evidential Bayesian Network
Bin Suo; Ying Yan
2013-01-01
Risks in logistics financial business are identified and classified. Taking the failure of the business as the root node, a Bayesian network is constructed to measure the risk levels in the business. Three importance indexes are calculated to find the most important risks in the business. Moreover, considering the epistemic uncertainties in the risks, evidence theory combined with a Bayesian network is used as an evidential network in the risk analysis of logistics finance. To find how much un...
Exploiting sensitivity analysis in Bayesian networks for consumer satisfaction study
Jaronski, W.; Bloemer, J.M.M.; Vanhoof, K.; Wets, G.
2004-01-01
The paper presents an application of Bayesian network technology in an empirical customer satisfaction study. The findings of the study should provide insight as to the importance of product/service dimensions in terms of the strength of their influence on overall satisfaction. To this end we apply a
In this paper, we present RADYBAN (Reliability Analysis with DYnamic BAyesian Networks), a software tool that analyzes a dynamic fault tree by converting it into a dynamic Bayesian network. The tool implements a modular algorithm for automatically translating a dynamic fault tree into the corresponding dynamic Bayesian network and exploits classical algorithms for inference on dynamic Bayesian networks in order to compute reliability measures. After describing the basic features of the tool, we show how it operates on a real-world example and compare the unreliability results it generates with those returned by other methodologies, in order to verify the correctness and consistency of the results obtained.
A Latent Variable Bayesian Approach to Spatial Clustering with Background Noise
Kayabol, K.
2011-01-01
We propose a finite mixture model for clustering of the spatial data patterns. The model is based on the spatial distances between the data locations in such a way that both the distances of the points to the cluster centers and the distances of a given point to its neighbors within a defined window
Complexity analysis of accelerated MCMC methods for Bayesian inversion
Hoang, Viet Ha; Schwab, Christoph; Stuart, Andrew M.
2013-08-01
The Bayesian approach to inverse problems, in which the posterior probability distribution on an unknown field is sampled for the purposes of computing posterior expectations of quantities of interest, is starting to become computationally feasible for partial differential equation (PDE) inverse problems. Balancing the sources of error arising from finite-dimensional approximation of the unknown field, the PDE forward solution map and the sampling of the probability space under the posterior distribution is essential for the design of efficient computational Bayesian methods for PDE inverse problems. We study Bayesian inversion for a model elliptic PDE with an unknown diffusion coefficient. We provide complexity analyses of several Markov chain Monte Carlo (MCMC) methods for the efficient numerical evaluation of expectations under the Bayesian posterior distribution, given data δ. Particular attention is given to bounds on the overall work required to achieve a prescribed error level ε. Specifically, we first bound the computational complexity of ‘plain’ MCMC, based on combining MCMC sampling with linear complexity multi-level solvers for elliptic PDE. Our (new) work versus accuracy bounds show that the complexity of this approach can be quite prohibitive. Two strategies for reducing the computational complexity are then proposed and analyzed: first, a sparse, parametric and deterministic generalized polynomial chaos (gpc) ‘surrogate’ representation of the forward response map of the PDE over the entire parameter space, and, second, a novel multi-level Markov chain Monte Carlo strategy which utilizes sampling from a multi-level discretization of the posterior and the forward PDE. For both of these strategies, we derive asymptotic bounds on work versus accuracy, and hence asymptotic bounds on the computational complexity of the algorithms. In particular, we provide sufficient conditions on the regularity of the unknown coefficients of the PDE and on the
Bayesian analysis of the dynamic cosmic web in the SDSS galaxy survey
Leclercq, Florent; Wandelt, Benjamin
2015-01-01
Recent application of the Bayesian algorithm BORG to the Sloan Digital Sky Survey (SDSS) main sample galaxies resulted in the physical inference of the formation history of the observed large-scale structure from its origin to the present epoch. In this work, we use these inferences as inputs for a detailed probabilistic cosmic web-type analysis. To do so, we generate a large set of data-constrained realizations of the large-scale structure using a fast, fully non-linear gravitational model. We then perform a dynamic classification of the cosmic web into four distinct components (voids, sheets, filaments and clusters) on the basis of the tidal field. Our inference framework automatically and self-consistently propagates typical observational uncertainties to web-type classification. As a result, this study produces highly detailed and accurate cosmographic classification of large-scale structure elements in the SDSS volume. By also providing the history of these structure maps, the approach allows an analysis...
Data Clustering via Nonparametric Bayesian Models
张媛媛
2013-01-01
Cluster analysis is an important technique in machine learning and data mining. Unlike supervised learning, it has no class labels to guide the search, so choosing an appropriate number of clusters (model selection) has long been a difficult problem. To address it, we propose a clustering algorithm based on the Dirichlet process mixture model (DPMM) and estimate the parameters of the mixture model with a collapsed Gibbs sampling algorithm. The new algorithm follows the nonparametric Bayesian framework: over the course of sampling it optimizes the model parameters and arrives at a suitable number of clusters. Clustering experiments on synthetic and real data sets show that the DPMM-based algorithm not only determines the number of clusters automatically but is also flexible and robust.
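To see why a Dirichlet process prior leaves the number of clusters open, it can help to simulate the Chinese restaurant process, the partition prior that a collapsed Gibbs sampler for a DPMM combines with the data likelihood. The sketch below covers the prior only, not the full algorithm of the paper; the function name, the concentration value and the seed are assumptions made for illustration.

```python
import random


def sample_crp(n_points, alpha, seed=0):
    """Draw one partition of n_points items from a Chinese restaurant process.

    alpha is the concentration parameter (must be positive): larger values
    make it more likely that an item opens a new cluster.
    """
    rng = random.Random(seed)
    assignments = []  # cluster index of each item, in arrival order
    counts = []       # current size of each cluster
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    _, sizes = sample_crp(n_points=200, alpha=1.0)
    print(len(sizes), "clusters with sizes", sizes)
```

In a collapsed Gibbs sampler each item would be reassigned in turn, with these prior weights multiplied by the predictive likelihood of the item under each cluster, so the number of occupied clusters can grow or shrink as sampling proceeds.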
Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants
Jin, Ick Hoon
2014-03-01
Statistical inference for the models with intractable normalizing constants has attracted much attention. During the past two decades, various approximation- or simulation-based methods have been proposed for the problem, such as the Monte Carlo maximum likelihood method and the auxiliary variable Markov chain Monte Carlo methods. The Bayesian stochastic approximation Monte Carlo algorithm specifically addresses this problem: It works by sampling from a sequence of approximate distributions with their average converging to the target posterior distribution, where the approximate distributions can be achieved using the stochastic approximation Monte Carlo algorithm. A strong law of large numbers is established for the Bayesian stochastic approximation Monte Carlo estimator under mild conditions. Compared to the Monte Carlo maximum likelihood method, the Bayesian stochastic approximation Monte Carlo algorithm is more robust to the initial guess of model parameters. Compared to the auxiliary variable MCMC methods, the Bayesian stochastic approximation Monte Carlo algorithm avoids the requirement for perfect samples, and thus can be applied to many models for which perfect sampling is not available or very expensive. The Bayesian stochastic approximation Monte Carlo algorithm also provides a general framework for approximate Bayesian analysis. © 2012 Elsevier B.V. All rights reserved.
A Dynamic Bayesian Approach to Computational Laban Shape Quality Analysis
Dilip Swaminathan
2009-01-01
kinesiology. LMA (especially Effort/Shape) emphasizes how internal feelings and intentions govern the patterning of movement throughout the whole body. As we argue, a complex understanding of intention via LMA is necessary for human-computer interaction to become embodied in ways that resemble interaction in the physical world. We thus introduce a novel, flexible Bayesian fusion approach for identifying LMA Shape qualities from raw motion capture data in real time. The method uses a dynamic Bayesian network (DBN) to fuse movement features across the body and across time and, as we discuss, can be readily adapted for low-cost video. It has delivered excellent performance in preliminary studies comprising improvisatory movements. Our approach has been incorporated in Response, a mixed-reality environment where users interact via natural, full-body human movement and enhance their bodily-kinesthetic awareness through immersive sound and light feedback, with applications to kinesiology training, Parkinson's patient rehabilitation, interactive dance, and many other areas.
A Bayesian Analysis of the Radioactive Releases of Fukushima
Tomioka, Ryota; Mørup, Morten
2012-01-01
The Fukushima Daiichi disaster of 11 March 2011 is considered the largest nuclear accident since the 1986 Chernobyl disaster and has been rated at level 7 on the International Nuclear Event Scale. As different radioactive materials have different effects on the human body, it is important to know the types of nuclides and their levels of concentration from the recorded mixture of radiations in order to take the necessary measures. We presently formulate a Bayesian generative model for the data available on radioactive releases from the Fukushima Daiichi disaster across Japan. From the sparsely sampled...... Fukushima Daiichi plant we establish that the model is able to account for the data. We further demonstrate how the model extends to include all the available measurements recorded throughout Japan. The model can be considered a first attempt to apply unsupervised Bayesian learning in order to give a more......
Bayesian Analysis of the Black-Scholes Option Price
Darsinos, Theofanis; Stephen E Satchell
2001-01-01
This paper investigates the statistical properties of the Black-Scholes option price under a Bayesian approach. We incorporate randomness, both in the price process and in volatility, to derive the prior and posterior densities of a European call option. Expressions for the density of the option price conditional on the sample estimates of volatility and on the asset price respectively, are also derived. Numerical results are presented to compare how the dispersion of the option price changes...
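For orientation, the quantity whose distribution the abstract studies can be sketched as follows. This is a minimal illustration, not the authors' derivation: it evaluates the standard Black-Scholes European call formula (no dividends) at a handful of volatility values, which stand in for draws from some posterior on volatility. The function names and the numerical draws are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, sigma: float, tau: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)


# Hypothetical volatility draws (illustrative numbers, not from the paper).
sigma_draws = [0.18, 0.20, 0.22, 0.25]

# Pushing each draw through the pricing formula gives draws of the option price,
# whose spread reflects the uncertainty about volatility.
price_draws = [bs_call(100.0, 100.0, 0.05, s, 1.0) for s in sigma_draws]
print(min(price_draws), max(price_draws))
```

Because the call price is increasing in volatility, the dispersion of `price_draws` directly mirrors the dispersion of the volatility draws.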
Bayesian analysis of recursive SVAR models with overidentifying restrictions
Kociecki, Andrzej; Rubaszek, Michał; Ca' Zorzi, Michele
2012-01-01
The paper provides a novel Bayesian methodological framework to estimate structural VAR (SVAR) models with recursive identification schemes that allows for the inclusion of over-identifying restrictions. The proposed framework enables the researcher to (i) elicit the prior on the non-zero contemporaneous relations between economic variables and to (ii) derive an analytical expression for the posterior distribution and marginal data density. We illustrate our methodological framework by estima...
SPAM FILTERING FOR OPTIMIZATION IN INTERNET PROMOTIONS USING BAYESIAN ANALYSIS
Ion SMEUREANU; Madalina ZURINI
2010-01-01
The main characteristics of an e-business and of its promotion are presented. The paper covers ways of promoting an e-business and examines in depth the e-mail marketing principle, along with the advantages and disadvantages of its implementation. E-mail marketing metrics are defined for analyzing the impact on customers. A model for optimizing the promotion process via e-mail is created for reaching the profitability threshold of an electronic business. The model implements Bayesian spam filtering and app...
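The abstract does not say which variant of Bayesian spam filtering is used, so the following is only a generic sketch of the idea: a naive Bayes classifier over word counts with Laplace smoothing. The training messages, function names, and tokenization (lower-cased whitespace split) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, is_spam). Returns per-class word counts and message counts."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, is_spam in messages:
        class_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, class_counts


def spam_probability(text, word_counts, class_counts):
    """Posterior P(spam | text) under a naive Bayes model with add-one smoothing."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    total = sum(class_counts.values())
    log_post = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / total)  # log prior of the class
        for w in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
            lp += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        log_post[label] = lp
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post[True] - m) / z


# Hypothetical toy corpus.
toy = [
    ("win free prize now", True),
    ("free offer click now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
wc, cc = train(toy)
print(spam_probability("free prize", wc, cc))
print(spam_probability("monday meeting", wc, cc))
```

A sender optimizing a promotion campaign could, under these assumptions, score a draft message before sending it and reword it if the score is high.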
Bayesian analysis of the Hector’s Dolphin data
King, R; Brooks, S.P.
2004-01-01
In recent years there have been increasing concerns for many wildlife populations, due to decreasing population trends. This has led to the introduction of management schemes to increase the survival rates and hence the population size of many species of animals. We concentrate on a particular dolphin population situated off the coast of New Zealand, and investigate whether the introduction of a fishing gill net ban was effective in decreasing dolphin mortality. We undertake a Bayesian analys...
A genetic and spatial Bayesian analysis of mastitis resistance
Frigessi Arnoldo; Sæbø Solve
2004-01-01
A nationwide health card recording system for dairy cattle was introduced in Norway in 1975 (the Norwegian Cattle Health Services). The data base holds information on mastitis occurrences on an individual cow basis. A reduction in mastitis frequency across the population is desired, and for this purpose risk factors are investigated. In this paper a Bayesian proportional hazards model is used for modelling the time to first veterinary treatment of clinical mastitis, including both ge...
A genetic and spatial Bayesian analysis of mastitis resistance
Sæbø, Solve; Frigessi, Arnoldo
2004-01-01
A nationwide health card recording system for dairy cattle was introduced in Norway in 1975 (the Norwegian Cattle Health Services). The data base holds information on mastitis occurrences on an individual cow basis. A reduction in mastitis frequency across the population is desired, and for this purpose risk factors are investigated. In this paper a Bayesian proportional hazards model is used for modelling the time to first veterinary treatment of clinical mastitis, including both genetic and...
Bayesian network models in brain functional connectivity analysis
Ide, Jaime S.; Zhang, Sheng; Chiang-shan R. Li
2013-01-01
Much effort has been made to better understand the complex integration of distinct parts of the human brain using functional magnetic resonance imaging (fMRI). Altered functional connectivity between brain regions is associated with many neurological and mental illnesses, such as Alzheimer's and Parkinson's diseases, addiction, and depression. In computational science, Bayesian networks (BN) have been used in a broad range of studies to model complex data sets in the presence of uncertainty and wh...
Hierarchical Bayesian analysis of somatic mutation data in cancer
Ding, Jie; Trippa, Lorenzo; Zhong, Xiaogang; Parmigiani, Giovanni
2013-01-01
Identifying genes underlying cancer development is critical to cancer biology and has important implications across prevention, diagnosis and treatment. Cancer sequencing studies aim at discovering genes with high frequencies of somatic mutations in specific types of cancer, as these genes are potential driving factors (drivers) for cancer development. We introduce a hierarchical Bayesian methodology to estimate gene-specific mutation rates and driver probabilities from somatic mutation data ...
Regional fertility data analysis: A small area Bayesian approach
Eduardo A. Castro; Zhen Zhang; Arnab Bhattacharjee; Martins, José M.; Taps Maiti
2013-01-01
Accurate estimation of demographic variables such as mortality, fertility and migrations, by age groups and regions, is important for analyses and policy. However, traditional estimates based on within cohort counts are often inaccurate, particularly when the sub-populations considered are small. We use small area Bayesian statistics to develop a model for age-specific fertility rates. In turn, such small area estimation requires accurate descriptions of spatial and cross-section dependence. ...
Bayesian analysis of hierarchical multi-fidelity codes
Gratiet, Loic Le
2011-01-01
This paper deals with the Gaussian process based approximation of a code which can be run at different levels of accuracy. This co-kriging method allows us to improve a surrogate model of a complex computer code using fast approximations of it. In particular, we focus on the case of a large number of code levels on the one hand and on a Bayesian approach when we have 2 levels on the other hand. Moreover, based on a Bayes linear formulation, an extension of the universal kriging equations is provided for the co-kriging model. We also address the problem of nested space-filling design for multi-fidelity computer experiments and we provide a significant simplification of the computation of the co-kriging cross-validation equations. A hydrodynamic simulator example is used to illustrate the comparison of Bayesian versus non-Bayesian co-kriging. A thermodynamic example is used to illustrate the comparison between 2-level and 3-level co-kriging.
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection
Dhavala, Soma S.
2010-09-01
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.
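The zero-inflated Poisson distribution named in the abstract mixes a point mass at zero with an ordinary Poisson count. The sketch below shows only that probability mass function; the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero) are our own labels, and nothing here reproduces the hierarchical model of the article.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson count.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Requires lam > 0 and 0 <= pi <= 1.
    """
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi * (1.0 if k == 0 else 0.0) + (1.0 - pi) * poisson


# Zeros are more likely than under a plain Poisson with the same mean parameter.
print(zip_pmf(0, 2.0, 0.3), math.exp(-2.0))
```

This extra mass at zero is what makes the distribution suitable for signature counts in which many entries are exactly zero.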
Bayesian Analysis for Stellar Evolution with Nine Parameters (BASE-9): User's Manual
von Hippel, Ted; Jeffery, Elizabeth; Wagner-Kaiser, Rachel; DeGennaro, Steven; Stein, Nathan; Stenning, David; Jefferys, William H; van Dyk, David
2014-01-01
BASE-9 is a Bayesian software suite that recovers star cluster and stellar parameters from photometry. BASE-9 is useful for analyzing single-age, single-metallicity star clusters, binaries, or single stars, and for simulating such systems. BASE-9 uses Markov chain Monte Carlo and brute-force numerical integration techniques to estimate the posterior probability distributions for the age, metallicity, helium abundance, distance modulus, and line-of-sight absorption for a cluster, and the mass, binary mass ratio, and cluster membership probability for every stellar object. BASE-9 is provided as open source code on a version-controlled web server. The executables are also available as Amazon Elastic Compute Cloud images. This manual provides potential users with an overview of BASE-9, including instructions for installation and use.
A Gibbs sampler for Bayesian analysis of site-occupancy data
Dorazio, Robert M.; Rodriguez, Daniel Taylor
2012-01-01
1. A Bayesian analysis of site-occupancy data containing covariates of species occurrence and species detection probabilities is usually completed using Markov chain Monte Carlo methods in conjunction with software programs that can implement those methods for any statistical model, not just site-occupancy models. Although these software programs are quite flexible, considerable experience is often required to specify a model and to initialize the Markov chain so that summaries of the posterior distribution can be estimated efficiently and accurately. 2. As an alternative to these programs, we develop a Gibbs sampler for Bayesian analysis of site-occupancy data that include covariates of species occurrence and species detection probabilities. This Gibbs sampler is based on a class of site-occupancy models in which probabilities of species occurrence and detection are specified as probit-regression functions of site- and survey-specific covariate measurements. 3. To illustrate the Gibbs sampler, we analyse site-occupancy data of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly species in Switzerland. Our analysis includes a comparison of results based on Bayesian and classical (non-Bayesian) methods of inference. We also provide code (based on the R software program) for conducting Bayesian and classical analyses of site-occupancy data.
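The Gibbs sampler itself is beyond a short example, but the model class it targets can be sketched. The following computes the likelihood contribution of a single site when occupancy and detection probabilities are probit functions of covariates, as the abstract describes. The function and argument names are hypothetical, and this is a likelihood evaluation only, not the authors' sampler.

```python
import math


def probit(x: float) -> float:
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def site_likelihood(y, x_occ, beta, w_det, alpha):
    """Likelihood of one site's detection history.

    y      : list of 0/1 detections, one per survey
    x_occ  : site-level covariate vector for occurrence
    beta   : occurrence coefficients
    w_det  : list of survey-level covariate vectors for detection
    alpha  : detection coefficients
    """
    psi = probit(sum(b * x for b, x in zip(beta, x_occ)))  # occurrence probability
    detect = 1.0
    for y_j, w_j in zip(y, w_det):
        p_j = probit(sum(a * w for a, w in zip(alpha, w_j)))  # detection probability
        detect *= p_j if y_j else (1.0 - p_j)
    # A site with no detections is either occupied but missed, or unoccupied.
    never_detected = 1.0 if not any(y) else 0.0
    return psi * detect + (1.0 - psi) * never_detected


# Intercept-only example: all coefficients zero give psi = p = 0.5.
print(site_likelihood([0, 0], [1.0], [0.0], [[1.0], [1.0]], [0.0]))
```

The two-part form of the last line is what motivates data augmentation with a latent occupancy indicator in Gibbs-type samplers for this class of models.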
PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off
Seldin, Yevgeny; Laviolette, François; Auer, Peter; Shawe-Taylor, John; Peters, Jan
2011-01-01
We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.
Stakhovych, Stanislav; Bijmolt, Tammo H. A.; Wedel, Michel
2012-01-01
In this article, we present a Bayesian spatial factor analysis model. We extend previous work on confirmatory factor analysis by including geographically distributed latent variables and accounting for heterogeneity and spatial autocorrelation. The simulation study shows excellent recovery of the model parameters and demonstrates the consequences…
PUNJABI TEXT CLUSTERING BY SENTENCE STRUCTURE ANALYSIS
Saurabh Sharma
2012-10-01
Punjabi text document clustering is done by analyzing the sentence structure of similar documents sharing the same topics and grouping them into clusters. The prevalent algorithms in this field utilize the vector space model, which treats the documents as a bag of words. The meaning in natural language inherently depends on the word sequences, which are overlooked and ignored while clustering. The current paper deals with a new Punjabi text clustering algorithm named Clustering by Sentence Structure Analysis (CSSA), which has been carried out on 221 Punjabi news articles available on news sites. The phrases are extracted for processing by a meticulous analysis of the structure of a sentence by applying the basic grammatical rules of Karaka. Sequences formed from phrases are used to find the topic and to find similarities among all documents, which results in the formation of meaningful clusters.
Mugnes, J-M
2015-01-01
Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence ve...
Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.
Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka
2014-02-01
In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain. PMID:24246289
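The analytical comparison of partitions mentioned in the abstract rests on conjugacy: with a Dirichlet prior on each row of a Markov transition matrix, the marginal likelihood of the observed transitions has a closed form. The sketch below illustrates this for a first-order chain over the DNA alphabet with a symmetric Dirichlet prior. The hyperparameter `alpha`, the omission of the initial-state term, and the function names are assumptions; the article's actual model and search procedure are not reproduced.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def transition_counts(seqs):
    """Count first-order transitions (a -> b) pooled over all sequences in a cluster."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    return counts


def log_marginal(seqs, alpha=1.0):
    """Log marginal likelihood of a cluster's transitions under a
    symmetric Dirichlet(alpha) prior on every transition-matrix row."""
    counts = transition_counts(seqs)
    k = len(ALPHABET)
    total = 0.0
    for a in ALPHABET:
        row = [counts[(a, b)] for b in ALPHABET]
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + sum(row))
        total += sum(math.lgamma(alpha + n) - math.lgamma(alpha) for n in row)
    return total


def log_merge_gain(cluster1, cluster2, alpha=1.0):
    """Change in log marginal likelihood when two clusters are merged."""
    merged = log_marginal(cluster1 + cluster2, alpha)
    return merged - log_marginal(cluster1, alpha) - log_marginal(cluster2, alpha)


# Similar sequences favor merging (positive gain); dissimilar ones do not.
print(log_merge_gain(["AAAAAAAA"], ["AAAAAAAA"]))
print(log_merge_gain(["AAAAAAAA"], ["ACACACAC"]))
```

A greedy search of the kind the abstract mentions could, under these assumptions, repeatedly apply the merge or split with the largest positive gain.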
Pooled Bayesian meta-analysis of two Polish studies on radiation-induced cancers
The robust Bayesian regression method was applied to perform a meta-analysis of two independent studies on the influence of low ionising radiation doses on the occurrence of fatal cancers. The re-analysed data come from an occupational exposure analysis of nuclear workers in Swierk (Poland) and from an ecological study of cancer risk from natural background radiation in Poland. These two different types of data were analysed, and three popular models were tested: constant, linear and quadratic dose-response dependencies. The Bayesian model selection algorithm was used for all models. The Bayesian statistics clearly indicate that the popular linear no-threshold (LNT) assumption is not valid for the presented cancer risks in the range of low doses of ionising radiation. The use of the LNT hypothesis in radiation risk prediction and assessment is also discussed. (authors)
Type Ia Supernova Light Curve Inference: Hierarchical Bayesian Analysis in the Near Infrared
Mandel, Kaisey S; Friedman, Andrew S; Kirshner, Robert P
2009-01-01
We present a comprehensive statistical analysis of the properties of Type Ia SN light curves in the near infrared using recent data from PAIRITEL and the literature. We construct a hierarchical Bayesian framework, incorporating several uncertainties including photometric error, peculiar velocities, dust extinction and intrinsic variations, for coherent statistical inference. SN Ia light curve inferences are drawn from the global posterior probability of parameters describing both individual supernovae and the population conditioned on the entire SN Ia NIR dataset. The logical structure of the hierarchical Bayesian model is represented by a directed acyclic graph. Fully Bayesian analysis of the model and data is enabled by an efficient MCMC algorithm exploiting the conditional structure using Gibbs sampling. We apply this framework to the JHK_s SN Ia light curve data. A new light curve model captures the observed J-band light curve shape variations. The intrinsic variances in peak absolute magnitudes are: sigm...
A Bayesian analysis of extrasolar planet data for HD 208487
Gregory, P. C.
2005-01-01
Precision radial velocity data for HD 208487 has been re-analyzed using a new Bayesian multi-planet Kepler periodogram. The periodogram employs a parallel tempering Markov chain Monte Carlo algorithm with a novel statistical control system. We confirm the previously reported orbit (Tinney et al. 2005) of 130 days. In addition, we conclude there is strong evidence for a second planet with a period of $998^{+57}_{-62}$ days, an eccentricity of $0.19^{+0.05}_{-0.18}$, and an $M \sin i = 0.46^{+0.05}_{-0.13}$ of Jup...
Bayesian Analysis of Demand Elasticity in the Italian Electricity Market
Maria Chiara D'Errico; Carlo Andrea Bollino
2015-01-01
The liberalization of the Italian electricity market is a decade old. Within these last ten years, the supply side has been extensively analyzed, but not the demand side. The aim of this paper is to provide a new method for estimation of the demand elasticity, based on Bayesian methods applied to the Italian electricity market. We used individual demand bids data in the day-ahead market in the Italian Power Exchange (IPEX), for 2011, in order to construct an aggregate demand function at the h...
Risk Analysis of New Product Development Using Bayesian Networks
MohammadRahim Ramezanian
2012-06-01
The process of presenting new product development (NPD) to market is of great importance due to the variability of competitive rules in the business world. Product development teams face many pressures due to the rapid growth of technology, increased risk-taking of world markets and increasing variations in customers' needs. However, the process of NPD is always associated with high uncertainties and complexities. To be successful in completing an NPD project, existing risks should be identified and assessed. On the other hand, Bayesian networks, as a strong approach to decision-making modeling of uncertain situations, have attracted many researchers in various areas. These networks provide a decision support system for problems with uncertainties or probable reasoning. In this paper, the available risk factors in product development have first been identified in an electric company, and then a Bayesian network has been utilized and their interrelationships have been modeled to evaluate the available risk in the process. To determine the primary and conditional probabilities of the nodes, the viewpoints of experts in this area have been applied. The available risks in this process have been divided into High (H), Medium (M) and Low (L) groups and analyzed by the Agena Risk software. The findings derived from the software output indicate that the production of the desired product has relatively high risk. In addition, predictive support and diagnostic support have been performed on the model with two different scenarios.
Bayesian Analysis of Demand Elasticity in the Italian Electricity Market
Maria Chiara D'Errico
2015-09-01
The liberalization of the Italian electricity market is a decade old. Within these last ten years, the supply side has been extensively analyzed, but not the demand side. The aim of this paper is to provide a new method for estimation of the demand elasticity, based on Bayesian methods applied to the Italian electricity market. We used individual demand bids data in the day-ahead market in the Italian Power Exchange (IPEX), for 2011, in order to construct an aggregate demand function at the hourly level. We took into account the existence of both elastic and inelastic bidders on the demand side. The empirical results show that elasticity varies significantly during the day and across periods of the year. In addition, the elasticity hourly distribution is clearly skewed and more so in the daily hours. The Bayesian method is a useful tool for policy-making, insofar as the regulator can start with a priori historical information on market behavior and estimate actual market outcomes in response to new policy actions.
Exclusive breastfeeding practice in Nigeria: a Bayesian stepwise regression analysis.
Gayawan, Ezra; Adebayo, Samson B; Chitekwe, Stanley
2014-11-01
Despite the importance of breast milk, the prevalence of exclusive breastfeeding (EBF) in Nigeria is far lower than what has been recommended for developing countries. Worse still, the practise has been on a downward trend in the country recently. This study was aimed at investigating the determinants and geographical variations of EBF in Nigeria. Any intervention programme would require a good knowledge of factors that enhance the practise. A pooled data set from the Nigeria Demographic and Health Surveys conducted in 1999, 2003, and 2008 was analyzed using a Bayesian stepwise approach that involves simultaneous selection of variables and smoothing parameters. Further, the approach allows for geographical variations at a highly disaggregated level of states to be investigated. Within a Bayesian context, appropriate priors are assigned on all the parameters and functions. Findings reveal that education of women and their partners, place of delivery, mother's age at birth, and current age of child are associated with increasing prevalence of EBF. However, visits for antenatal care during pregnancy are not associated with EBF in Nigeria. Further, results reveal considerable geographical variations in the practise of EBF. The likelihood of exclusively breastfeeding children is significantly higher in Kwara, Kogi, Osun, and Oyo states but lower in Jigawa, Katsina, and Yobe. Intensive interventions that can lead to improved practise are required in all states in Nigeria. The importance of breastfeeding needs to be emphasized to women during antenatal visits as this can encourage and enhance the practise after delivery. PMID:24619227
Bayesian analysis of deterministic and stochastic prisoner's dilemma games
Howard Kunreuther
2009-08-01
This paper compares the behavior of individuals playing a classic two-person deterministic prisoner's dilemma (PD) game with choice data obtained from repeated interdependent security prisoner's dilemma games with varying probabilities of loss and the ability to learn (or not learn) about the actions of one's counterpart, an area of recent interest in experimental economics. This novel data set, from a series of controlled laboratory experiments, is analyzed using Bayesian hierarchical methods, the first application of such methods in this research domain. We find that individuals are much more likely to be cooperative when payoffs are deterministic than when the outcomes are probabilistic. A key factor explaining this difference is that subjects in a stochastic PD game respond not just to what their counterparts did but also to whether or not they suffered a loss. These findings are interpreted in the context of behavioral theories of commitment, altruism and reciprocity. The work provides a linkage between Bayesian statistics, experimental economics, and consumer psychology.
New Developments in Fuzzy Cluster Analysis
Řezanková, H.; Húsek, Dušan
Praha: Nakladatelství Oeconomica, 2009 - (Fischer, J.), s. 403-416 ISBN 978-80-245-1600-4. [AMSE 2009. International Conference on Mathematics and Statistics in Economy /12./. Uherské Hradiště (CZ), 26.08.2009-28.08.2009] R&D Projects: GA ČR GA205/09/1079; GA MŠk(CZ) 1M0567 Institutional research plan: CEZ:AV0Z10300504 Keywords : fuzzy cluster analysis * ensembles of fuzzy clustering * relationships between clusters and variables * cluster number determination Subject RIV: BB - Applied Statistics, Operational Research
Gutiérrez, Jose Manuel; San Martín, Daniel; Herrera, Sixto; Santiago Cofiño, Antonio
2016-04-01
The growing availability of spatial datasets (observations, reanalysis, and regional and global climate models) demands efficient multivariate spatial modeling techniques for many problems of interest (e.g. teleconnection analysis, multi-site downscaling, etc.). Complex networks have been recently applied in this context using graphs built from pairwise correlations between the different stations (or grid boxes) forming the dataset. However, this analysis does not take into account the full dependence structure underlying the data, given by all possible marginal and conditional dependencies among the stations, and does not allow a probabilistic analysis of the dataset. In this talk we introduce Bayesian networks as an alternative data-driven multivariate analysis and modeling technique which allows building a joint probability distribution of the stations including all relevant dependencies in the dataset. Bayesian networks are a sound machine learning technique using a graph 1) to encode the main dependencies among the variables and 2) to obtain a factorization of the joint probability distribution of the stations given by a reduced number of parameters. For a particular problem, the resulting graph provides a qualitative analysis of the spatial relationships in the dataset (alternative to complex network analysis), and the resulting model allows for a probabilistic analysis of the dataset. Bayesian networks have been widely applied in many fields, but their use in climate problems is hampered by the large number of variables (stations) involved in this field, since the complexity of the existing algorithms to learn the graphical structure from data grows nonlinearly with the number of variables. In this contribution we present a modified local learning algorithm for Bayesian networks adapted to this problem, which allows inferring the graphical structure for thousands of stations (from observations) and/or gridboxes (from model simulations) thus providing new
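The factorization the abstract refers to can be made concrete with a toy example. The sketch below defines a hypothetical three-station network with binary states (1 for a wet day, 0 for a dry day), writes the joint distribution as a product of per-node conditional probabilities, and answers a probabilistic query by enumeration. The graph, the probabilities, and the function names are invented for illustration, and the structure-learning algorithm of the talk is not shown.

```python
from itertools import product

# Hypothetical graph: station A is a parent of stations B and C.
parents = {"A": (), "B": ("A",), "C": ("A",)}

# cpt[node][parent_values] = P(node = 1 | parents); illustrative numbers only.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.2, (1,): 0.7},
}


def joint(assign):
    """Joint probability as the product of P(node | parents) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in pa)]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p


def conditional(target, evidence):
    """P(target = 1 | evidence) by brute-force enumeration of all states."""
    nodes = list(parents)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        pr = joint(assign)
        den += pr
        if assign[target] == 1:
            num += pr
    return num / den


# Observing rain at B raises the probability of rain at C through their common parent A.
print(conditional("C", {}), conditional("C", {"B": 1}))
```

This toy network needs 5 parameters where an unrestricted joint table over three binary variables needs 7; the gap widens rapidly as stations are added, which is the "reduced number of parameters" point. Enumeration is exponential in the number of nodes, so it is only viable for examples of this size.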
A Bayesian Surrogate Model for Rapid Time Series Analysis and Application to Exoplanet Observations
Ford, Eric B; Veras, Dimitri
2011-01-01
We present a Bayesian surrogate model for the analysis of periodic or quasi-periodic time series data. We describe a computationally efficient implementation that enables Bayesian model comparison. We apply this model to simulated and real exoplanet observations. We discuss the results and demonstrate some of the challenges for applying our surrogate model to realistic exoplanet data sets. In particular, we find that analyses of real world data should pay careful attention to the effects of uneven spacing of observations and the choice of prior for the "jitter" parameter.
Application of Bayesian networks for risk analysis of MV air insulated switch operation
Electricity distribution companies regard risk-based approaches as a good philosophy to address their asset management challenges, and there is an increasing trend on developing methods to support decisions where different aspects of risks are taken into consideration. This paper describes a methodology for application of Bayesian networks for risk analysis in electricity distribution system maintenance management. The methodology is used on a case analysing safety risk related to operation of MV air insulated switches. The paper summarises some challenges and benefits of using Bayesian networks as a part of distribution system maintenance management.
Bayesian Factor Analysis as a Variable-Selection Problem: Alternative Priors and Consequences.
Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric
2016-01-01
Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor-loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, Muthén & Asparouhov proposed a Bayesian structural equation modeling (BSEM) approach to explore the presence of cross loadings in CFA models. We show that the issue of determining factor-loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov's approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike-and-slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set is used to demonstrate our approach. PMID:27314566
Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis
Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy
2012-01-01
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…
In this paper, the Genetic Algorithms (GA) and Bayesian model averaging (BMA) were combined to simultaneously conduct calibration and uncertainty analysis for the Soil and Water Assessment Tool (SWAT). In this hybrid method, several SWAT models with different structures are first selected; next GA i...
A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data
DeSarbo, WS; Kim, Y; Fong, D
1999-01-01
We present a new Bayesian formulation of a vector multidimensional scaling procedure for the spatial analysis of binary choice data. The Gibbs sampler is gainfully employed to estimate the posterior distribution of the specified scalar products, bilinear model parameters. The computational procedure
Exact WKB analysis and cluster algebras
We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change its topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction to the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’.
Variational Bayesian Causal Connectivity Analysis for fMRI
Martin Luessi
2014-05-01
The ability to accurately estimate effective connectivity among brain regions from neuroimaging data could help answer many open questions in neuroscience. We propose a method which uses causality to obtain a measure of effective connectivity from fMRI data. The method uses a vector autoregressive model for the latent variables describing neuronal activity in combination with a linear observation model based on a convolution with a hemodynamic response function. Due to the employed modeling, it is possible to efficiently estimate all latent variables of the model using a variational Bayesian inference algorithm. The computational efficiency of the method enables us to apply it to large scale problems with high sampling rates and several hundred regions of interest. We use a comprehensive empirical evaluation with synthetic and real fMRI data to evaluate the performance of our method under various conditions.
Unsupervised Transient Light Curve Analysis Via Hierarchical Bayesian Inference
Sanders, Nathan; Soderberg, Alicia
2014-01-01
Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometr...
A Software Risk Analysis Model Using Bayesian Belief Network
Yong Hu; Juhua Chen; Mei Liu; Yang Yun; Junbiao Tang
2006-01-01
The uncertainty during software project development often brings huge risks to contractors and clients. If we can find an effective method to predict the cost and quality of software projects based on facts like the project character and two-side cooperating capability at the beginning of the project, we can reduce the risk. A Bayesian Belief Network (BBN) is a good tool for analyzing uncertain consequences, but it is difficult to produce a precise network structure and conditional probability table. In this paper, we build the network structure by the Delphi method for conditional probability table learning, and continuously update the probability table and the nodes' confidence levels according to the application cases, which gives the evaluation network learning abilities and lets it evaluate the software development risk of an organization more accurately. This paper also introduces the EM algorithm, which enhances the ability to produce hidden nodes caused by variant software projects.
Bayesian analysis of repairable systems showing a bounded failure intensity
The failure pattern of repairable mechanical equipment subject to deterioration phenomena sometimes shows a finite bound for the increasing failure intensity. A non-homogeneous Poisson process with bounded increasing failure intensity is then illustrated and its characteristics are discussed. A Bayesian procedure, based on prior information on model-free quantities, is developed in order to allow technical information on the failure process to be incorporated into the inferential procedure and to improve the inference accuracy. Posterior estimation of the model-free quantities and of other quantities of interest (such as the optimal replacement interval) is provided, as well as prediction of the waiting time to the next failure and of the number of failures in a future time interval. Finally, numerical examples are given to illustrate the proposed inferential procedure.
Direct message passing for hybrid Bayesian networks and performance analysis
Sun, Wei; Chang, K. C.
2010-04-01
Probabilistic inference for hybrid Bayesian networks, which involve both discrete and continuous variables, has been an important research topic in recent years. This is not only because a number of efficient inference algorithms have been developed and are in mature use for simple types of networks such as the pure discrete model, but also because of the practical need for continuous variables, which are inevitable in modeling complex systems. Pearl's message passing algorithm provides a simple framework to compute posterior distributions by propagating messages between nodes and can provide exact answers for polytree models with pure discrete or continuous variables. In addition, applying Pearl's message passing to networks with loops usually converges and results in a good approximation. However, for hybrid models, there is a need for a general message passing algorithm between different types of variables. In this paper, we develop a method called Direct Message Passing (DMP) for exchanging messages between discrete and continuous variables. Based on Pearl's algorithm, we derive formulae to compute messages for variables in various dependence relationships encoded in conditional probability distributions. A mixture of Gaussians is used to represent continuous messages, with the number of mixture components up to the size of the joint state space of all discrete parents. For a polytree Conditional Linear Gaussian (CLG) Bayesian network, DMP has the same computational requirements as the Junction Tree (JT) algorithm and can provide the same exact solution. However, while JT can only work for the CLG model, DMP can be applied to general nonlinear, non-Gaussian hybrid models to produce approximate solutions using the unscented transformation and loopy propagation. Furthermore, we can scale the algorithm by restricting the number of mixture components in the messages. Empirically, we found that the approximation errors are relatively small especially for nodes that are far away from
Urbi Garay
2016-03-01
We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return) and betas (to a choice set of explanatory factors) in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year) than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.
Cluster Analysis of the Malaysian Hipposideros
Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.
2008-01-01
A preliminary study on the morphometric variations among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically, with all related body, skull and dental measurements recorded. The cluster analysis of these data shows that the genus Hipposideros is divided into two major clusters in which each species is clearly separated. Cluster analysis among Hipposideros species is useful as an aid to species identification.
Buddhavarapu, Prasad; Smit, Andre F; Prozzi, Jorge A
2015-07-01
Permeable friction course (PFC), a porous hot-mix asphalt, is typically applied to improve wet weather safety on high-speed roadways in Texas. In order to warrant expensive PFC construction, a statistical evaluation of its safety benefits is essential. Generally, the literature on the effectiveness of porous mixes in reducing wet-weather crashes is limited and often inconclusive. In this study, the safety effectiveness of PFC was evaluated using a fully Bayesian before-after safety analysis. First, two groups of road segments overlaid with PFC and non-PFC material were identified across Texas; the non-PFC or reference road segments selected were similar to their PFC counterparts in terms of site specific features. Second, a negative binomial data generating process was assumed to model the underlying distribution of crash counts of PFC and reference road segments to perform Bayesian inference on the safety effectiveness. A data-augmentation based computationally efficient algorithm was employed for a fully Bayesian estimation. The statistical analysis shows that PFC is not effective in reducing wet weather crashes. It should be noted that the findings of this study are in agreement with the existing literature, although these studies were not based on a fully Bayesian statistical analysis. Our study suggests that the safety effectiveness of PFC road surfaces, or any other safety infrastructure, largely relies on its interrelationship with the road user. The results suggest that the safety infrastructure must be properly used to reap the benefits of the substantial investments. PMID:25897515
Figueira, P.; Faria, J. P.; Adibekyan, V. Zh.; Oshagh, M.; Santos, N. C.
2016-05-01
We apply the Bayesian framework to assess the presence of a correlation between two quantities. To do so, we estimate the probability distribution of the parameter of interest, ρ, characterizing the strength of the correlation. We provide an implementation of these ideas and concepts using the Python programming language and the pyMC module in a very short (~130 lines of code, heavily commented) and user-friendly program. We used this tool to assess the presence and properties of the correlation between planetary surface gravity and stellar activity level as measured by the $\log R'_{\rm HK}$ indicator. The results of the Bayesian analysis are qualitatively similar to those obtained via p-value analysis, and support the presence of a correlation in the data. The results are more robust in their derivation and more informative, revealing interesting features such as asymmetric posterior distributions or markedly different credible intervals, and allowing for a deeper exploration. We encourage the reader interested in this kind of problem to apply our code to his/her own scientific problems. A full understanding of the Bayesian framework can only be gained through the insight that comes from handling priors, assessing the convergence of Monte Carlo runs, and a multitude of other practical problems. We hope to contribute so that Bayesian analysis becomes a tool in the toolkit of researchers, and they understand by experience its advantages and limitations.
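The idea of estimating a posterior distribution for ρ can be sketched without any sampler. The example below is not the authors' pyMC program; it is a minimal stand-in that assumes standardized data, a bivariate normal likelihood with unit variances, and a flat prior on ρ, and evaluates the posterior on a grid. The function names and the grid resolution are illustrative choices.

```python
import math


def correlation_posterior(x, y, n_grid=999):
    """Grid posterior for rho under a flat prior on (-1, 1).

    Assumes standardized data follow a bivariate normal with unit variances.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    zx = [(v - mean_x) / sd_x for v in x]
    zy = [(v - mean_y) / sd_y for v in y]
    sxy = sum(a * b for a, b in zip(zx, zy))
    grid = [-0.999 + 1.998 * i / (n_grid - 1) for i in range(n_grid)]
    # After standardization sum(zx^2) = sum(zy^2) = n, so the quadratic form
    # (sxx - 2 r sxy + syy) / (2 (1 - r^2)) reduces to (n - r sxy) / (1 - r^2).
    loglik = [
        -0.5 * n * math.log(1 - r * r) - (n - r * sxy) / (1 - r * r)
        for r in grid
    ]
    peak = max(loglik)
    weights = [math.exp(v - peak) for v in loglik]  # subtract peak for stability
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, post, level=0.95):
    """Posterior mean and equal-tailed credible interval for rho."""
    mean = sum(r * p for r, p in zip(grid, post))
    tail = (1 - level) / 2
    cumulative, lower, upper = 0.0, grid[0], grid[-1]
    found_lower = False
    for r, p in zip(grid, post):
        cumulative += p
        if not found_lower and cumulative >= tail:
            lower, found_lower = r, True
        if cumulative >= 1 - tail:
            upper = r
            break
    return mean, lower, upper
```

Calling `summarize(*correlation_posterior(x, y))` returns a point estimate together with a credible interval, which is the kind of output the abstract contrasts with a single p-value. A sampler-based version would be needed for richer models, for example ones with unknown variances or informative priors.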
Bayesian analysis of data for a stochastic detector
Lesimple, M
2000-07-01
A study of the inverse problem, related to individual electron detectors for track nanodosimetry, is presented. It is shown that, despite the stochastic character of these detectors, events such as the presence of clusters can be inferred from data independently of the conditions of measurement. An algorithmic reconstruction of the a priori probability distribution of ionisation is proposed.
Towards optimal cluster power spectrum analysis
Smith, Robert E.; Marian, Laura
2016-04-01
The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper, we develop a formalism to compute the optimal weights for the estimation of the matter power spectrum from cluster power spectrum measurements. We find a closed-form analytic expression for the optimal weights, which takes into account: the cluster mass, finite survey volume effects, survey masking, and a flux limit. The optimal weights are $w(M,\chi) \propto b(M,\chi)/[1+\bar{n}_h(\chi)\,\overline{b^2}(\chi)\,\overline{P}(k)]$, where $b(M,\chi)$ is the bias of clusters of mass $M$ at radial position $\chi(z)$, $\bar{n}_h(\chi)$ and $\overline{b^2}(\chi)$ are the expected space density and bias squared of all clusters, and $\overline{P}(k)$ is the matter power spectrum at wavenumber $k$. This result is analogous to that of Percival et al. We compare our optimal weighting scheme with mass weighting and also with the original power spectrum scheme of Feldman et al. We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster surveys. Finally, we present a new expression for the Fisher information matrix for cluster power spectrum analysis. Our expression shows that for an optimally weighted cluster survey the cosmological information content is boosted, relative to the standard approach of Tegmark.
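The weight expression quoted in the abstract is simple enough to evaluate directly. The sketch below computes the unnormalized weight for a single cluster; the function name, argument names, and the numbers in the usage note are hypothetical, and the overall normalization is left out because the abstract only gives a proportionality.

```python
def optimal_weight(bias, n_bar, b2_bar, p_k):
    """Unnormalized weight w = b / (1 + n_bar * b2_bar * P(k)).

    bias   : bias b(M, chi) of the cluster being weighted
    n_bar  : expected space density of all clusters at that position
    b2_bar : mean squared bias of all clusters at that position
    p_k    : matter power spectrum at the wavenumber of interest
    """
    return bias / (1.0 + n_bar * b2_bar * p_k)


def relative_weights(biases, n_bar, b2_bar, p_k):
    """Weights for several clusters at the same position, scaled to sum to 1."""
    raw = [optimal_weight(b, n_bar, b2_bar, p_k) for b in biases]
    total = sum(raw)
    return [w / total for w in raw]
```

Two limits are easy to read off. When the product n_bar * b2_bar * P(k) is much smaller than 1 (shot-noise dominated), the weight reduces to the bias itself; when it is much larger than 1, the weight is suppressed by that product. For clusters at the same position the denominator is shared, so `relative_weights([1.0, 3.0], ...)` always splits the weight in proportion to the biases.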
Bayesian Analysis of Cosmic Ray Propagation: Evidence against Homogeneous Diffusion
Jóhannesson, G.; Ruiz de Austri, R.; Vincent, A. C.; Moskalenko, I. V.; Orlando, E.; Porter, T. A.; Strong, A. W.; Trotta, R.; Feroz, F.; Graff, P.; Hobson, M. P.
2016-06-01
We present the results of the most complete scan of the parameter space for cosmic ray (CR) injection and propagation. We perform a Bayesian search of the main GALPROP parameters, using the MultiNest nested sampling algorithm, augmented by the BAMBI neural network machine-learning package. This is the first study to separate out low-mass isotopes ($p$, $\bar{p}$, and He) from the usual light elements (Be, B, C, N, and O). We find that the propagation parameters that best fit $p$, $\bar{p}$, and He data are significantly different from those that fit light elements, including the B/C and $^{10}$Be/$^{9}$Be secondary-to-primary ratios normally used to calibrate propagation parameters. This suggests that each set of species is probing a very different interstellar medium, and that the standard approach of calibrating propagation parameters using B/C can lead to incorrect results. We present posterior distributions and best-fit parameters for propagation of both sets of nuclei, as well as for the injection abundances of elements from H to Si. The input GALDEF files with these new parameters will be included in an upcoming public GALPROP update.
OVERALL SENSITIVITY ANALYSIS UTILIZING BAYESIAN NETWORK FOR THE QUESTIONNAIRE INVESTIGATION ON SNS
Tsuyoshi Aburai; Kazuhiro Takeyasu
2013-01-01
Social Networking Services (SNS) have spread rapidly in Japan in recent years. The most popular ones are Facebook, mixi, and Twitter, which are used in various areas of life together with convenient tools such as smartphones. In this work, a questionnaire investigation is carried out in order to clarify the current usage, issues and desired functions. More than 1,000 samples are gathered. A Bayesian network is utilized for this analysis. Sensitivity analysis is carried out by...
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Clustering analysis of telecommunication customers
REN Hong; ZHENG Yan; WU Ye-rong
2009-01-01
In this article, a clustering method based on genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as the calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped as the distance between samples on a two-dimensional plane. Finally, the distances are adjusted to approximate the similarities gradually by GA. One advantage of this method is the independent distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.
Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S
2016-05-01
A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection. PMID:26978823
Predicting the effect of missense mutations on protein function: analysis with Bayesian networks
Care Matthew A
2006-09-01
Background: A number of methods that use both protein structural and evolutionary information are available to predict the functional consequences of missense mutations. However, many of these methods break down if either one of the two types of data is missing. Furthermore, there is a lack of rigorous assessment of how important the different factors are to prediction. Results: Here we use Bayesian networks to predict whether or not a missense mutation will affect the function of the protein. Bayesian networks provide a concise representation for inferring models from data, and are known to generalise well to new data. More importantly, they can handle the noisy, incomplete and uncertain nature of biological data. Our Bayesian network achieved comparable performance with previous machine learning methods. The predictive performance of learned model structures was no better than a naïve Bayes classifier. However, analysis of the posterior distribution of model structures allows biologically meaningful interpretation of relationships between the input variables. Conclusion: The ability of the Bayesian network to make predictions when only structural or evolutionary data was observed allowed us to conclude that structural information is a significantly better predictor of the functional consequences of a missense mutation than evolutionary information, for the dataset used. Analysis of the posterior distribution of model structures revealed that the top three strongest connections with the class node all involved structural nodes. With this in mind, we derived a simplified Bayesian network that used just these three structural descriptors, with comparable performance to that of an all-node network.
In Bayesian inference, the initial knowledge regarding the value of a parameter, before additional data are considered, is represented as a prior probability distribution. This paper describes the derivation of a prior distribution of intake that was used for the Bayesian analysis of plutonium and uranium worker doses in a recent epidemiology study. The chosen distribution is log-normal with a geometric standard deviation of 6 and a median value that is derived for each worker based on the duration of the work history and the number of reported acute intakes. The median value is a function of the work history and a constant related to activity in air concentration, M, which is derived separately for uranium and plutonium. The value of M is based primarily on measurements of plutonium and uranium in air derived from historical personal air sampler (PAS) data. However, there is significant uncertainty on the value of M that results from paucity of PAS data and from extrapolating these measurements to actual intakes. This paper compares posterior and prior distributions of intake and investigates the sensitivity of the Bayesian analyses to the assumed value of M. It is found that varying M by a factor of 10 results in a much smaller factor of 2 variation in mean intake and lung dose for both plutonium and uranium. It is concluded that if a log-normal distribution is considered to adequately represent worker intakes, then the Bayesian posterior distribution of dose is relatively insensitive to the value assumed of M.
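To give a feel for how wide a log-normal prior with a geometric standard deviation (GSD) of 6 is, the sketch below parameterizes it by its median and computes a few quantiles and the mean. Only the GSD of 6 comes from the abstract; the median used in the usage note is a placeholder in arbitrary units, and the worker-specific derivation of the median from M and the work history is not reproduced.

```python
import math
from statistics import NormalDist


def log_intake_distribution(median, gsd=6.0):
    """Normal distribution of ln(intake) for a log-normal prior.

    mu = ln(median) and sigma = ln(GSD), the standard parameterization.
    """
    return NormalDist(mu=math.log(median), sigma=math.log(gsd))


def intake_quantile(median, p, gsd=6.0):
    """Intake value below which a fraction p of the prior mass lies."""
    return math.exp(log_intake_distribution(median, gsd).inv_cdf(p))


def intake_mean(median, gsd=6.0):
    """Mean of the log-normal prior: median * exp(sigma^2 / 2)."""
    return median * math.exp(0.5 * math.log(gsd) ** 2)
```

With a placeholder median of 1.0, `intake_quantile(1.0, 0.025)` and `intake_quantile(1.0, 0.975)` span roughly three orders of magnitude, and `intake_mean(1.0)` is about five times the median. A prior this diffuse is one plausible reason a tenfold change in M moves the posterior mean by only a factor of about 2, although the abstract does not state the mechanism.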
Using Cluster Analysis to Examine Husband-Wife Decision Making
Bonds-Raacke, Jennifer M.
2006-01-01
Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE
Sanders, N. E.; Soderberg, A. M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Betancourt, M., E-mail: nsanders@cfa.harvard.edu [Department of Statistics, University of Warwick, Coventry CV4 7AL (United Kingdom)
2015-02-10
Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST.
Mugnes, J.-M.; Robert, C.
2015-11-01
Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here, we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence velocities are often underestimated. We also confirm the trend that B stars in clusters are on average faster rotators than field B stars.
Application of Bayesian Network Learning Methods to Land Resource Evaluation
HUANG Jiejun; HE Xiaorong; WAN Youchuan
2006-01-01
A Bayesian network has a powerful capacity for reasoning and semantic representation. It combines qualitative with quantitative analysis, and prior knowledge with observed data, and provides an effective way to deal with prediction, classification and clustering. This paper first presents an overview of Bayesian networks and their characteristics, discusses how to learn a Bayesian network structure from given data, and then constructs a Bayesian network model for land resource evaluation from expert knowledge and the dataset. On the test dataset, the evaluation accuracy is 87.5% and the Kappa index is 0.826. These results show that the method is feasible and efficient, and indicate that Bayesian networks are a promising approach for land resource evaluation.
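The two figures reported above, accuracy and Kappa index, can both be computed from a confusion matrix. The sketch below assumes that "Kappa index" means Cohen's kappa, which the abstract does not state explicitly, and the matrix in the usage note is invented for illustration rather than taken from the paper.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's kappa: agreement beyond what the marginals predict by chance."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)
```

For the made-up matrix `[[20, 5], [10, 15]]`, the accuracy is 0.7 and kappa is 0.4, because half of the agreement would be expected by chance from the row and column totals. Kappa is therefore always at or below the raw accuracy whenever chance agreement is positive, which matches the ordering of the two values in the abstract.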
Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.
Orbanz, Peter; Roy, Daniel M
2015-02-01
The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation. Dirichlet process clustering, Gaussian process regression, and many other parametric and nonparametric Bayesian models fall within the remit of this framework; many problems arising in modern data analysis do not. This article provides an introduction to Bayesian models of graphs, matrices, and other data that can be modeled by random structures. We describe results in probability theory that generalize de Finetti's theorem to such data and discuss their relevance to nonparametric Bayesian modeling. With the basic ideas in place, we survey example models available in the literature; applications of such models include collaborative filtering, link prediction, and graph and network analysis. We also highlight connections to recent developments in graph theory and probability, and sketch the more general mathematical foundation of Bayesian methods for other types of data beyond sequences and arrays. PMID:26353253